Skip to content

Conversation

rahul-privado
Copy link
Contributor

@rahul-privado rahul-privado commented Jan 20, 2023

Description

The list given to deduplicateTableEntries() becomes 30,000 long for snap ( 375 MB CPG ) in many cases and the function invocation then takes 40 milliseconds. 80% of the time is spend in .groupBy() and rest in the .map(). Most of the computation here is repeated since most of the elements are from the old list.

Cached the repeated part in 2 maps to avoid repetition.

Testing done

Verified for 6 different CPGs that the flow counts match with master.
While the flow counts including source and sinks are the same, some of the intermediate calls in the flows are different. This change results in some of the detours not being taken. However, the path is still the same from source to sink.

The difference is due to the union between the earlier stored and he new group sets when there are common keys. The merged group has same key and value hash code as the old group. However, it is not equal to what it would have been had it been fully recomputed.

Performance impact

The held task computation time for OpenCRX comes down from 35 seconds to 8 seconds with this change.

rahul-privado and others added 30 commits January 18, 2023 17:52
2. Removed caching for merge list since the new table entries would need a recomputation
2. Used parallelization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant