Optimize deduplication while processing held tasks #2203

rahul-privado · 2023-01-20T13:55:45Z

Description

The list passed to deduplicateTableEntries() grows to 30,000 entries for snap (375 MB CPG) in many cases, and each invocation of the function then takes 40 milliseconds. 80% of that time is spent in .groupBy() and the rest in .map(). Most of this computation is repeated, since most of the elements come from the old list.

Cached the repeated part in two maps so that it is not recomputed on every call.
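The caching idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `Entry`, `GroupCache`, and the "longest path wins" rule are hypothetical stand-ins, not the actual TableEntry type or selection logic in this PR. The point is that only the newly arrived entries go through .groupBy(), while groups built from the old list are reused.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a table entry; the real type in the PR differs.
final case class Entry(key: String, pathLength: Int)

// Keeps the groups computed so far, so each call only groups the new entries.
final class GroupCache {
  private val groups = mutable.LinkedHashMap.empty[String, Vector[Entry]]

  // Group just the new entries and merge them into the cached groups.
  def add(newEntries: Seq[Entry]): Unit =
    newEntries.groupBy(_.key).foreach { case (key, added) =>
      groups.update(key, groups.getOrElse(key, Vector.empty) ++ added)
    }

  // One representative per group; "longest path wins" is an assumption here.
  def deduplicated: Vector[Entry] =
    groups.valuesIterator.map(_.maxBy(_.pathLength)).toVector
}

object GroupCacheDemo {
  // Baseline for comparison: regroup the whole list from scratch.
  def recomputed(all: Seq[Entry]): Set[Entry] =
    all.groupBy(_.key).values.map(_.maxBy(_.pathLength)).toSet
}
```

In this simplified form the cached result matches a full recomputation. The differences in intermediate calls described under "Testing done" come from how the real code merges old and new groups, which this sketch does not model.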

Testing done

Verified for 6 different CPGs that the flow counts match with master.
While the flow counts, including sources and sinks, are the same, some of the intermediate calls in the flows are different. This change results in some of the detours not being taken. However, each flow still runs between the same source and sink.

The difference is due to the union between the previously stored group sets and the new group sets when there are common keys. The merged group has the same key and value hash code as the old group. However, it is not equal to what it would have been had it been fully recomputed.

Performance impact

The held task computation time for OpenCRX comes down from 35 seconds to 8 seconds with this change.


rahul-privado and others added 30 commits January 18, 2023 17:52

Stored groups to avoid recomputation. Brings down held task computation time to 15% of earlier ( 6X speedup )

e62f85b

Code betterment

31f968e

Removed unused function

ecc40e8

Merge branch 'master' into held-tasks-opti

9e4042c

More compact code for old

e6b18a8

Revert "More compact code for old"

a7b9aea

This reverts commit e6b18a8.

Cache table entry list

0269925

Merge branch 'master' into held-tasks-opti

5378828

Formatted the code

e9b24da

Merge branch 'joernio:master' into held-tasks-opti

1457a60

1. Properly concatenated the lists to avoid missing TableEntries

42daee6

2. Removed caching for merge list since the new table entries would need a recomputation

Removed sorting O(n^2) worst case and made it a linear search O(n) to find the max elements

214edfc

Unique hash for each TableEntry

a252feb

Lazy computation of Sha1 hash

22f0cb1

Merge branch 'master' into held-tasks-opti

bbd65f0

Reintroduced caching of table entry hashes

9dfadd7

Variable renaming

16323fd

Formatted the code

34372f4

Moved hash computation to a different function

04cc73d

Merging code improvement

f1ef699

Filter on max length

e5afde1

Filtering while appending new list

ee78f6b

Accounted for new max entries

9ebba8f

Removed SHA/MD5 usage since it is expensive

83dc73f

Used minBy() instead of sorting

d4eba26

Updated group list map that was missed earlier

b56b25f

1. Converted merge list to hash map

a866227

2. Used parallelization

Better sync of the critical section

125e30b

Merge branch 'master' into held-tasks-opti

901c50e

Removed merge list map since it is no longer needed

f614366
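Several commits above replace sorting with a single pass ("linear search O(n)", "Used minBy() instead of sorting"). A minimal sketch of that substitution, assuming a hypothetical `Candidate` type with an integer priority (neither name comes from the PR):

```scala
object MinByDemo {
  // Hypothetical element type; the PR operates on table entries instead.
  final case class Candidate(id: Int, priority: Int)

  // Sorts the whole sequence only to read its first element.
  def bestBySorting(xs: Seq[Candidate]): Candidate =
    xs.sortBy(_.priority).head

  // Makes a single linear pass over the sequence.
  def bestByMinBy(xs: Seq[Candidate]): Candidate =
    xs.minBy(_.priority)
}
```

Both functions throw on an empty sequence, so a caller would need to guard against that case.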

rahul-privado and others added 9 commits February 1, 2023 17:59

Introduced RW lock for proper protection

328c287

Parallelization in the outer loop

8375599

Removed outer loop parallelism due to race issues

c4ce927

Parallel group computation

febe497

Line formatting

39db644

Made priority computation more accurate

5ed522d

Merge branch 'master' into held-tasks-opti

dba8d1d

Merge branch 'joernio:master' into held-tasks-opti

70a53b8

Merge branch 'joernio:master' into held-tasks-opti

3a01ade
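The later commits mention protecting shared state with an RW lock while groups are computed in parallel. The sketch below shows one way such protection could look using `java.util.concurrent.locks.ReentrantReadWriteLock`. The class and method names are hypothetical, and it does not reproduce the PR's actual locking scheme.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable

// Hypothetical shared cache; names are illustrative, not taken from the PR.
final class LockedGroupCache[K, V] {
  private val lock   = new ReentrantReadWriteLock()
  private val groups = mutable.HashMap.empty[K, Vector[V]]

  // Several readers may hold the read lock at the same time.
  def get(key: K): Vector[V] = {
    lock.readLock().lock()
    try groups.getOrElse(key, Vector.empty)
    finally lock.readLock().unlock()
  }

  // The write lock is exclusive, so this read-modify-write cannot interleave.
  def append(key: K, value: V): Unit = {
    lock.writeLock().lock()
    try groups.update(key, groups.getOrElse(key, Vector.empty) :+ value)
    finally lock.writeLock().unlock()
  }
}

object LockedGroupCacheDemo {
  // Appends the values 0 until n from several threads; returns how many were stored.
  def fill(n: Int, threads: Int): Int = {
    val cache = new LockedGroupCache[String, Int]
    val workers = (0 until threads).map { t =>
      new Thread(() => (t until n by threads).foreach(i => cache.append("k", i)))
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    cache.get("k").size
  }
}
```

Without the write lock, concurrent appends to the same key could overwrite each other's updates, which is the kind of race the "Removed outer loop parallelism due to race issues" commit refers to in general terms.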
