-
Couldn't load subscription status.
- Fork 11
Improve binary search performance time for diff creation #3
Conversation
When sections of old_data and new_data are found to be equal in the course of searching for matches, short circuit at that point. Signed-off-by: Patrick McCarty <[email protected]>
|
This change will need a lot more testing, but it fixes the performance issue #2. Feedback welcome! |
| } else { | ||
| } else if (result > 0) { | ||
| return search(I, old, oldsize, new, newsize, st, x, pos); | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this else is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, yes, but this is a style preference. I think having the else block keeps the code in a more maintainable state, since it is linked (conceptually) to the memcmp result.
| /* As a special case, short circuit for the first exact match | ||
| * between old_data and new_data, since future exact matches | ||
| * will have shorter length. */ | ||
| *pos = I[en]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before assign a value to pos , you should validate that it is not NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function that populates the I array will never add NULL values because the values are offsets into old_data (so, greater than or equal to 0).
|
@phmccarty do you have metrics/numbers to see how much this change improve the performance? |
|
+1 to the code changes, but I think we now need to add some more heuristics and testing to ensure that the algorithm is not too greedy. |
|
Looks good to me in general, but as @tmarcu mentions could be too greedy. Perhaps for the new unconditional condition added could instead of simply }else{ instead be conditioned on absolute size of length since you now have that in a variable? And maybe start with something big (500k?), since it's the bigs where the issue's been noticed? |
|
I started looking into this issue again recently, and I want to take a much different approach to solving it. I'll open a new PR for the new proposed changes. |
|
@phmccarty Can you share the link of other PR you proposed above to open to solve it with the different approach? |
When sections of old_data and new_data are found to be equal in the
course of searching for matches, short circuit at that point.