-
-
Notifications
You must be signed in to change notification settings - Fork 26.4k
FIX sort radius neighbors results when sort_results=True and algorithm="brute" #18612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @TomDLT , some minor comments but this looks good
| if not return_distance: | ||
| raise ValueError("return_distance must be True " | ||
| "if sort_results is True.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this validation be done in all cases, and not just with 'brute'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation is already done in the method query_radius for "ball_tree" and "kd_tree", and it is not necessary with a precomputed sparse graph.
| raise ValueError("return_distance must be True " | ||
| "if sort_results is True.") | ||
| for ii in range(len(neigh_dist)): | ||
| order = np.argsort(neigh_dist[ii], kind='mergesort') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, why do we need the sort to be stable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used mergesort to be consistent with the sorting done in _check_precomputed, but I don't remember why we chose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if not return_distance: | ||
| raise ValueError("return_distance must be True " | ||
| "if sort_results is True.") | ||
| for ii in range(len(neigh_dist)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can vectorize this operation instead of looping over all samples? IIUC the arrays in neigh_dist may be of different lengths so that might not be possible, but I'm not familiar enough to confirm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not possible, as you said each row is an array of different length.
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM when green!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well. Thanks @TomDLT.
…m="brute" (scikit-learn#18612) Co-authored-by: Nicolas Hug <[email protected]>
…m="brute" (scikit-learn#18612) Co-authored-by: Nicolas Hug <[email protected]>
Fixes #18583.
Calling
radius_neighborsmethod ofNearestNeighborswas not sorting the result by distance whensort_results=True, when fitted withalgorithm="brute".This PR fixes this issue, and clarifies the docstring.