Add withReshuffle(boolean) option to FileIO.matchAll() #34677
What
This PR adds a new method, withReshuffle(boolean), to FileIO.matchAll() to allow disabling the automatic reshuffling step (Reshuffle.viaRandomKey()).
Why
Currently, FileIO.matchAll() always applies a Reshuffle step. While this improves performance for wildcard patterns that expand into many files, it is not ideal when processing a large static list of file paths (e.g., 1M+). In such cases, reshuffling can block downstream fusion and autoscaling. This feature allows advanced users to opt out of reshuffling to improve performance and fusion behavior.
How
- Added a getReshuffle() property to the MatchAll AutoValue class, defaulting to true.
- Updated the expand() method to conditionally apply the reshuffle based on the property.
- Added withReshuffle(boolean) for API access.
- Updated FileIOTest to verify the reshuffle toggle behavior.
Fixes
Fixes: #33330
CHANGES.md will be updated upon approval if necessary.
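The changes listed under How can be pictured with a small plain-Java stand-in. This is only a sketch and not the Beam implementation: MatchAllSketch is a hypothetical class, the real MatchAll is an AutoValue class whose generated code is not shown in this description, and Collections.shuffle is used purely as a placeholder for Reshuffle.viaRandomKey() (which redistributes elements across workers rather than reordering a list). The parts taken from the description are the property name getReshuffle(), its default of true, the withReshuffle(boolean) setter, and the conditional step in expand().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for FileIO.MatchAll; illustrative only, not Beam code. */
final class MatchAllSketch {
    private final boolean reshuffle;

    private MatchAllSketch(boolean reshuffle) {
        this.reshuffle = reshuffle;
    }

    /** Mirrors the described default: reshuffling stays on unless disabled. */
    static MatchAllSketch create() {
        return new MatchAllSketch(true);
    }

    boolean getReshuffle() {
        return reshuffle;
    }

    /** Returns a new configured instance and leaves this one unchanged. */
    MatchAllSketch withReshuffle(boolean reshuffle) {
        return new MatchAllSketch(reshuffle);
    }

    /**
     * "Matches" the given paths (an identity copy in this sketch), then
     * applies the redistribution step only when the property is true.
     */
    List<String> expand(List<String> paths, Random random) {
        List<String> matched = new ArrayList<>(paths);
        if (getReshuffle()) {
            // Placeholder for Reshuffle.viaRandomKey(); assumption for illustration.
            Collections.shuffle(matched, random);
        }
        return matched;
    }
}
```

With this stand-in, MatchAllSketch.create().withReshuffle(false).expand(paths, random) returns the paths in their original order, while the default instance still runs the placeholder step. Keeping the default at true means existing callers see no behavior change unless they opt out.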