
Add withReshuffle(boolean) option to FileIO.matchAll() #34677

Open
wants to merge 1 commit into
base: master

Conversation

s3847243

What

This PR adds a new method withReshuffle(boolean) to FileIO.matchAll() to allow disabling the automatic reshuffling step (Reshuffle.viaRandomKey()).

Why

Currently, FileIO.matchAll() always applies a Reshuffle step. This helps when a wildcard pattern expands into many files, because the matches are redistributed across workers. It is not ideal when the input is already a large static list of file paths (e.g., 1M+): in that case the reshuffle prevents the match step from fusing with downstream transforms and can interfere with autoscaling.

This feature allows advanced users to opt out of reshuffling to improve performance and fusion behavior.

How

  • Added a getReshuffle() property to the MatchAll AutoValue class, defaulting to true.
  • Updated the expand() method to conditionally apply the reshuffle based on the property.
  • Added withReshuffle(boolean) for API access.
  • Updated FileIOTest to verify the reshuffle toggle behavior.
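
The toggle described in the list above can be sketched in plain Java. This is a simplified stand-in, not the code from this PR or from Beam: the class name MatchAllSketch, the expandSteps() helper, and the step labels are hypothetical, and a list of step names stands in for the real PCollection pipeline. Only getReshuffle(), withReshuffle(boolean), and the default of true come from the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified model of the MatchAll toggle; not Beam's actual class. */
final class MatchAllSketch {
    private final boolean reshuffle;

    private MatchAllSketch(boolean reshuffle) {
        this.reshuffle = reshuffle;
    }

    /** Reshuffle stays enabled by default, so existing callers see no change. */
    static MatchAllSketch create() {
        return new MatchAllSketch(true);
    }

    /** Returns a new instance and leaves the receiver untouched, as an AutoValue builder would. */
    MatchAllSketch withReshuffle(boolean reshuffle) {
        return new MatchAllSketch(reshuffle);
    }

    boolean getReshuffle() {
        return reshuffle;
    }

    /** Lists the steps expand() would apply, in order (labels are illustrative). */
    List<String> expandSteps() {
        List<String> steps = new ArrayList<>();
        steps.add("Match filepatterns");
        if (reshuffle) {
            // Only added when the caller has not opted out.
            steps.add("Reshuffle matches");
        }
        return Collections.unmodifiableList(steps);
    }
}
```

Under these assumptions, MatchAllSketch.create().expandSteps() contains both steps, while MatchAllSketch.create().withReshuffle(false).expandSteps() contains only the match step. A test of the toggle would assert on exactly that difference.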

Fixes

Fixes: #33330


  • This addresses an open issue (fixes #33330)
  • Tested and verified the updated behavior
  • I will update CHANGES.md upon approval if necessary

@github-actions github-actions bot added the java label Apr 20, 2025

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".
