Add withReshuffle(boolean) option to FileIO.matchAll() #34677

Closed
wants to merge 1 commit into from

s3847243

What

This PR adds a withReshuffle(boolean) method to the MatchAll transform returned by FileIO.matchAll(), so that users can disable its automatic reshuffling step (Reshuffle.viaRandomKey()).

Why

Currently, FileIO.matchAll() always applies a Reshuffle step. This helps when wildcard patterns expand into many files, because the matched files are redistributed across workers. It is not ideal when the input is already a large static list of file paths (e.g., 1M+ paths): in that case, the reshuffle prevents fusion with downstream (possibly CPU-intensive) steps and can interfere with autoscaling.

The new option lets advanced users opt out of reshuffling in such cases to improve performance and preserve fusion.
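
For context on the step being discussed: Reshuffle.viaRandomKey() roughly works by pairing each element with a random key, grouping by that key, and then discarding the keys again. Because the grouping is a shuffle, a runner materializes the data at that point and does not fuse the steps on either side of it. That is what rebalances work after a wildcard expands, and also what gets in the way for a large, static list of paths. The following is a minimal in-memory analogy in plain Java, assuming nothing about Beam's actual implementation beyond that description; the class RandomKeyReshuffleSketch, the numKeys and seed parameters, and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, in-memory analogy of Reshuffle.viaRandomKey(); this is not Beam code.
final class RandomKeyReshuffleSketch {

  // Pairs each element with a random key in [0, numKeys), groups by key, then drops the keys.
  // In a real runner the group-by is a shuffle: each key group may be processed by a different
  // worker and the data is materialized there, so the surrounding steps are not fused.
  static <T> List<T> reshuffleViaRandomKey(List<T> input, int numKeys, long seed) {
    Random random = new Random(seed);
    Map<Integer, List<T>> groups = new HashMap<>();
    for (T element : input) {
      int key = random.nextInt(numKeys); // random key assignment
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
    }
    // Discard the keys and flatten the groups back into a single collection.
    List<T> output = new ArrayList<>();
    for (List<T> group : groups.values()) {
      output.addAll(group);
    }
    return output;
  }

  public static void main(String[] args) {
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      paths.add("file-" + i + ".csv"); // hypothetical file names
    }
    // Same paths, possibly in a different order and grouping.
    System.out.println(reshuffleViaRandomKey(paths, 3, 42L));
  }
}
```

The output contains exactly the input elements; only their grouping (and, in this toy version, their order) changes. For a static list of paths that is already well distributed, that extra grouping step is pure overhead, which is the case this PR targets.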

How

  • Added a getReshuffle() property to the MatchAll AutoValue class, defaulting to true.
  • Updated the expand() method to conditionally apply the reshuffle based on the property.
  • Added withReshuffle(boolean) for API access.
  • Updated FileIOTest to verify the reshuffle toggle behavior.
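
The change described in these bullets follows the pattern Beam uses for its AutoValue-based transforms: each withX(...) method returns a modified copy, and expand() reads the flag to decide whether to add a step. The sketch below mimics that pattern in plain Java so it runs without Beam or AutoValue. MatchAllSketch, expand() returning a list of step names, and the step labels are hypothetical stand-ins, not the PR's code; the only behavior taken from the PR is that getReshuffle() defaults to true and withReshuffle(boolean) controls whether the reshuffle is applied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for an AutoValue-style transform; not the PR's code.
final class MatchAllSketch {
  private final boolean reshuffle;

  private MatchAllSketch(boolean reshuffle) {
    this.reshuffle = reshuffle;
  }

  // Static factory mirroring FileIO.matchAll(); reshuffling defaults to true.
  static MatchAllSketch matchAll() {
    return new MatchAllSketch(true);
  }

  boolean getReshuffle() {
    return reshuffle;
  }

  // Immutable "wither": returns a modified copy instead of mutating this instance,
  // similar to toBuilder().setReshuffle(...).build() on an AutoValue class.
  MatchAllSketch withReshuffle(boolean reshuffle) {
    return new MatchAllSketch(reshuffle);
  }

  // Returns the names of the steps this transform would add to a pipeline.
  List<String> expand() {
    List<String> steps = new ArrayList<>();
    steps.add("MatchFilepatterns"); // expand each pattern into file metadata
    if (getReshuffle()) {
      steps.add("Reshuffle.viaRandomKey"); // only applied when the flag is true
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println(matchAll().expand()); // prints [MatchFilepatterns, Reshuffle.viaRandomKey]
    System.out.println(matchAll().withReshuffle(false).expand()); // prints [MatchFilepatterns]
  }
}
```

With this PR applied, the equivalent Beam call would presumably be FileIO.matchAll().withReshuffle(false). Since the PR was closed without being merged, that method should not be assumed to be available from this change.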

Fixes

Fixes: #33330


  • This addresses an open issue (fixes #33330)
  • Tested and verified the updated behavior
  • I will update CHANGES.md upon approval if necessary

github-actions bot added the java label on Apr 20, 2025
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions bot added the stale label on Jun 19, 2025
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this on Jun 26, 2025
Development

Successfully merging this pull request may close these issues.

[Bug]: FileIO.matchAll() injects a Reshuffle step which in some case is not useful and might break desirable fusion with more CPU intensive steps