Skip to content

[HUDI-9219] Add callbacks for CloseableIterators and eager close data readers #13178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

the-other-tim-brown
Copy link
Contributor

@the-other-tim-brown the-other-tim-brown commented Apr 19, 2025

Change Logs

  • Adds a helper for adding a task completion listener to close iterators if applicable
  • Ensure HoodieData abstractions are calling close when required
  • Add listener in spark code that does not use the HoodieData abstraction
  • Eager close the file readers when there is no more data to read

Impact

  • Ensures the readers are being closed as soon as they are no longer needed. This will help with memory usage and avoid memory leaks
  • Making sure close is called at the HoodieData level also helps ensure that the resources are closed, especially in the face of an error

Risk level (write none, low medium or high below)

Low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Apr 19, 2025
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Apr 20, 2025
@the-other-tim-brown the-other-tim-brown changed the title [DRAFT] Address issues for closeable iterators [HUDI-9219] Add callbacks for CloseableIterators and eager close data readers Apr 20, 2025
@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review April 21, 2025 00:17
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. just 1 minor suggestion.
good job chasing down all the resource closing gaps.

private final Iterator<T> inner;
private boolean isClosed = false;

public WrappedIterator(Iterator<T> inner) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hwo about CloseValidationIterator

Copy link
Contributor Author

@the-other-tim-brown the-other-tim-brown Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@@ -628,6 +628,8 @@ public boolean hasNext() {
try {
// NOTE: This is required for idempotency
if (eof) {
// eagerly close
close();
Copy link
Contributor

@danny0405 danny0405 Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have these eagerly close, do we still need to invoke the Iterator.close outside? Looks the logic is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helps proactively free up resources if all elements are read, especially in places that try with resources was missed or if the close is in a later callback. If the iterator is closed without going through all of the elements, it will hit the close method to ensure that the input stream is still closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, understand, but eager close is not often used in code, I suggest we remove it and unify the code to AutoClosable and try-resource for resource releasing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issue in doing both of these things. What is the main concern for this added level of safety?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterator is designed to be no side effect actually, we breaks it already for introducing ClosableIterator, and not we breaks the rule silently during the iteration which I think is not a very good convention to follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L PR with lines of changes in (300, 1000]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants