[HUDI-9330] Avoid storing the clean plan in inflight clean instant #13192

Open
wants to merge 4 commits into
base: master

TheR1sing3un
Member

Hudi does not need to store the clean plan in the inflight clean instant as well; doing so wastes storage in the metadata directory.
The clean plan can always be fetched from the requested clean instant.

Change Logs

  1. Avoid storing the clean plan in the inflight clean instant.

Impact

Reduces the storage wasted by the timeline.

Risk level (write none, low medium or high below)

low

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

1. avoid storing the clean plan in inflight clean instant

Signed-off-by: TheR1sing3un <[email protected]>
@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Apr 21, 2025
1. delete empty clean instant

Signed-off-by: TheR1sing3un <[email protected]>
1. fix test

Signed-off-by: TheR1sing3un <[email protected]>
1. fix test

Signed-off-by: TheR1sing3un <[email protected]>

@@ -257,7 +257,8 @@ public HoodieCleanMetadata execute() {
}

for (HoodieInstant hoodieInstant : pendingCleanInstants) {
- if (table.getCleanTimeline().isEmpty(hoodieInstant)) {
+ if (table.getCleanTimeline().isEmpty(CleanerUtils.getCleanRequestInstant(table.getMetaClient(), hoodieInstant))) {
// remove the empty instant
Contributor

Do we also need to delete the empty inflight file?

Member Author

Do we also need to delete the empty inflight file?

The original logic here already deletes the inflight file. I think what you mean to ask is whether we also need to delete the requested file, right?
I originally attempted that change, but found that it breaks the existing unit test logic; see my previous commit 16230a4. To limit unknown risk, this PR only changes how the plan is obtained and leaves all other logic untouched.

Member Author

Do we also need to delete the empty inflight file?

Also, we can no longer run the isEmpty check on the inflight file, because that file no longer stores the clean plan: its size is always 0, so isEmpty always returns true.
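
The point about isEmpty can be illustrated with a minimal sketch. It models the timeline as a map from file name to content; the class, its method names, and the `<timestamp>.clean.requested` / `<timestamp>.clean.inflight` naming used here are assumptions made for illustration, not Hudi's actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class EmptyCleanCheckSketch {
    // Toy timeline: file name -> content. Hypothetical, not Hudi's timeline classes.
    private final Map<String, byte[]> files = new HashMap<>();

    public void put(String fileName, byte[] content) {
        files.put(fileName, content);
    }

    // A file counts as empty when it is missing or has zero bytes.
    public boolean isEmpty(String fileName) {
        byte[] content = files.get(fileName);
        return content == null || content.length == 0;
    }

    // Assumed naming scheme for the two pending states of a clean instant.
    public static String requestedFile(String timestamp) {
        return timestamp + ".clean.requested";
    }

    public static String inflightFile(String timestamp) {
        return timestamp + ".clean.inflight";
    }

    // Once the inflight file is always zero bytes, emptiness of a pending
    // clean has to be decided from the requested file instead.
    public boolean isPendingCleanEmpty(String timestamp) {
        return isEmpty(requestedFile(timestamp));
    }
}
```

With a non-empty requested file and a zero-byte inflight file for the same timestamp, `isEmpty` on the inflight file reports true while `isPendingCleanEmpty` reports false, which is why the check has to move to the requested instant.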

@@ -168,10 +168,18 @@ public static Option<HoodieInstant> getEarliestCommitToRetain(
public static HoodieCleanerPlan getCleanerPlan(HoodieTableMetaClient metaClient, HoodieInstant cleanInstant)
throws IOException {
CleanPlanMigrator cleanPlanMigrator = new CleanPlanMigrator(metaClient);
+ cleanInstant = getCleanRequestInstant(metaClient, cleanInstant);
Contributor

Can the invoker pass the cleanInstant in as REQUESTED?

Member Author

Can the invoker pass the cleanInstant in as REQUESTED?

The current invokers do not care whether the incoming instant is requested or inflight. To stay compatible with how they use this method, I chose to do the instant conversion inside it. Moreover, from the design of the method itself, it is reasonable that it returns the correct plan for a clean instant in either state.
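
As a rough sketch of that design choice, the callee can normalize whatever state it receives before locating the plan. The `Instant` class, the `State` enum, and the file-naming scheme below are hypothetical stand-ins chosen for illustration; they are not Hudi's `HoodieInstant` or `CleanerUtils`.

```java
public class InstantNormalizationSketch {
    public enum State { REQUESTED, INFLIGHT }

    // Hypothetical stand-in for a timeline instant.
    public static final class Instant {
        public final State state;
        public final String timestamp;

        public Instant(State state, String timestamp) {
            this.state = state;
            this.timestamp = timestamp;
        }
    }

    // Map an instant in either pending state to the REQUESTED instant
    // that carries the same timestamp.
    public static Instant toRequested(Instant instant) {
        return instant.state == State.REQUESTED
            ? instant
            : new Instant(State.REQUESTED, instant.timestamp);
    }

    // Callee-side normalization: callers may pass either state and still
    // end up at the file that holds the plan (assumed naming scheme).
    public static String planFileFor(Instant instant) {
        return toRequested(instant).timestamp + ".clean.requested";
    }
}
```

Doing the conversion in one place means no call site has to be audited or changed, at the cost of the method silently accepting a state it does not read from.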

Contributor

So it is possible for an empty inflight instant to be removed twice.

Member Author

So it is possible for an empty inflight instant to be removed twice.

With a concurrent cleaner that is indeed possible, but this behavior is unrelated to the current code change.
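
For context, a double removal is harmless when the delete itself is idempotent. The sketch below shows the general idea on a local file system with the JDK's `Files.deleteIfExists`; it is an illustration under that assumption and says nothing about how Hudi's storage layer actually deletes instants. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IdempotentDeleteSketch {
    // Returns true if this caller removed the file and false if it was
    // already gone, for example because a concurrent cleaner deleted it first.
    public static boolean removeIfPresent(Path file) throws IOException {
        return Files.deleteIfExists(file);
    }
}
```

Calling `removeIfPresent` twice on the same path returns true the first time and false the second, without throwing.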

@TheR1sing3un TheR1sing3un requested a review from danny0405 April 23, 2025 02:27