[HUDI-9330] Avoid storing the clean plan in inflight clean instant #13192

Merged: 5 commits merged into apache:master on May 24, 2025

Conversation

TheR1sing3un
Member

Hudi does not need to store the clean plan in the inflight clean instant as well as in the requested one; the second copy only wastes storage in the metadata directory.
The clean plan can always be fetched from the requested clean instant.

Change Logs

  1. Avoid storing the clean plan in the inflight clean instant.

Impact

Reduces the storage wasted by the timeline.

Risk level (write none, low medium or high below)

low

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

1. avoid storing the clean plan in inflight clean instant

Signed-off-by: TheR1sing3un <[email protected]>
@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Apr 21, 2025
1. delete empty clean instant

Signed-off-by: TheR1sing3un <[email protected]>
1. fix test

Signed-off-by: TheR1sing3un <[email protected]>
1. fix test

Signed-off-by: TheR1sing3un <[email protected]>
@@ -257,7 +257,8 @@ public HoodieCleanMetadata execute() {
     }

     for (HoodieInstant hoodieInstant : pendingCleanInstants) {
-      if (table.getCleanTimeline().isEmpty(hoodieInstant)) {
+      if (table.getCleanTimeline().isEmpty(CleanerUtils.getCleanRequestInstant(table.getMetaClient(), hoodieInstant))) {
         // remove the empty instant
Contributor

we also need to delete the empty inflight file?

Member Author

> we also need to delete the empty inflight file?

The original logic here already deletes the inflight file. I think what you mean to ask is whether we also need to delete the requested file, right?
I originally attempted that change, but found that it would break the existing unit-test logic; see my previous commit 16230a4. To limit the unknown risk, this PR only changes how the plan is obtained and leaves all other logic untouched.

Member Author

> we also need to delete the empty inflight file?

Also, we can no longer run the isEmpty check on the inflight file, because that file no longer stores the clean plan. Its size is always 0, so isEmpty would always return true.
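
To make the point above concrete, here is a minimal sketch using an in-memory map as a stand-in for the timeline directory. The class, the helper names, and the file-name suffixes are assumptions made for illustration only; none of this is Hudi's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the timeline directory; not Hudi's API.
class EmptyCleanCheckSketch {

    // Illustrative file-name helpers (the naming is an assumption of this sketch).
    static String requestedFile(String instantTime) {
        return instantTime + ".clean.requested";
    }

    static String inflightFile(String instantTime) {
        return instantTime + ".clean.inflight";
    }

    // A file counts as empty when it is missing or has zero bytes.
    static boolean isEmpty(Map<String, byte[]> files, String name) {
        byte[] content = files.get(name);
        return content == null || content.length == 0;
    }

    // Once the inflight file stops carrying the plan, emptiness can only
    // be judged on the requested file of the same instant time.
    static boolean isEmptyCleanInstant(Map<String, byte[]> files, String instantTime) {
        return isEmpty(files, requestedFile(instantTime));
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new HashMap<>();
        files.put(requestedFile("001"), "serialized-plan".getBytes(StandardCharsets.UTF_8));
        files.put(inflightFile("001"), new byte[0]); // marker only, no plan

        // The inflight marker is always zero bytes, so this check is uninformative.
        System.out.println(isEmpty(files, inflightFile("001")));
        // The requested file tells us whether a real plan exists.
        System.out.println(isEmptyCleanInstant(files, "001"));
    }
}
```

In this model, checking the inflight file would flag every pending clean as empty, which is why the emptiness check in the diff is redirected to the requested instant.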

@@ -168,10 +168,18 @@ public static Option<HoodieInstant> getEarliestCommitToRetain(
   public static HoodieCleanerPlan getCleanerPlan(HoodieTableMetaClient metaClient, HoodieInstant cleanInstant)
       throws IOException {
     CleanPlanMigrator cleanPlanMigrator = new CleanPlanMigrator(metaClient);
+    cleanInstant = getCleanRequestInstant(metaClient, cleanInstant);
Contributor

Can we have the invoker pass the cleanInstant as a REQUESTED instant?

Member Author

> Can we have the invoker pass the cleanInstant as a REQUESTED instant?

The current invokers do not care whether the incoming instant is requested or inflight. To stay compatible with how they use this method today, I chose to perform the instant conversion inside the method. Moreover, by design, this method should return the correct plan for a clean instant in any state.
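
A minimal sketch of that design choice, assuming a simplified model: the `State` enum, the `Instant` class, and the plan map below are hypothetical stand-ins, not Hudi types. The lookup converts whatever state it receives to the requested state before reading, so callers do not have to pre-convert.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified model; not Hudi's HoodieInstant or timeline classes.
class CleanPlanLookupSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    static final class Instant {
        final State state;
        final String timestamp;

        Instant(State state, String timestamp) {
            this.state = state;
            this.timestamp = timestamp;
        }
    }

    // Returns the REQUESTED view of the same instant time.
    static Instant toRequested(Instant instant) {
        return instant.state == State.REQUESTED
            ? instant
            : new Instant(State.REQUESTED, instant.timestamp);
    }

    // requestedPlans stands in for the requested files, keyed by instant time.
    static String getCleanerPlan(Map<String, String> requestedPlans, Instant instant) {
        Instant requested = toRequested(instant);
        String plan = requestedPlans.get(requested.timestamp);
        if (plan == null) {
            throw new IllegalStateException("No clean plan for instant " + requested.timestamp);
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> requestedPlans = new HashMap<>();
        requestedPlans.put("001", "plan-for-001");

        // Callers may pass the instant in any state and get the same plan back.
        for (State state : State.values()) {
            System.out.println(state + " -> " + getCleanerPlan(requestedPlans, new Instant(state, "001")));
        }
    }
}
```

The trade-off discussed in this thread is visible here: normalizing inside the lookup keeps every caller working unchanged, at the cost of a redundant conversion for callers that already pass a requested instant.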

Contributor

So there is a possibility that an empty inflight instant gets removed twice.

Member Author

> So there is a possibility that an empty inflight instant gets removed twice.

With a concurrent cleaner that is indeed possible, but the behavior is unrelated to the changes in this PR.

Contributor

Can we check all the callers of it? I see several places that already do some pre-transformation to make sure the instant is REQUESTED before calling it.

Contributor

> Can we check all the callers of it? I see several places that already do some pre-transformation to make sure the instant is REQUESTED before calling it.

Should we remove the now-unnecessary requested-instant creation that happens before the method is invoked?

@TheR1sing3un TheR1sing3un requested a review from danny0405 April 23, 2025 02:27
     HoodieCleanerPlan cleanerPlan = metaClient.getActiveTimeline().readCleanerPlan(cleanInstant);
     return cleanPlanMigrator.upgradeToLatest(cleanerPlan, cleanerPlan.getVersion());
   }
+
+  public static HoodieInstant getCleanRequestInstant(HoodieTableMetaClient metaClient, HoodieInstant cleanInstant) {
+    if (cleanInstant.isInflight() || cleanInstant.isCompleted()) {
Contributor

we can check if it is not REQUESTED instead.

Member Author

> we can check if it is not REQUESTED instead.

Done~
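
For readers following along, the suggested simplification can be sketched as below. The three-value `State` enum is a hypothetical stand-in, and the equivalence shown holds only under that assumption; if the real state enum has additional values, the two forms differ for those values, and the "not REQUESTED" form is the one that converts them too.

```java
// Hypothetical three-state model; the real enum may define more states.
class RequestedCheckSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Form shown in the diff under review: enumerate the states that need conversion.
    static boolean needsConversionEnumerated(State state) {
        return state == State.INFLIGHT || state == State.COMPLETED;
    }

    // Form suggested by the reviewer: anything that is not REQUESTED needs conversion.
    static boolean needsConversionNegated(State state) {
        return state != State.REQUESTED;
    }

    // Checks that both forms agree on every state of this sketch's enum.
    static boolean formsAgree() {
        for (State state : State.values()) {
            if (needsConversionEnumerated(state) != needsConversionNegated(state)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(formsAgree());
    }
}
```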

1. minor refactor

Signed-off-by: TheR1sing3un <[email protected]>
@TheR1sing3un TheR1sing3un requested a review from danny0405 May 22, 2025 12:26
@TheR1sing3un
Member Author

@danny0405 Hi Danny, is there anything else in this PR that needs to be reviewed?

@TheR1sing3un
Member Author

@hudi-bot run azure

@danny0405 danny0405 merged commit 7af080a into apache:master May 24, 2025
58 checks passed
Labels
size:S PR with lines of changes in (10, 100]
3 participants