[HUDI-9330] Avoid storing the clean plan in inflight clean instant #13192
1. avoid storing the clean plan in inflight clean instant Signed-off-by: TheR1sing3un <[email protected]>
1. delete empty clean instant Signed-off-by: TheR1sing3un <[email protected]>
1. fix test Signed-off-by: TheR1sing3un <[email protected]>
1. fix test Signed-off-by: TheR1sing3un <[email protected]>
@@ -257,7 +257,8 @@ public HoodieCleanMetadata execute() {
     }

     for (HoodieInstant hoodieInstant : pendingCleanInstants) {
-      if (table.getCleanTimeline().isEmpty(hoodieInstant)) {
+      if (table.getCleanTimeline().isEmpty(CleanerUtils.getCleanRequestInstant(table.getMetaClient(), hoodieInstant))) {
         // remove the empty instant
Do we also need to delete the empty inflight file?
The original logic here already deletes the inflight file. I think what you mean to ask is whether we also need to delete the requested file, right?
I originally attempted that change, but found that it would break the existing unit-test logic; see my previous commit 16230a4. To reduce the unknown risks of a broader change, this PR only alters how the plan is obtained and does not change any other logic.
Also, we can no longer perform the isEmpty check on the inflight file, because this file no longer stores the clean plan. Its size is always 0, so the isEmpty method would always return true.
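The point about the emptiness check can be illustrated with a small sketch. It is not Hudi code: the class, the helper names, and the map of file name to size are hypothetical stand-ins, and the `<ts>.clean.requested` / `<ts>.clean.inflight` naming is a simplifying assumption. It only shows why checking the inflight file always reports "empty" once the plan lives solely in the requested file.

```java
import java.util.Map;

// Hypothetical sketch, not Hudi's API: the timeline is modelled as file name -> size in bytes.
class EmptyCleanCheckSketch {

    // Assumed naming: map "<ts>.clean.inflight" to "<ts>.clean.requested"; other names pass through.
    static String requestedNameFor(String fileName) {
        String suffix = ".clean.inflight";
        if (fileName.endsWith(suffix)) {
            return fileName.substring(0, fileName.length() - suffix.length()) + ".clean.requested";
        }
        return fileName;
    }

    // Old-style check: looks at whichever file the caller passed in.
    static boolean isEmptyOnInflight(Map<String, Long> sizes, String fileName) {
        return sizes.getOrDefault(fileName, 0L) == 0L;
    }

    // New-style check: always resolves to the requested file first.
    // A missing requested file is treated as empty in this sketch.
    static boolean isEmptyOnRequested(Map<String, Long> sizes, String fileName) {
        return sizes.getOrDefault(requestedNameFor(fileName), 0L) == 0L;
    }

    public static void main(String[] args) {
        // The plan (512 bytes here) is stored only in the requested file.
        Map<String, Long> sizes = Map.of(
            "001.clean.requested", 512L,
            "001.clean.inflight", 0L);
        System.out.println(isEmptyOnInflight(sizes, "001.clean.inflight"));  // prints true
        System.out.println(isEmptyOnRequested(sizes, "001.clean.inflight")); // prints false
    }
}
```

With the inflight file always at 0 bytes, only the second check distinguishes a real plan from an empty one.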
@@ -168,10 +168,18 @@ public static Option<HoodieInstant> getEarliestCommitToRetain(
   public static HoodieCleanerPlan getCleanerPlan(HoodieTableMetaClient metaClient, HoodieInstant cleanInstant)
       throws IOException {
     CleanPlanMigrator cleanPlanMigrator = new CleanPlanMigrator(metaClient);
+    cleanInstant = getCleanRequestInstant(metaClient, cleanInstant);
Can we have the invoker pass the cleanInstant as REQUESTED?
The current invokers do not care whether the incoming instant is requested or inflight. To stay compatible with how the invokers use this method today, I chose to perform the instant conversion inside the method. Moreover, from the design of the method itself, it is reasonable that, given a clean instant in any state, it should be able to return the correct plan.
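The design choice described here, converting inside the method so that callers may pass an instant in any state, can be sketched as follows. Everything in it is a hypothetical stand-in (the `Instant` and `State` types, `toRequested`, `getPlan`, and the in-memory map); it is not Hudi's actual `getCleanerPlan` or `getCleanRequestInstant`, only an illustration of the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Hudi's API: plans are stored only under the requested instant.
class CleanPlanLookupSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Simplified instant: a timestamp plus a state.
    static final class Instant {
        final String timestamp;
        final State state;

        Instant(String timestamp, State state) {
            this.timestamp = timestamp;
            this.state = state;
        }
    }

    // Plan contents keyed by the timestamp of the requested instant.
    private final Map<String, String> plansByRequestedTimestamp = new HashMap<>();

    void writeRequested(String timestamp, String plan) {
        plansByRequestedTimestamp.put(timestamp, plan);
    }

    // Normalize an instant in any state to its requested counterpart.
    static Instant toRequested(Instant instant) {
        return instant.state == State.REQUESTED
            ? instant
            : new Instant(instant.timestamp, State.REQUESTED);
    }

    // Callers may pass any state; the conversion happens here, not at the call site.
    Optional<String> getPlan(Instant instant) {
        Instant requested = toRequested(instant);
        return Optional.ofNullable(plansByRequestedTimestamp.get(requested.timestamp));
    }
}
```

Under these assumptions, a caller holding an inflight instant and a caller holding the requested instant with the same timestamp both receive the same plan, which is why existing call sites would not need to change.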
So it is possible for an empty inflight instant to be removed twice.
If there is a concurrent cleaner, that is indeed possible, but this behavior is unrelated to the current code change.
Hudi does not need to store the clean plan in the inflight clean instant as well; doing so wastes storage in our metadata directory.
The clean plan can always be fetched from the requested clean instant.
Change Logs
Impact
Reduces storage waste in the timeline.
Risk level (write none, low medium or high below)
low
Documentation Update
none
Contributor's checklist