-
Notifications
You must be signed in to change notification settings - Fork 2.4k
[HUDI-9206] Support reading inflight instants with HoodieLogRecordReader #13010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a test for this new arg w/ the FGR
@lokeshj1703 , the context is not clear. Can you explain more on "since RLI reads the log records and finds deleted keys."? |
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
Outdated
Show resolved
Hide resolved
...park/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala
Outdated
Show resolved
Hide resolved
commitToTable(dataGen.generateInserts("001", 100), INSERT.value, writeConfigs) | ||
validateOutputFromFileGroupReader(getStorageConf, getBasePath, dataGen.getPartitionPaths, true, 0, recordMergeMode) | ||
|
||
commitToTable(dataGen.generateUniqueUpdates("003", 100), UPSERT.value, writeConfigs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you prepare the table instead of transactions? Also, could this validation be added to existing tests, i.e., adding a new step in existing tests to remove completed detlacommit from the timeline so that data files become inflight and should not be read? So that we don't need this new test and we can have more coverage with different permutations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a UT now. This test has been removed.
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add more doc on the flag. The inflight data files should never be seen based on reader side from SI trasanction semantics.
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java
Show resolved
Hide resolved
@danny0405 : are you good w/ the patch. If yes, can we go ahead and land the patch. |
I'm fine with the change. |
We are going ahead for now to get this into 1.0.2. Feel free to review this async. author can address it in a follow up patch.
…der (apache#13010) (cherry picked from commit 1f52b4e)
…der (apache#13010) (cherry picked from commit 1f52b4e)
…der (apache#13010) (cherry picked from commit 1f52b4e)
…der (apache#13010) (cherry picked from commit 1f52b4e)
…der (apache#13010) (cherry picked from commit 1f52b4e)
Change Logs
The PR adds capability to read inflight instants with HoodieLogRecordReader by adding
allowInflightInstants
flag. The capability is required for updating RLI since RLI reads the log records and finds deleted keys.While replacing
HoodieMergedLogRecordScanner
with new reader class, we need to use the allowInflightInstants flag.Impact
NA
Risk level (write none, low medium or high below)
low
Documentation Update
NA
Contributor's checklist