Skip to content

[MINOR] Fix flaky test testCreateNewInstantTimes #13032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 26, 2025

Conversation

yihua
Copy link
Contributor

@yihua yihua commented Mar 26, 2025

Change Logs

As above.

Impact

Fixes test flakiness.

Risk level

none

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XS PR with lines of changes in <= 10 label Mar 26, 2025
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 sec is not sufficient is it? :(

@yihua
Copy link
Contributor Author

yihua commented Mar 26, 2025

10 sec is not sufficient is it? :(

Seems like it. I've added logs also in case it fails again so we know the exact instant times from the test.

@apache apache deleted a comment from hudi-bot Mar 26, 2025
Copy link
Member

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test is passing. Curious about why generating time is taking long.

@@ -147,7 +147,9 @@ public void testCreateNewInstantTimes() throws IOException {
String newInstantTime = metaClient.createNewInstantTime(false);
assertTrue(!instantTimesSoFar.contains(newInstantTime));
instantTimesSoFar.add(newInstantTime);
assertTrue((Long.parseLong(newInstantTime) - Long.parseLong(newCommitTimeInUTC)) < 10000L);
assertTrue((Long.parseLong(newInstantTime) - Long.parseLong(newCommitTimeInUTC)) < 60000L,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is InProcess lock taking so long?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added corresponding comment below #13032 (comment).

@yihua
Copy link
Contributor Author

yihua commented Mar 26, 2025

I rebased the branch on top of the latest master with CI improvements.

@yihua yihua force-pushed the MINOR-fix-flaky branch from 77bdaf1 to dcaba03 Compare March 26, 2025 21:47
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua
Copy link
Contributor Author

yihua commented Mar 26, 2025

Flink IT failure is irrelevant.

@yihua yihua merged commit 4f63cc0 into apache:master Mar 26, 2025
59 of 60 checks passed
@geserdugarov
Copy link
Contributor

geserdugarov commented Mar 27, 2025

@yihua , @codope , @nsivabalan , @danny0405 , I've also spent some time on this flaky test before I found this PR. I propose to clarify the situation with this UT by merge of #13042

I've added clarification in the comment to this test in mentioned PR:

// Note, that `metaClient` is initialized here with default `HoodieTimeGeneratorConfig#timeGeneratorConfig`,
// which means that `HoodieTimeGeneratorConfig#MAX_EXPECTED_CLOCK_SKEW_MS` with default value will be used (probably, it's 200 ms).
// It leads to corresponding thread sleep in `SkewAdjustingTimeGenerator::generateTime` for each call
// after all preliminary initialization costs in the following loop, so we don't want to iterate a lot here.

and reduced number of iterations and max permitted diff.

The failure could be easily reproduced locally by changing to 10 s max and i < 100. After I've found the reason, it was clear that it would be enough to pass HoodieTimeGeneratorConfig#MAX_EXPECTED_CLOCK_SKEW_MS or explicit set of in-process lock provider to HoodieTableMetaClient::build to support large number of iterations. But I couldn't configure properly this case from TestHoodieTableMetaClient, because of current test classes inheritance. And also we don't need to retry so much in a unit test.

I've checked locally, that if I change default value of HoodieTimeGeneratorConfig#MAX_EXPECTED_CLOCK_SKEW_MS to 1 ms, then I could set i < 100 without any failures.

@geserdugarov
Copy link
Contributor

geserdugarov commented Mar 27, 2025

Interesting, that correct direction for research was revealed by profiling:

thread sleep

Not all runs were caught in this jfr, but it's obvious that we just sleep periodically.

voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 8, 2025
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 9, 2025
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 15, 2025
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-1.0.2 size:XS PR with lines of changes in <= 10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants