Skip to content

kernel: fix handling of very large timeouts #90833

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 11, 2025

Conversation

mjchen0
Copy link
Contributor

@mjchen0 mjchen0 commented May 29, 2025

When a very large timeout, like INT64_MAX passed by task_wdt_init(), is passed to z_add_timeout(), it
could become a very small negative number when
added to the timeout list. This would prevent any
new timeout from being added correctly.

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be really great to have test coverage added here.

@peter-mitsis
Copy link
Collaborator

I don't think that the underlying premise is correct. In the case of a very large positive such as INT64_MAX (7fff ffff ffff ffff), Z_IS_TIMEOUT_RELATIVE() is going to evaluate to false as Z_TICK_ABS(INT64_MAX) = INT64_MAX, meaning an absolute timeout. That path does not use the elapsed() + 1 that the comment indicates.

@mjchen0 mjchen0 force-pushed the kernel_timeout_bugfix branch from bc25382 to 8f8508e Compare May 31, 2025 00:13
@mjchen0
Copy link
Contributor Author

mjchen0 commented May 31, 2025

I spent some more time looking at the issue I was running into (which was the task_wdt not working as expected) and I agree it really boiled down to how INT64_MAX was being mishandled when passed to K_TIMEOUT_ABS_TICKS(). It's really not a valid absolute timeout, even though task_wdt.c was passing it in. If it were a literal, the preprocessor would have generated a warning/error at compile time, but because task_wdt() was passing in a variable, it compiled without any warning/error.

I've changed my initial fix to modify K_TIMEOUT_ABS_TICKS() to limit the maximum value to INT64_MAX-1, since that is the largest valid absolute timeout. It's a silent limit being enforced, but given the range of int64_t, I don't think anyone will see any practical difference.

My CL also has a fix for INT64_MAX relative timeout being falsely reported as not a relative timeout by the Z_IS_RELATIVE_TIMEOUT() macro.

Some tests for these cases were added to the kernel timer_api unit tests.

@mjchen0 mjchen0 force-pushed the kernel_timeout_bugfix branch from 8f8508e to ad9aca8 Compare May 31, 2025 00:27
@mjchen0 mjchen0 force-pushed the kernel_timeout_bugfix branch from ad9aca8 to 2c8f832 Compare June 2, 2025 20:14
Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicks, but no major complaints. The test looks like a valuable addition

@mjchen0 mjchen0 force-pushed the kernel_timeout_bugfix branch 2 times, most recently from 9a87603 to 58599c1 Compare June 3, 2025 22:00
Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly certain the rust test failure is unrelated

When CONFIG_TIMEOUT_64BIT is y, positive values are relative/delta
timeouts and negative values are absolute timeouts, except
for two special values. -1 is K_WAIT_FOREVER and 0 is K_NO_WAIT.
The reserved value of -1 means INT64_MAX is not a valid argument
to K_TIMEOUT_ABS_TICKS(), but there was no check. If a literal
was passed, a preprocessor/compiler warning would be generated
for overflow, but if a variable was passed as the argument,
then the code would compile but not work correctly since the
absolute timeout would be changed to a relative one. One
example of this is task_wdt_init() if no channels are enabled.

Rather than just fixing task_wdt, and trying to find other cases
in an adhoc way, this CL changes K_TIMEOUT_ABS_TICKS() to
limit the larges value to (INT64_MAX-1). It does so silently,
but given the range of int64_t, there should be no practical
difference.

Also, change the implementation for Z_IS_TIMEOUT_RELATIVE() to
fix the case where INT64_MAX relative timeout was being
improperly reported as being not a relative timeout. This was
again due to the -1 reserved value.

Add some tests for these changes to the timer_api test.

Signed-off-by: Mike J. Chen <[email protected]>
@mjchen0 mjchen0 force-pushed the kernel_timeout_bugfix branch from 58599c1 to 95b02fb Compare June 10, 2025 22:36
Copy link

@kartben kartben merged commit 9075d53 into zephyrproject-rtos:main Jun 11, 2025
26 checks passed
@mjchen0 mjchen0 deleted the kernel_timeout_bugfix branch June 11, 2025 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants