Description
Describe the bug
I'm working on an application for the NXP S32K344 and am currently using the MR-CANHUBK3 board for development. After a period of time, the MCU appears to halt/hang. The time it takes to fail is consistent across reboots for a given program. Initially, I thought the issue was with Logging/UART output as it seems to occur faster when that's enabled, but after more testing, it also happens with it disabled.
I've tried to do some debugging with Trace32. After the issue occurs, I'm not able to pause the processor with the break command. If I use the ITM trace with the PC sampler, it looks like the CPU hangs on an instruction in the main thread. The instruction it hangs on varies across runs. The CPU continues handling the SysTick interrupts and then hangs each time it returns to the main thread. Other interrupts, such as GPIO interrupts, are also handled. After the issue occurs, I'm still able to view the peripheral registers in Trace32, and I've confirmed that the safety peripherals haven't caused a functional reset.
If I disable threading with CONFIG_MULTITHREADING=n or disable the data cache with CONFIG_DCACHE=n, the issue doesn't appear to occur. I've run builds with each of these for over 48 hours without any issue.
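For anyone trying to reproduce the workarounds, this is roughly what the two options look like as a Kconfig fragment. It is a sketch that assumes they are set in the application's prj.conf; each was tested in a separate build, so only one line should be enabled at a time.

```conf
# Workaround A: build without threading support
CONFIG_MULTITHREADING=n

# Workaround B (separate build): keep threading, disable the data cache
# CONFIG_DCACHE=n
```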
To Reproduce
I've created a repo here that consistently reproduces the issue after about a minute. I've also tested with the samples/basic/blinky sample program, and the issue occurs after ~8.5 hours.
Expected behavior
MCU shouldn't halt/hang.
Impact
We're able to work around it by modifying existing drivers to not use threads, but it's a pretty big annoyance.
Logs and console output
[00:01:17.368] hello world 2822
[00:01:17.384] hello world 2823
[00:01:17.416] hello world 2824
[00:01:17.432] hello world 2825
[00:01:17.448] hello world 2826
*stops outputting*
Environment
- OS: Linux
- Toolchain: Zephyr SDK 0.16.6 (also tested 0.16.9)
- Zephyr version: v3.7.0 (also tested v4.0.0, v4.1.0)
- Code: https://github.com/cmoser-crl/simple-zephyr-example
Additional context
All of the testing was done with the FS26 SBC in Debug mode.
I've tested the same code on a Teensy (MIMXRT1062, also Cortex-M7) without seeing this issue.
- SysTick is still handled after the main thread hangs.
- The SysTick handler hangs when trying to return to the main thread.