
Gazell Flywheel TIMER2_IRQHandler HardFault

Problem:

We can trigger HardFaults reproducibly. By manually unwinding the stack, we traced the fault to the TIMER2_IRQHandler in Gazell. The handler reads a callback pointer that is sometimes NULL and branches to it without a NULL check, so the HardFault occurs when PC becomes 0x00000000.
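
For anyone repeating the manual unwind, here is a minimal sketch of decoding the frame that a Cortex-M core pushes on exception entry (R0-R3, R12, LR, PC, xPSR, in that order). The type and function names are hypothetical and not part of the SDK; the frame pointer is assumed to be the stack pointer that was active when the fault was taken.

```c
#include <stdint.h>
#include <stdbool.h>

/* Registers stacked by the core on exception entry, lowest address first. */
typedef struct
{
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* return address of the interrupted code */
    uint32_t pc;    /* address at which the fault was taken */
    uint32_t xpsr;
} fault_frame_t;

/* Hypothetical helper: copy the eight stacked words into a named struct. */
static fault_frame_t decode_fault_frame(const uint32_t *sp)
{
    fault_frame_t f;
    f.r0   = sp[0];
    f.r1   = sp[1];
    f.r2   = sp[2];
    f.r3   = sp[3];
    f.r12  = sp[4];
    f.lr   = sp[5];
    f.pc   = sp[6];
    f.xpsr = sp[7];
    return f;
}

/* Heuristic only: a stacked PC of 0 suggests a branch through a NULL
 * function pointer; the stacked LR then points just past the branch. */
static bool looks_like_null_branch(const fault_frame_t *f)
{
    return f->pc == 0u;
}
```

In a case like ours the stacked PC is 0x00000000, and the stacked LR (with bit 0 cleared) is what leads back into the TIMER2_IRQHandler.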

The TIMER2_IRQHandler reads the callback from memory address 0x20001ed0. According to the *.map file, this is:

.bss 0x20001ed0 0x8 ./../../sdk_10/components/properitary_rf/gzll/gcc/gzll_gcc.a(nrf_flywheel.o)

Watching address 0x20001ed0 in a debugger, we can see that its content frequently changes between 0x0000b421 and 0x00000000, which normally does not seem to cause any errors. According to the *.map file, 0x0000b421 (0x0000b420 with the Thumb bit cleared) corresponds to:

0000b420 nrf_sm_execute
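
The odd value is expected on a Cortex-M core: a function pointer carries bit 0 set to mark Thumb state, while the map file lists the symbol with bit 0 cleared. A minimal sketch of that relationship, using helper names that are hypothetical and not part of the SDK:

```c
#include <stdint.h>
#include <stdbool.h>

/* Map-file address of the function a raw pointer value refers to:
 * the pointer value with the Thumb bit (bit 0) cleared. */
static uint32_t map_address_of(uint32_t fn_ptr_value)
{
    return fn_ptr_value & ~UINT32_C(1);
}

/* A valid Cortex-M function pointer always has bit 0 set. */
static bool has_thumb_bit(uint32_t fn_ptr_value)
{
    return (fn_ptr_value & UINT32_C(1)) != 0u;
}
```

With the values from the debugger, map_address_of(0x0000b421) gives 0x0000b420, which matches the nrf_sm_execute entry. The value 0x00000000 has no Thumb bit set, so it cannot be a valid branch target.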

However, in our application the following sequence triggers the HardFault with 100% reproducibility:

  1. Pair device and host.
  2. Start device data transfer at 150 packets per second (~25 bytes per packet).
  3. Cut power to the host.
  4. Wait for some time; the duration correlates directly with the value passed to nrf_gzll_set_sync_lifetime(). While waiting, the device keeps attempting to transmit data.
  5. The device finally HardFaults at the location described above.

We have observed that when sync_lifetime expires, the function sm_stop() is called, which in turn calls nrf_flw_stop(). We suspect that nrf_flw_stop() clears the flywheel callback pointer at 0x20001ed0.

However, some time later the TIMER2_IRQHandler is invoked again, for a reason we have not identified, and this time the flywheel callback pointer is NULL.

We have also observed that when the flywheel callback pointer is NULL, the TIMER2_IRQHandler does execute NRF_ASSERT_INTERNAL_parse_and_forward(), but it then continues and finally branches to the NULL pointer.
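
To illustrate the guard that seems to be missing, here is a minimal sketch of a NULL-checked dispatch. All names are hypothetical stand-ins; this is not the Gazell implementation, which is linked in from the gzll_gcc.a archive and which we cannot inspect or patch in source form. The variable m_flywheel_cb plays the role of the word at 0x20001ed0.

```c
#include <stddef.h>

typedef void (*flywheel_cb_t)(void);

/* Hypothetical stand-in for the callback pointer at 0x20001ed0.
 * volatile, because a stop routine may clear it between interrupts. */
static volatile flywheel_cb_t m_flywheel_cb = NULL;

/* Example callback that only counts its invocations. */
static unsigned int m_demo_calls = 0;
static void demo_callback(void)
{
    m_demo_calls++;
}

/* Register a callback, or pass NULL to clear it (as a stop routine might). */
static void flywheel_set_callback(flywheel_cb_t cb)
{
    m_flywheel_cb = cb;
}

/* Body of a timer interrupt handler in this sketch.
 * Returns 1 if the callback was invoked, 0 if it was skipped. */
static int flywheel_dispatch(void)
{
    /* Read the pointer once, so the check and the call use the same value. */
    flywheel_cb_t cb = m_flywheel_cb;

    if (cb == NULL)
    {
        /* Late or stale interrupt after stop: return instead of branching. */
        return 0;
    }

    cb();
    return 1;
}
```

In this sketch, a dispatch that arrives after flywheel_set_callback(NULL) returns 0 and leaves PC alone, instead of reporting the assert and then branching anyway.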

Please explain how we can avoid the problem described above.

Regards, Pablo
