Memory corruption while running RADIO_IRQHandler

Hello,

I'm facing an issue where the entire RAM gets filled with the same 12 bytes, repeated over and over.

The issue can occur anywhere from a few minutes to several days after startup. Some of the functions called just before the crash are:
RADIO_IRQHandler
received_frame_notify_and_nesting_allow
nrf_802154_rx_buffers
nrf_802154_received_raw
last_rx_frame_timestamp_get
nrf_802154_timer_coord_timestamp_get
__aeabi_ldivmod

I'm using an nRF52840 with the nRF5 SDK for Thread and Zigbee v4.2.0.

To debug this, we wrote the following macro, which wraps each interrupt handler in a function that holds a local variable containing the handler's name:

#define ISR_HANDLER_DEF(A,B) \
  extern void B(void); \
  void A(void) \
  { \
    char intPos[] = #A; \
    B(); \
  }

ISR_HANDLER_DEF(extIRQ_Handler02, RADIO_IRQHandler);

//from ses_startup_nrf52840:
  .thumb_func
  .weak   extIRQ_Handler02
extIRQ_Handler02:
  b     .
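A minimal host-side sketch of the same wrapper, with two assumptions that are not part of the original project: the marker array is declared `volatile` so that an optimizing build should keep it on the stack, and `RADIO_IRQHandler` is replaced by a counting stub so the wrapper can be exercised off-target. Note that `#A` stringifies the first macro argument, so with the definition above the marker holds "extIRQ_Handler02".

```c
/* Stub standing in for the real radio driver handler so the wrapper can be
 * exercised off-target (assumption, not SDK code). */
static int radio_irq_calls = 0;
void RADIO_IRQHandler(void)
{
    radio_irq_calls++;
}

/* Defines A(), which keeps a marker string naming A in its stack frame and
 * then forwards to B(). `volatile` is an addition to the original macro so
 * the otherwise unused array is not optimized away. */
#define ISR_HANDLER_DEF(A, B)          \
    extern void B(void);               \
    void A(void)                       \
    {                                  \
        volatile char intPos[] = #A;   \
        (void)intPos;                  \
        B();                           \
    }

ISR_HANDLER_DEF(extIRQ_Handler02, RADIO_IRQHandler)
```

Calling `extIRQ_Handler02()` runs the stub once; on target the vector table entry would point at the wrapper instead of the weak default loop.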

Here is a file containing the last bytes of the RAM section (the region used by the main stack), which contains the string "RADIO_IRQHandler":
2287.corruptedMemNordic_until_RAM_segment_end.bin

The following file contains the first bytes of the RAM section, where the interrupt vector table is located (it has been overwritten here):
4807.corruptedMemNordic_from_RAM_segment_start.bin

This file contains the last bytes of the RAM section from a different test run:
1157.corruptedMemNordic_until_RAM_segment_end_test2.bin

VECTORS_IN_RAM is defined in our project.

__RAM_segment_start__ = 0x20000000
__RAM_segment_end__ = 0x20040000
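As a quick check of these boundaries: 0x20040000 - 0x20000000 = 0x40000 bytes, i.e. 256 KiB, which matches the nRF52840's RAM size. A small helper like the one below (the names are hypothetical) can tell whether an address from the .map file or the dumps falls inside this segment; for example, `nrf_802154_rx_buffers` at 0x2001E55C does, while `RADIO_IRQHandler` at 0x4665B is a flash (code) address.

```c
#include <stdbool.h>
#include <stdint.h>

/* RAM segment boundaries as reported for this project. */
#define RAM_SEGMENT_START 0x20000000u
#define RAM_SEGMENT_END   0x20040000u /* exclusive upper bound */

/* Size of the RAM segment in bytes: 0x40000 = 262144 = 256 KiB. */
static const uint32_t ram_segment_size = RAM_SEGMENT_END - RAM_SEGMENT_START;

/* Returns true if addr lies in [RAM_SEGMENT_START, RAM_SEGMENT_END). */
static bool in_ram_segment(uint32_t addr)
{
    return addr >= RAM_SEGMENT_START && addr < RAM_SEGMENT_END;
}
```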

Here are some addresses from the disassembly and the .map file (the binary dumps above refer to these addresses):
0x8D1AF extIRQ_Handler02
0x2001DB8C nrf_radio_driver.a(nrf_802154_timer_coord.nosd.o)
0x4665B RADIO_IRQHandler
0x2001DB20 nrf_radio_driver.a(nrf_802154_core.nosd.o)
0x45279 received_frame_notify_and_nesting_allow
0x2001E55C nrf_802154_rx_buffers
0x4665B nrf_802154_received_raw
0x44C31 last_rx_frame_timestamp_get
0x47401 nrf_802154_timer_coord_timestamp_get
0x090871 __aeabi_ldivmod
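The code addresses in this list are odd, most likely because on Cortex-M (which executes only Thumb code) bit 0 of a function address marks Thumb state; the first instruction of the function sits at the address with bit 0 cleared. This matters when comparing the list with instruction addresses seen while stepping in the disassembly, which are always even. A one-line helper (hypothetical name) makes the conversion explicit:

```c
#include <stdint.h>

/* Clears the Thumb bit (bit 0) of a Cortex-M function address, giving the
 * address of the function's first instruction in the disassembly. */
static uint32_t thumb_code_address(uint32_t func_addr)
{
    return func_addr & ~(uint32_t)1u;
}
```

For example, `__aeabi_ldivmod` listed at 0x090871 starts at 0x90870 in the disassembly.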

If you need the .map file, I can share it once the ticket is made private.

Do you have any suggestions on how to identify the issue?

Best regards,

Laura

Update:

As a result of further investigation, we found that the issue can be reproduced as follows:

  1. We halt execution in __aeabi_ldivmod. The call stack shows the functions listed above, including RADIO_IRQHandler.
  2. We single-step until we enter __int64_udiv.
  3. At instruction 0x90770, we set registers R2 and R3 to 0 to force a division by zero. As a result, the Z flag in the APSR is set.
  4. The PC jumps to 0x9083A.
  5. __aeabi_ldivmod and __int64_udiv are then called recursively, again and again, until the stack has overwritten the whole RAM.
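The Cortex-M4 has no hardware instruction for 64-bit division, so the compiler typically emits a call to the run-time helper `__aeabi_ldivmod` for a signed 64-bit `/` or `%`; what that helper does with a zero divisor depends on the run-time library, and in the trace above it ends in unbounded recursion. One way to find which caller supplies the zero divisor is to temporarily route the suspect division through a checked helper such as the hypothetical one below and put a breakpoint on its failure path. This is a sketch assuming the divisor is a plain `int64_t`; it is not code from the nRF 802.15.4 radio driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: divides num by den only when the result is defined.
 * Returns false (leaving *quot untouched) for a zero divisor and for
 * INT64_MIN / -1, whose quotient does not fit in int64_t. */
static bool checked_div_s64(int64_t num, int64_t den, int64_t *quot)
{
    if (den == 0) {
        return false; /* would otherwise reach __aeabi_ldivmod with den == 0 */
    }
    if (num == INT64_MIN && den == -1) {
        return false; /* quotient not representable */
    }
    *quot = num / den; /* C division truncates toward zero */
    return true;
}
```

A breakpoint on the zero-divisor `return false` would stop at the first offending call with the call stack still intact, before any RAM is overwritten.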

It seems that a division by zero is causing the issue.

  
