Device crash when doing a software reset

I'm developing software with the nRF Connect SDK v2.5.1. It runs Matter on the nRF5340.

I need to reset the MCU from my app. To do so, I call sys_reboot(SYS_REBOOT_COLD), but this makes the OS crash with the following trace:

Is there a way to fix this?

  • Based on the limited info provided, it seems like the main thread might have a stack overflow.

    Try increasing CONFIG_MAIN_STACK_SIZE in your prj.conf and see if the assert goes away.

    You need to understand all the contexts (RTOS threads and interrupts) on your system and get an overview of how much memory each of them uses. While prototyping, it is a good idea to enable CONFIG_THREAD_ANALYZER.
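
    As a rough sketch, the two suggestions above could look like this in prj.conf. The stack size and the reporting interval are placeholder assumptions, not recommended values; tune them to your application.

    ```
    # Larger main thread stack (4096 is an assumed example value)
    CONFIG_MAIN_STACK_SIZE=4096

    # Periodically report per-thread stack usage
    CONFIG_THREAD_ANALYZER=y
    CONFIG_THREAD_ANALYZER_AUTO=y
    # Seconds between reports (assumed example value)
    CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5
    # Show thread names instead of addresses in the report
    CONFIG_THREAD_NAME=y
    ```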

  • I already tried increasing the main stack size and the work queue stack size, but it doesn't solve the issue.

    In case it helps, the issue happens only when the device is commissioned and bonded in a Matter over Thread fabric. If the device is "offline", the reset works fine.

  • Have you enabled the Thread analyzer? Have you checked whether any other threads are close to their stack limit? Can you post the Thread analyzer output from just before the hardfault? If the stacks look good, we can rule out a stack overflow and look at what else caused the hardfault.

  • I have now enabled it with CONFIG_THREAD_ANALYZER and CONFIG_THREAD_ANALYZER_AUTO.

    Here is the log before it dies:

  • The nrf5_rx stack size (CONFIG_IEEE802154_NRF5_RX_STACK_SIZE) and the ot_radio_workq stack size (CONFIG_OPENTHREAD_RADIO_WORKQUEUE_STACK_SIZE) look a bit suspicious as well. Can you increase those too and see if the behavior is the same? If it is, can you give steps to reproduce?
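
    For illustration, a prj.conf fragment that raises both stacks might look like the following. The sizes are assumed example values only; compare them against the usage reported by the Thread analyzer before settling on numbers.

    ```
    # 802.15.4 radio RX thread stack (2048 is an assumed example value)
    CONFIG_IEEE802154_NRF5_RX_STACK_SIZE=2048

    # OpenThread radio work queue stack (2048 is an assumed example value)
    CONFIG_OPENTHREAD_RADIO_WORKQUEUE_STACK_SIZE=2048
    ```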

  • The changes don't improve the result. Here is the log with the thread analysis from a few seconds before the crash.

    To reproduce, I think you can use the Matter light bulb sample on an nRF5340 DK, commission it into a Matter over Thread fabric, then trigger the reset from a command or similar.

  • This could be related to this and this erratum. It is possible that the app core is resetting but the net core is not, so the serial communication between the app core and the net core falls apart. Try implementing the workarounds for those errata in your init code and see if that helps.
