nRF9160 Help needed to find reset / reboot cause

Hi,

I'm having trouble debugging a case where the device resets / reboots every 1-2 days on average, without any error message indicating why. If an error is encountered during startup or in the main loop, sys_reboot(0) is called, but I always log the error reason first. So unless all the log messages are being dropped before the reboot, I don't think this is the cause of the unexpected reboots.

Simplified application startup details:
1. Initializing modem with nrf_modem_lib_init(NORMAL_MODE)
2. Starting watchdog
3. Initializing AT commands and notifications | at_cmd_init() & at_notif_init()
4. Connecting to the network | NB-IoT
5. Starting sensor worker | bme680 with bsec lib
6. Initializing and requesting PSM

Simplified main loop:
1. Read the sensor on a set interval
2. Send the sensor data to the server using DTLS, based on the negotiated PSM parameters. The only thing that can change during runtime is that the network can negotiate new PSM parameters.
3. Watchdog feed

For debugging, I've enabled logging for the different modules, and I'm using the thread analyzer to check for stack overflows, memory leaks, etc. I have also set CONFIG_RESET_ON_FATAL_ERROR=n and removed the NVIC_SystemReset() call from app_error_weak.c, but it still resets.

When getting the reset reason with nrf_power_resetreas_get(NRF_POWER) I get RESETPIN (NRF_POWER_RESETREAS_RESETPIN_MASK)

Do you have any additional debugging options I could enable to hopefully get some more insight into why it might reset?

Would a modem trace provide some details or is it only used for analyzing the communication between the device and the LTE network?

spm.conf

CONFIG_IS_SPM=y
CONFIG_FW_INFO=y
CONFIG_GPIO=y
CONFIG_SERIAL=y
CONFIG_INIT_ARCH_HW_AT_BOOT=y

prj.conf

CONFIG_NEWLIB_LIBC=y
CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y

CONFIG_NRF_MODEM_LIB=y
CONFIG_NRF_MODEM_LIB_SYS_INIT=n

CONFIG_NETWORKING=y
CONFIG_NET_SOCKETS=y
CONFIG_NET_SOCKETS_POSIX_NAMES=y
CONFIG_NET_NATIVE=n

CONFIG_LTE_LINK_CONTROL=y
CONFIG_LTE_NETWORK_MODE_NBIOT=y
CONFIG_LTE_AUTO_INIT_AND_CONNECT=n

CONFIG_MAIN_STACK_SIZE=8192
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048

CONFIG_BOOTLOADER_MCUBOOT=y

CONFIG_LOG=y
CONFIG_LOG_IMMEDIATE=y

CONFIG_SETTINGS=y
CONFIG_I2C=y

CONFIG_FPU=y
CONFIG_FP_HARDABI=y

CONFIG_SNTP=y
CONFIG_REBOOT=y

CONFIG_WATCHDOG=y

CONFIG_THREAD_ANALYZER=y
CONFIG_RESET_ON_FATAL_ERROR=n
CONFIG_DEBUG=n

CONFIG_SERIAL=y

Device and firmware details:
Thingy91 / nRF9160 SiP Revision 1
nRF Connect SDK: 1.5.1
Modem FW: 1.2.7

Br,
Patrik

  • Hello Patrik, 

    When getting the reset reason with nrf_power_resetreas_get(NRF_POWER) I get RESETPIN (NRF_POWER_RESETREAS_RESETPIN_MASK)

    Interestingly enough, the Thingy91 has no reset pin connected by default. Have you done any modifications to the board, maybe? 

    Would a modem trace provide some details or is it only used for analyzing the communication between the device and the LTE network?

    A modem trace helps to analyse what the modem does, but even if the modem were to crash, it should not cause the application core to crash as well, at least not right away.

    Do you have any additional debugging options I could enable to hopefully get some more insight into why it might reset?

    What I probably would do at this point is to enable debug logging for the libraries/modules your application is using, like e.g. 

    Maybe they can provide you with some more information. You probably shouldn’t enable them all at once though, as the output can be huge. 

    Regards,

    Markus

  • Hi,

    Interestingly enough, the Thingy91 has no reset pin connected by default. Have you done any modifications to the board, maybe? 

    No hardware modifications have been made; it's straight out of the box. Anyhow, I now see in the documentation for the RESETREAS register that it should be cleared, which I haven't been doing.
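Since RESETREAS bits stay set until they are explicitly cleared, a RESETPIN flag read at boot may be left over from an earlier reset rather than describing the latest one. Below is a minimal sketch of decoding the register value into names. The bit positions are an assumption taken from the nRF9160 POWER.RESETREAS description and should be checked against the product specification, and `resetreas_describe()` is a hypothetical helper, not an SDK function. The on-target read and clear calls are shown only in a comment so that the decoder itself builds on a host.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: bit positions as listed for nRF9160 POWER.RESETREAS.
 * Verify against the product specification before relying on them. */
struct reas_name {
    uint32_t mask;
    const char *name;
};

static const struct reas_name reas_names[] = {
    { 1u << 0,  "RESETPIN" }, /* pin reset */
    { 1u << 1,  "DOG" },      /* watchdog timeout */
    { 1u << 2,  "OFF" },      /* wake-up from System OFF */
    { 1u << 4,  "DIF" },      /* wake-up via debug interface */
    { 1u << 16, "SREQ" },     /* soft reset, e.g. sys_reboot() */
    { 1u << 17, "LOCKUP" },   /* CPU lockup */
    { 1u << 18, "CTRLAP" },   /* reset via CTRL-AP */
};

/* Hypothetical helper: writes a '|'-separated list of the set flags to buf. */
void resetreas_describe(uint32_t reas, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0) {
        return;
    }
    buf[0] = '\0';

    if (reas == 0) {
        /* No flag set: reset came from the on-chip reset generator. */
        snprintf(buf, len, "POWER_ON_OR_BROWNOUT");
        return;
    }

    for (size_t i = 0; i < sizeof(reas_names) / sizeof(reas_names[0]); i++) {
        if ((reas & reas_names[i].mask) && used < len) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? "|" : "", reas_names[i].name);
            if (n < 0) {
                break;
            }
            used += (size_t)n;
        }
    }
}

/* On target, with the nrfx HAL (<hal/nrf_power.h>), read early at boot
 * and then clear, so the next boot only shows the latest cause:
 *
 *   uint32_t reas = nrf_power_resetreas_get(NRF_POWER);
 *   nrf_power_resetreas_clear(NRF_POWER, reas);
 */
```

With the register cleared on every boot, a value that still shows only RESETPIN after an unexpected reboot would point at the reset line itself, while DOG, SREQ, or LOCKUP would point at the watchdog, a software reboot, or a CPU lockup respectively.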

    What I probably would do at this point is to enable debug logging for the libraries/modules your application is using, like e.g. 

    I'll enable some more logging and get back to you.

    /Patrik

  • Integrating Memfault may be helpful for this type of problem. The library installs a fault handler that analyses the fault and stores the result in a RAM section that is not overwritten by the reboot. After reboot the data can be uploaded to the Memfault backend and analysed.

  • Integrating Memfault may be helpful for this type of problem. The library installs a fault handler that analyses the fault and stores the result in a RAM section that is not overwritten by the reboot. After reboot the data can be uploaded to the Memfault backend and analysed.

    Hi, I'll have a look into Memfault. Unfortunately, it seems it's not available out of the box for nRF Connect SDK 1.5.1; it looks like it was added in 1.6.0. Anyhow, I don't think it would be much work to add just the specific parts I need, like the fault handler.
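The "store the result in RAM that survives the reboot" idea can be sketched without Memfault. The example below is not Memfault's implementation; it is a minimal illustration with hypothetical names (`reboot_crumb`, `crumb_store()`, `crumb_take()`). It assumes that on Zephyr the variable would be placed in a section that is not zeroed at boot (for example with the `__noinit` attribute) and that `crumb_store()` would be called from the application's fatal error handler. Whether the bootloader leaves that RAM untouched is a further assumption to verify.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRUMB_MAGIC 0xC0FFEE42u

/* Hypothetical record written just before a fault-triggered reboot. */
struct reboot_crumb {
    uint32_t magic;    /* marks the record as written by us */
    uint32_t reason;   /* application-defined or kernel fault code */
    uint32_t pc;       /* faulting program counter, if known */
    uint32_t checksum; /* guards against random RAM contents */
};

/* ASSUMPTION: on target this would be declared with __noinit so that
 * startup code does not zero it. Here it is an ordinary static. */
static struct reboot_crumb crumb;

static uint32_t crumb_checksum(const struct reboot_crumb *c)
{
    return c->magic ^ c->reason ^ c->pc ^ 0xA5A5A5A5u;
}

/* Call from the fault handler, before the system resets. */
void crumb_store(uint32_t reason, uint32_t pc)
{
    crumb.magic = CRUMB_MAGIC;
    crumb.reason = reason;
    crumb.pc = pc;
    crumb.checksum = crumb_checksum(&crumb);
}

/* Call early after boot. Returns true and copies the record if a valid
 * one is present, then invalidates it so it is reported only once. */
bool crumb_take(struct reboot_crumb *out)
{
    if (crumb.magic != CRUMB_MAGIC ||
        crumb.checksum != crumb_checksum(&crumb)) {
        return false;
    }
    *out = crumb;
    crumb.magic = 0;
    return true;
}
```

If the device reboots unexpectedly and `crumb_take()` finds no valid record, that would suggest the reset did not pass through the fault handler at all, which would be consistent with a pin reset or a power problem rather than a software fault.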

  • Hi Markus,

    So, I've done some more testing with the extra logging enabled. The first time it ran for about 5 days and the second time for about 1 day before resetting. Unfortunately, nothing related to an error or hinting at the reset was logged before the reset. I'm still getting RESETPIN as the reset reason.

    I have another Thingy91 that I'll run the same test on to see whether I get the same issue, in case it is hardware-related.

    Do you have any other suggestions?

    /Patrik
