NCS and Zephyr fatal errors

Hello,

During development I occasionally get a Zephyr fatal error after adding or changing code. So far I have been able to fix all of these errors, but it always took me a while to figure out the problem.

What is the best way to handle Zephyr fatal errors?

Is there a systematic approach to figuring out where the error occurs?

Any tips in general for debugging in NCS?

Thanks in advance!

  • Hello David,

    My apologies for the late reply! We have been out of office for a couple of days due to public holidays and the Norwegian national day.

    Unfortunately, fatal errors aren’t easy to handle in general. A good starting point is to set the following option in the prj.conf of your application, so that the system halts after a fatal error instead of resetting:

    CONFIG_RESET_ON_FATAL_ERROR=n

    Another tip is to enable debug logging for a specific module, using the options:

    CONFIG_LOG=y

    and e.g.

    CONFIG_I2C_LOG_LEVEL_DBG=y

    (see the list of all possible configuration options here).

    Last but not least, you can read the RESETREAS register of your chip to check the reason for a reset.

    You can do that via the debugger using the following command (with the example address for the nRF52810):

    nrfjprog --memrd 0x40000400
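
    The value read back is a bit field. As a minimal sketch, assuming the RESETREAS bit positions documented for the nRF52 series (RESETPIN = bit 0, DOG = bit 1, SREQ = bit 2, LOCKUP = bit 3, OFF = bit 16, DIF = bit 18; please verify them against the product specification of your chip), the following host-side C helper turns the raw value into readable flag names. The helper name resetreas_decode() and the flag table are illustrative and not part of any Nordic library:

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* One entry per reset source; bit positions are an assumption to verify
     * against the product specification of the chip in use. */
    struct reset_flag {
        uint32_t mask;
        const char *name;
    };

    static const struct reset_flag reset_flags[] = {
        { 1u << 0,  "RESETPIN" }, /* reset pin */
        { 1u << 1,  "DOG" },      /* watchdog */
        { 1u << 2,  "SREQ" },     /* soft reset request */
        { 1u << 3,  "LOCKUP" },   /* CPU lockup */
        { 1u << 16, "OFF" },      /* wake-up from System OFF via GPIO */
        { 1u << 18, "DIF" },      /* wake-up from System OFF via debug interface */
    };

    /* Writes the names of all set flags into buf, separated by spaces.
     * Returns the number of known flags that were set. */
    static int resetreas_decode(uint32_t value, char *buf, size_t len)
    {
        int count = 0;

        if (len == 0) {
            return 0;
        }
        buf[0] = '\0';

        for (size_t i = 0; i < sizeof(reset_flags) / sizeof(reset_flags[0]); i++) {
            if (value & reset_flags[i].mask) {
                if (count > 0) {
                    strncat(buf, " ", len - strlen(buf) - 1);
                }
                strncat(buf, reset_flags[i].name, len - strlen(buf) - 1);
                count++;
            }
        }
        return count;
    }
    ```

    For example, a raw value of 0x00000004 decodes to SREQ, i.e. a soft reset. Keep in mind that the flags accumulate until they are cleared by writing 1 to them, and that a value of 0 points to a power-on or brownout reset.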

    I hope this will help you!

    Regards,

    Markus

  • Hello Markus,

    Thank you for the reply and your advice!

    My latest fatal error looked like this:

    <err> os: r0/a1:  0x00000004  r1/a2:  0x0000009a  r2/a3:  0x00000000
    <err> os: r3/a4:  0x2000e8de r12/ip:  0x00000000 r14/lr:  0x0001f877
    <err> os:  xpsr:  0x61000000
    <err> os: Faulting instruction address (r15/pc): 0x0003bc18
    <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
    <err> os: Current thread: 0x20002510 (UARTReceiveThreadID)
    <err> os: Halting system
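
    The number in "ZEPHYR FATAL ERROR 4" is the kernel's fatal reason code. A small sketch of the mapping, assuming the values of Zephyr's enum k_fatal_error_reason (0 to 4 are architecture independent, values from 16 upwards are architecture specific); the enum and function names below are illustrative stand-ins, not the Zephyr identifiers:

    ```c
    /* Stand-in for Zephyr's enum k_fatal_error_reason; values are an
     * assumption to check against the Zephyr version in use. */
    enum fatal_reason {
        FATAL_CPU_EXCEPTION  = 0,
        FATAL_SPURIOUS_IRQ   = 1,
        FATAL_STACK_CHK_FAIL = 2,
        FATAL_KERNEL_OOPS    = 3,
        FATAL_KERNEL_PANIC   = 4,
        FATAL_ARCH_START     = 16
    };

    /* Returns a short description for a fatal reason code. */
    static const char *fatal_reason_name(unsigned int reason)
    {
        switch (reason) {
        case FATAL_CPU_EXCEPTION:
            return "CPU exception";
        case FATAL_SPURIOUS_IRQ:
            return "Unhandled interrupt";
        case FATAL_STACK_CHK_FAIL:
            return "Stack overflow";
        case FATAL_KERNEL_OOPS:
            return "Kernel oops";
        case FATAL_KERNEL_PANIC:
            return "Kernel panic";
        default:
            return reason >= FATAL_ARCH_START ? "Arch-specific error"
                                              : "Unknown error";
        }
    }
    ```

    Reason 4 (kernel panic) is what a failed kernel assertion typically ends in, which matches the assert.c location found below.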

    What helped me in this case was to enter the faulting instruction address (0x0003bc18) in the disassembly window. The address pointed to line 45 of <path-to-ncs>/zephyr/lib/os/assert.c. I put a breakpoint on that line, and the parameters of the function assert_post_action() pointed me to line 154 of <path-to-ncs>/zephyr/lib/os/heap.c. That line revealed the actual problem: probably a double free of heap memory.
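
    One common way to guard against this class of bug is to clear a pointer right after freeing it. This is only a sketch under the assumption that the double free came from application code releasing the same pointer twice, which the trace above does not confirm. It uses the standard free() so that it runs on a host; in a Zephyr application the equivalent would be k_free(), which to my knowledge also ignores NULL. The macro name FREE_AND_CLEAR is made up for this example:

    ```c
    #include <stdlib.h>

    /* Frees the memory behind p and sets p to NULL. Because free(NULL) is a
     * no-op, calling the macro again on the same variable is harmless. */
    #define FREE_AND_CLEAR(p) \
        do {                  \
            free(p);          \
            (p) = NULL;       \
        } while (0)
    ```

    Note that this only protects the one pointer variable that is cleared. If a copy of the pointer is kept elsewhere (for example in a queue or another thread), freeing through that copy still triggers the heap assertion.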

    I'm not really sure why this information is not in the Zephyr fatal error message, because I am pretty sure that I saw this heap double-free error in my log before.

    Best regards,

    David
