How to generate a full thread stack trace in the log on crash

Using Zephyr (cf2149caf2) / nRF Connect SDK on an nRF5340 DK board.

I need to know how to get a direct stack trace (with file/line numbers if possible for a debug build).

Currently I only get this type of log output:

[00:51:03.271,972] <err> os: ***** BUS FAULT *****
[00:51:03.277,435] <err> os: Precise data bus error
[00:51:03.283,172] <err> os: BFAR Address: 0x0
[00:51:03.288,482] <err> os: r0/a1: 0x00000000 r1/a2: 0x00033050 r2/a3: 0x00000064
[00:51:03.297,210] <err> os: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x000280cb
[00:51:03.305,877] <err> os: xpsr: 0x61000000
[00:51:03.311,126] <err> os: Faulting instruction address (r15/pc): 0x00030804
[00:51:03.319,030] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:51:03.326,965] <err> os: Current thread: 0x20003990 (unknown)

Finding the function related to the instruction address using build/zephyr/zephyr.map is simple enough, but generally not useful (e.g. here it's in strcpy()...).

Is there a way to get the fault handler to dump the full stack for the active thread? This would be very useful in release code to generate a useful error report (where one can't just attach the debugger!)

(I can get a full core dump, but this is too large for easy recovery in a log, and doesn't give a stack unwind trace directly either)

I see other questions about this on the forum, but haven't seen an answer that provides the stack trace... 

  • This has been discussed a lot in this forum, and a few config options were given, for example here and here.

    Can you try something like the following?

    # Debugging configuration
    CONFIG_THREAD_NAME=y
    CONFIG_THREAD_ANALYZER=y
    CONFIG_THREAD_ANALYZER_AUTO=y
    CONFIG_THREAD_ANALYZER_RUN_UNLOCKED=y
    CONFIG_THREAD_ANALYZER_USE_PRINTK=y
    
    # Add asserts
    CONFIG_ASSERT=y
    CONFIG_ASSERT_VERBOSE=y
    CONFIG_ASSERT_NO_COND_INFO=n
    CONFIG_ASSERT_NO_MSG_INFO=n
    CONFIG_RESET_ON_FATAL_ERROR=n
    CONFIG_THREAD_NAME=y
    CONFIG_STACK_SENTINEL=y

  • It's been discussed a lot, but I don't see any answer that shows how to move from the basic log output to one with (at least) a full stack trace for the thread that caused the error.

    I already used the thread analyzer setup (I was just missing the thread name option).

    With the options you suggest, except for CONFIG_RESET_ON_FATAL_ERROR=n, a crash now gives

    [00:14:33.928,344] <inf> base: DV : request to render current page systest2.1
    * buffer overflow detected *
    [00:14:33.944,885] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000001
    [00:14:33.953,582] <err> os: r3/a4: 0x2000fa29 r12/ip: 0x0000000a r14/lr: 0x000208eb
    [00:14:33.962,280] <err> os: xpsr: 0x61000000
    [00:14:33.967,529] <err> os: Faulting instruction address (r15/pc): 0x000143a6
    [00:14:33.975,463] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    [00:14:33.983,123] <err> os: Current thread: 0x20003a68 (sysworkq)

    i.e. compared to my previous log, only the name of the current thread is shown in addition. Still no stack trace/unwind as desired... anything more I can try?

    BTW, I have not yet used:

    CONFIG_RESET_ON_FATAL_ERROR=n 

    as this causes the function sys_reboot() not to be found by the linker (which I use to... reboot...). This seems odd - any ideas why?

  • Easiest is to open a command prompt in your project folder and do

    path_to_gnuarmemb/bin/arm-none-eabi-addr2line -e build/zephyr/zephyr.elf -a 0x000143a6

    You would then get the exact context of your fault.

    BrianW said:

    CONFIG_RESET_ON_FATAL_ERROR=n 

    as this causes the function sys_reboot() to be not found by the linker (which I use to... reboot...). This seems odd - any ideas why?

    That seems odd, yes.
    The main use of this config is to decide whether or not to pull in the fatal_error.c file, as seen in nrf\lib\fatal_error\CMakeLists.txt. I did not think it had any other dependencies or side effects when set to "n".

  • That seems odd, yes.
    The main use of this config is to decide whether or not to pull in the fatal_error.c file, as seen in nrf\lib\fatal_error\CMakeLists.txt. I did not think it had any other dependencies or side effects when set to "n".

    If you have

    CONFIG_RESET_ON_FATAL_ERROR=n

    then you also need to explicitly define

    CONFIG_REBOOT=y

    The fatal error then indeed halts the system rather than rebooting it.

    However, this still does NOT produce a full stack trace for the offending thread. Is there a way to do this stack unwind for a log? Otherwise knowing it failed in strcpy or another lib function is not very helpful without knowing how it got there... (which is all the addr2line utility gives you!)

    Is this really impossible to do on the device?? gdb knows how to do it (but of course you have to have a gdb server connected to be able to get that!)
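    (For reference: the kind of unwind being asked about can be approximated on-device by scanning the faulting thread's stack for words that land in the code region, i.e. candidate return addresses. The sketch below demonstrates the heuristic in plain host-side C, not Zephyr code; on target, the code-region bounds would come from linker symbols for the text segment (e.g. _stext/_etext, names vary by port) and the scan range from the thread's stack object. It can report false positives, which is exactly why gdb-quality unwinding needs frame pointers or DWARF info.)

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-ins for the code region: on Cortex-M this would be the
     * flash text segment from the linker script. Here we use two host
     * function addresses purely for illustration. */
    static void fn_a(void) {}
    static void fn_b(void) {}

    /* Scan a stack region for words inside [lo, hi): these are
     * *candidate* return addresses, possibly with false positives. */
    static int scan_stack(const uintptr_t *sp, size_t words,
                          uintptr_t lo, uintptr_t hi,
                          uintptr_t *out, size_t max_out)
    {
        size_t n = 0;
        for (size_t i = 0; i < words && n < max_out; i++) {
            if (sp[i] >= lo && sp[i] < hi) {
                out[n++] = sp[i];
            }
        }
        return (int)n;
    }

    int main(void)
    {
        uintptr_t a = (uintptr_t)fn_a, b = (uintptr_t)fn_b;
        uintptr_t lo = a < b ? a : b;
        uintptr_t hi = (a > b ? a : b) + 64; /* crude upper bound */

        /* Fake stack: two "return addresses" mixed with data words. */
        uintptr_t stack[] = { 0xdeadbeef, a, 0x64, b, 0 };
        uintptr_t hits[8];
        int n = scan_stack(stack, sizeof(stack) / sizeof(stack[0]),
                           lo, hi, hits, 8);

        printf("%d candidate return addresses\n", n);
        for (int i = 0; i < n; i++) {
            printf("  #%d: 0x%08lx\n", i, (unsigned long)hits[i]);
        }
        return 0;
    }
    ```

    On target, each candidate address could then be fed through addr2line on the host to reconstruct an approximate call chain from the log.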
  • Normally the hard fault handler unwinds the last stack frame for you and gives you details of the instruction causing the fault. How did you compile your project? Did you compile it with debug symbols?

    I always use Visual Studio Code at my end these days, so when prototyping I always choose "Optimize for debugging (-Og)" for the whole project. You can do this in your CMake files as well.

    Also make sure you have CONFIG_DEBUG=y and CONFIG_DEBUG_INFO=y in your prj.conf
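    For reference, a minimal prj.conf fragment for this (CONFIG_DEBUG_OPTIMIZATIONS is the Kconfig way to select -Og in Zephyr; verify the exact option names against your SDK version):

    ```
    # Debug build: keep symbols and use -Og
    CONFIG_DEBUG=y
    CONFIG_DEBUG_INFO=y
    CONFIG_DEBUG_OPTIMIZATIONS=y
    ```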

  • I compile with CONFIG_DEBUG=y, CONFIG_DEBUG_INFO=y, and I see the -Og in the gcc options.

    Normally the hardfault handler unwinds the last stack frame for you and gives you detail of the instruction causing fault.

    Do you mean that you see a full call stack? Or just the instruction pointer for the function with the fault (which is not helpful if it's strlen or sprintf!)?

    I'm looking for something like:

    fault @ strlen, called by sprintf, called by my_code_fn, called by my_other_fn, called by task_my_thread.

    Even if the function names are not printed but only the addresses....

    BTW, addr2line gave me nothing:

    C:\work\dev\if-device-nrf53>arm-none-eabi-addr2line -e cc1-med/build/zephyr/zephyr.elf -a 0x3fda3
    0x0003fda3
    ??:?

    But a text search in cc1-med/build/zephyr/zephyr.map does find the function for the address... just not very scriptable...
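    (A scriptable alternative to grepping the map file, sketched under the assumption that you first dump a sorted symbol list with something like arm-none-eabi-nm -n zephyr.elf: find the last symbol at or below the fault address. The addresses and symbol names in the embedded dump below are made up for illustration.)

    ```c
    #include <stdio.h>
    #include <string.h>

    /* A few lines in `nm -n` format (address, symbol type, name),
     * addresses sorted ascending. Made-up values for illustration. */
    static const char *nm_dump =
        "00014200 T sprintf\n"
        "000143a0 T strlen\n"
        "000208e0 T my_code_fn\n";

    /* Find the last symbol whose address is <= addr and format it
     * as "name+0xoffset". Returns 1 if a symbol was found. */
    static int lookup(const char *dump, unsigned int addr,
                      char *name, size_t len)
    {
        char best[64] = "";
        unsigned int best_addr = 0;
        int found = 0;

        for (const char *p = dump; p && *p; ) {
            unsigned int a;
            char type;
            char sym[64];
            if (sscanf(p, "%x %c %63s", &a, &type, sym) == 3 &&
                a <= addr && (!found || a >= best_addr)) {
                best_addr = a;
                strncpy(best, sym, sizeof(best) - 1);
                found = 1;
            }
            p = strchr(p, '\n');
            if (p) {
                p++;
            }
        }
        if (found) {
            snprintf(name, len, "%s+0x%x", best, addr - best_addr);
        }
        return found;
    }

    int main(void)
    {
        char name[80];
        if (lookup(nm_dump, 0x000143a6, name, sizeof(name))) {
            printf("0x000143a6 = %s\n", name); /* strlen+0x6 */
        }
        return 0;
    }
    ```

    In a real script you would pipe the actual nm output in instead of embedding it, and run the lookup for each address printed by the fault handler.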

  • BrianW said:
    fault @ strlen, called by sprintf, called by my_code_fn, called by my_other_fn, called by task_my_thread.

    No, the fault handler does not give the full call trace like gdb does. For getting a function call trace in context, I start the debugger and set a breakpoint in the hard fault handler. If everything goes right and the stack memory is not corrupted, some IDE debuggers will show the call trace, or you can set up a GDB session if that is your preference.

    BrianW said:
    C:\work\dev\if-device-nrf53>arm-none-eabi-addr2line -e cc1-med/build/zephyr/zephyr.elf -a 0x3fda3
    0x0003fda3
    ??:?

    Hmm, try adding the "-f" option to get function names (and "-C" to demangle C++ names, if relevant).

  • No, the fault handler does not give the full call trace like gdb does. For getting a function call trace in context, I start the debugger and set a breakpoint in the hard fault handler. If everything goes right and the stack memory is not corrupted, some IDE debuggers will show the call trace, or you can set up a GDB session if that is your preference.

    Ok.

    Does this mean there is definitely no way to get a call trace in the firmware itself?
