How to generate a full thread stack trace in the log on crash

Using Zephyr (cf2149caf2) / nRF Connect SDK on an nRF5340 DK board.

I need to know how to get a direct stack trace (with file/line numbers if possible for a debug build).

Currently I only get this type of log output:

[00:51:03.271,972] <err> os: ***** BUS FAULT *****
[00:51:03.277,435] <err> os: Precise data bus error
[00:51:03.283,172] <err> os: BFAR Address: 0x0
[00:51:03.288,482] <err> os: r0/a1: 0x00000000 r1/a2: 0x00033050 r2/a3: 0x00000064
[00:51:03.297,210] <err> os: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x000280cb
[00:51:03.305,877] <err> os: xpsr: 0x61000000
[00:51:03.311,126] <err> os: Faulting instruction address (r15/pc): 0x00030804
[00:51:03.319,030] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:51:03.326,965] <err> os: Current thread: 0x20003990 (unknown)

Finding the function that contains the faulting instruction address using build/zephyr/zephyr.map is simple enough, but generally not useful (e.g. here it's in strcpy()...).
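One thing the basic dump does provide is lr, which often still holds the return address into the caller when the fault happens in a leaf function such as strcpy(). Resolving both pc and lr can therefore give one extra frame. A minimal sketch in Python for pulling those two values out of the log, assuming the log lines keep the format shown above (the helper names are made up for illustration):

```python
import re

# Matches register fields such as "r14/lr: 0x000280cb" or "(r15/pc): 0x00030804".
REG_RE = re.compile(r"r\d+/(\w+)\)?:\s*(0x[0-9a-fA-F]+)")

# Sample lines copied from the fault log in the question.
SAMPLE = """\
[00:51:03.288,482] <err> os: r0/a1: 0x00000000 r1/a2: 0x00033050 r2/a3: 0x00000064
[00:51:03.297,210] <err> os: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x000280cb
[00:51:03.311,126] <err> os: Faulting instruction address (r15/pc): 0x00030804
"""


def fault_registers(log_text):
    """Return {register alias: value} for every register field in the dump."""
    regs = {}
    for alias, value in REG_RE.findall(log_text):
        regs[alias] = int(value, 16)
    return regs


def code_addresses(regs):
    """Return (pc, lr) with the Thumb state bit (bit 0) cleared from lr."""
    return regs["pc"], regs["lr"] & ~1


if __name__ == "__main__":
    pc, lr = code_addresses(fault_registers(SAMPLE))
    print(f"pc={pc:#010x} lr={lr:#010x}")
```

This is only a heuristic: if the faulting function has already saved and reused lr, the value in the dump does not point at the caller.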

Is there a way to get the fault handler to dump the full stack for the active thread? This would be very useful in release code to generate a useful error report (where one can't just attach the debugger!)

(I can get a full core dump, but this is too large for easy recovery in a log, and doesn't give a stack unwind trace directly either)

I see other questions about this on the forum, but haven't seen an answer that provides the stack trace... 

  • This has been discussed a lot in this forum, and a few config options were suggested in earlier threads.

    Can you try something like the following?

    # Debugging configuration
    CONFIG_THREAD_NAME=y
    CONFIG_THREAD_ANALYZER=y
    CONFIG_THREAD_ANALYZER_AUTO=y
    CONFIG_THREAD_ANALYZER_RUN_UNLOCKED=y
    CONFIG_THREAD_ANALYZER_USE_PRINTK=y
    
    # Add asserts
    CONFIG_ASSERT=y
    CONFIG_ASSERT_VERBOSE=y
    CONFIG_ASSERT_NO_COND_INFO=n
    CONFIG_ASSERT_NO_MSG_INFO=n
    CONFIG_RESET_ON_FATAL_ERROR=n
    CONFIG_STACK_SENTINEL=y

  • It's been discussed a lot, but I don't see any answer that shows how to move from the basic log output to one with (at least) a full stack trace for the thread that caused the error.

    I already used the thread analyzer setup (I was just missing the thread name option).

    With the options you suggest, except for CONFIG_RESET_ON_FATAL_ERROR=n, a crash now gives

    [00:14:33.928,344] <inf> base: DV : request to render current page systest2.1
    * buffer overflow detected *
    [00:14:33.944,885] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000001
    [00:14:33.953,582] <err> os: r3/a4: 0x2000fa29 r12/ip: 0x0000000a r14/lr: 0x000208eb
    [00:14:33.962,280] <err> os: xpsr: 0x61000000
    [00:14:33.967,529] <err> os: Faulting instruction address (r15/pc): 0x000143a6
    [00:14:33.975,463] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    [00:14:33.983,123] <err> os: Current thread: 0x20003a68 (sysworkq)

    i.e. compared to my previous log, the only addition is the name of the current thread. Still no stack trace/unwind as desired... is there anything else I can try?

    BTW, I have not yet used:

    CONFIG_RESET_ON_FATAL_ERROR=n

    because with it the linker can no longer find the function sys_reboot() (which I use to... reboot...). This seems odd. Any ideas why?

  • Normally the hardfault handler unwinds the last stack frame for you and gives you details of the instruction causing the fault. How did you compile your project? Did you compile it with debug symbols?

    I always use Visual Studio Code at my end these days, so when prototyping I always choose "Optimize for debugging (-Og)" for the whole project. You can do this in your CMake files as well.

    Also make sure you have CONFIG_DEBUG and CONFIG_DEBUG_INFO added to your prj.conf.

  • I compile with CONFIG_DEBUG=y, CONFIG_DEBUG_INFO=y, and I see the -Og in the gcc options.

    Normally the hardfault handler unwinds the last stack frame for you and gives you detail of the instruction causing fault.

    Do you mean that you see a full call stack, or just the instruction pointer for the function with the fault (which is not helpful if it's strlen or sprintf!)?

    I'm looking for something like:

    fault @ strlen, called by sprintf, called by my_code_fn, called by my_other_fn, called by task_my_thread.

    Even if only the addresses are printed rather than the function names...

    BTW, addr2line gave me nothing:

    C:\work\dev\if-device-nrf53>arm-none-eabi-addr2line -e cc1-med/build/zephyr/zephyr.elf -a 0x3fda3
    0x0003fda3
    ??:?

    But a search in cc1-med/build/zephyr/zephyr.map does find the function for the address... it's just not very scriptable...
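The map-file search can be scripted with a nearest-symbol lookup. A minimal sketch in Python, assuming the symbol list comes from something like `arm-none-eabi-nm -n build/zephyr/zephyr.elf` (one `<hex address> <type> <name>` entry per line) rather than from zephyr.map itself; the sample addresses and the helper names are hypothetical:

```python
import bisect

# Hypothetical sample of `nm -n` style output; these addresses are made up.
SAMPLE_NM = """\
         U undefined_sym
0002809c T my_code_fn
000307f0 T strcpy
00030900 T strlen
20003990 B some_data
"""


def parse_nm(nm_text):
    """Parse "<hex addr> <type> <name>" lines into sorted (addr, name) pairs."""
    symbols = []
    for line in nm_text.splitlines():
        parts = line.split()
        # Keep text and weak symbols only; undefined symbols have no address column.
        if len(parts) == 3 and parts[1] in ("T", "t", "W", "w"):
            symbols.append((int(parts[0], 16) & ~1, parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) of the closest symbol at or below addr, else None."""
    addr &= ~1  # clear the Thumb state bit so lr values resolve like pc values
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


if __name__ == "__main__":
    table = parse_nm(SAMPLE_NM)
    for value in (0x30804, 0x280CB):
        print(hex(value), resolve(table, value))
```

The lookup does not check symbol sizes, so an address past the end of the last function still resolves to that function; a large offset is a hint that the match is wrong.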

  • BrianW said:
    fault @ strlen, called by sprintf, called by my_code_fn, called by my_other_fn, called by task_my_thread.

    No, the fault handler does not give the full call trace like gdb does. To get a function call trace with context, I start the debugger and set a breakpoint in the hardfault handler. If everything goes right and the stack memory is not corrupted, some IDE debuggers show the call trace; alternatively, set up a GDB session if that is your preference.

    BrianW said:
    C:\work\dev\if-device-nrf53>arm-none-eabi-addr2line -e cc1-med/build/zephyr/zephyr.elf -a 0x3fda3
    0x0003fda3
    ??:?

    Hmm, try adding the "-f" option to get function names.
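For scripted use, the same lookup can be wrapped in a small helper. A rough sketch in Python, assuming arm-none-eabi-addr2line is on the PATH; the function names are made up for illustration, and clearing bit 0 is an assumption that the address may be an lr value carrying the Thumb state bit, not a confirmed fix for the `??:?` output:

```python
import subprocess


def addr2line_cmd(elf_path, addresses, tool="arm-none-eabi-addr2line"):
    """Build the addr2line command line for a list of code addresses."""
    # -e: ELF file, -f: function names, -a: echo each address,
    # -p: one line per address, -i: also report inlined callers.
    cmd = [tool, "-e", elf_path, "-f", "-a", "-p", "-i"]
    # Clear bit 0 (Thumb state bit) so lr values name the real instruction address.
    cmd += [hex(addr & ~1) for addr in addresses]
    return cmd


def run_addr2line(elf_path, addresses):
    """Run addr2line and return its text output (raises if the tool fails)."""
    result = subprocess.run(
        addr2line_cmd(elf_path, addresses),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_addr2line() where the toolchain is installed.
    print(" ".join(addr2line_cmd("cc1-med/build/zephyr/zephyr.elf", [0x3FDA3])))
```

If the address lies in a prebuilt library without debug info, the file and line may still come back as `??:?` even when the function name resolves.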

  • No, the fault handler does not give the full call trace like gdb does. for getting a context function call trace, I start the debugger and set the breakpoint in the hardfault handler. If everything goes right and if the stack memory is not corrupted, then some of the IDE's debuggers show the call trace or setup GDB session if that is your preference. 

    Ok. 

    Does this mean there is definitely no way to get a call trace in the firmware itself?

  • Seems that way. There is the config option CONFIG_EXCEPTION_STACK_TRACE, but when I tested it with the blinky sample it did not give any additional function call trace. I see that this option has additional handling in the RISC-V and x86 fault handling code, so the corresponding handling is probably missing for ARM Cortex cores.

    Additional prj.conf options I included to test this:

    CONFIG_DEBUG_INFO=y
    CONFIG_EXCEPTION_STACK_TRACE=y
    CONFIG_ASSERT_ON_ERRORS=y
    CONFIG_ASSERT=y
    CONFIG_LOG=y
    CONFIG_RESET_ON_FATAL_ERROR=n
