How to generate a full thread stack trace in the log on crash

Using Zephyr (cf2149caf2) / nRF Connect SDK on an nRF5340 DK board.

I need to know how to get a direct stack trace (with file/line numbers if possible for a debug build).

Currently I only get this type of log output:

[00:51:03.271,972] <err> os: ***** BUS FAULT *****
[00:51:03.277,435] <err> os: Precise data bus error
[00:51:03.283,172] <err> os: BFAR Address: 0x0
[00:51:03.288,482] <err> os: r0/a1: 0x00000000 r1/a2: 0x00033050 r2/a3: 0x00000064
[00:51:03.297,210] <err> os: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x000280cb
[00:51:03.305,877] <err> os: xpsr: 0x61000000
[00:51:03.311,126] <err> os: Faulting instruction address (r15/pc): 0x00030804
[00:51:03.319,030] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:51:03.326,965] <err> os: Current thread: 0x20003990 (unknown)

Finding the function related to the instruction address using build/zephyr/zephyr.map is simple enough, but generally not useful (e.g. here it's in strcpy()...).

Is there a way to get the fault handler to dump the full stack for the active thread? This would be very useful in release code to generate a useful error report (where one can't just attach the debugger!)

(I can get a full core dump, but this is too large for easy recovery in a log, and doesn't give a stack unwind trace directly either)

I see other questions about this on the forum, but haven't seen an answer that provides the stack trace... 

  • This has been discussed a lot in this forum, and a few config options were given, for example here and here.

    Can you try something like the following?

    # Debugging configuration
    CONFIG_THREAD_NAME=y
    CONFIG_THREAD_ANALYZER=y
    CONFIG_THREAD_ANALYZER_AUTO=y
    CONFIG_THREAD_ANALYZER_RUN_UNLOCKED=y
    CONFIG_THREAD_ANALYZER_USE_PRINTK=y
    
    # Add asserts
    CONFIG_ASSERT=y
    CONFIG_ASSERT_VERBOSE=y
    CONFIG_ASSERT_NO_COND_INFO=n
    CONFIG_ASSERT_NO_MSG_INFO=n
    CONFIG_RESET_ON_FATAL_ERROR=n
    CONFIG_THREAD_NAME=y
    CONFIG_STACK_SENTINEL=y

  • It's been discussed a lot, but I don't see any answer that shows how to move from the basic log output to one with (at least) a full stack trace for the thread that caused the error.

    I already used the thread analyzer setup (I was just missing the thread name option).

    With the options you suggest, except for CONFIG_RESET_ON_FATAL_ERROR=n, a crash now gives

    [00:14:33.928,344] <inf> base: DV : request to render current page systest2.1
    * buffer overflow detected *
    [00:14:33.944,885] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000001
    [00:14:33.953,582] <err> os: r3/a4: 0x2000fa29 r12/ip: 0x0000000a r14/lr: 0x000208eb
    [00:14:33.962,280] <err> os: xpsr: 0x61000000
    [00:14:33.967,529] <err> os: Faulting instruction address (r15/pc): 0x000143a6
    [00:14:33.975,463] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    [00:14:33.983,123] <err> os: Current thread: 0x20003a68 (sysworkq)

    i.e. compared to my previous log, only the name of the current thread is shown in addition. Still no stack trace/unwind as desired... anything more I can try?

    BTW, I have not yet used:

    CONFIG_RESET_ON_FATAL_ERROR=n 

    as this causes the function sys_reboot() not to be found by the linker (which I use to... reboot...). This seems odd - any ideas why?

  • Easiest is to open a command prompt in your project folder and do

    path_to_gnuarmemb/bin/arm-none-eabi-addr2line -e build/zephyr/zephyr.elf -a 0x000143a6

    You would then get the exact context of your fault.

    BrianW said:

    CONFIG_RESET_ON_FATAL_ERROR=n 

    as this causes the function sys_reboot() to be not found by the linker (which I use to... reboot...). This seems odd - any ideas why?

    That seems odd, yes.
    The main use of this config is to decide whether or not to pull in the fatal_error.c file, as seen in nrf\lib\fatal_error\CMakeLists.txt. I did not think it had any other dependencies or side effects when set to "n".

  • That seems odd, yes.
    The main use of this config is to decide whether or not to pull in the fatal_error.c file, as seen in nrf\lib\fatal_error\CMakeLists.txt. I did not think it had any other dependencies or side effects when set to "n".

    If you have

    CONFIG_RESET_ON_FATAL_ERROR=n

    then you also need to explicitly define

    CONFIG_REBOOT=y

    The fatal error then indeed halts the system rather than rebooting it.

    However, this still does NOT produce a full stack trace for the offending thread. Is there a way to do this stack unwind for a log? Otherwise knowing it failed in strcpy or another lib function is not very helpful without knowing how it got there... (which is all the addr2line utility gives you!)

    Is this really impossible to do on the device?? gdb knows how to do it (but of course you have to have a gdb server connected to be able to get that!)
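    (For reference: the kind of unwind being asked about can be approximated on-device by scanning the faulting thread's stack for words that land in the code region, i.e. candidate return addresses. The sketch below demonstrates the heuristic in plain host-side C, not Zephyr code; on target, the code-region bounds would come from linker symbols for the text segment (e.g. _stext/_etext, names vary by port) and the scan range from the thread's stack object. It can report false positives, which is exactly why gdb-quality unwinding needs frame pointers or DWARF info.)

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-ins for the code region: on Cortex-M this would be the
     * flash text segment from the linker script. Here we use two host
     * function addresses purely for illustration. */
    static void fn_a(void) {}
    static void fn_b(void) {}

    /* Scan a stack region for words inside [lo, hi): these are
     * *candidate* return addresses, possibly with false positives. */
    static int scan_stack(const uintptr_t *sp, size_t words,
                          uintptr_t lo, uintptr_t hi,
                          uintptr_t *out, size_t max_out)
    {
        size_t n = 0;
        for (size_t i = 0; i < words && n < max_out; i++) {
            if (sp[i] >= lo && sp[i] < hi) {
                out[n++] = sp[i];
            }
        }
        return (int)n;
    }

    int main(void)
    {
        uintptr_t a = (uintptr_t)fn_a, b = (uintptr_t)fn_b;
        uintptr_t lo = a < b ? a : b;
        uintptr_t hi = (a > b ? a : b) + 64; /* crude upper bound */

        /* Fake stack: two "return addresses" mixed with data words. */
        uintptr_t stack[] = { 0xdeadbeef, a, 0x64, b, 0 };
        uintptr_t hits[8];
        int n = scan_stack(stack, sizeof(stack) / sizeof(stack[0]),
                           lo, hi, hits, 8);

        printf("%d candidate return addresses\n", n);
        for (int i = 0; i < n; i++) {
            printf("  #%d: 0x%08lx\n", i, (unsigned long)hits[i]);
        }
        return 0;
    }
    ```

    On target, each candidate address could then be fed through addr2line on the host to reconstruct an approximate call chain from the log.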
  • Normally the hard fault handler unwinds the last stack frame for you and gives you details of the instruction causing the fault. How did you compile your project? Did you compile it with debug symbols?

    I always use Visual Studio Code at my end these days, so when prototyping I always choose "Optimize for debugging (-Og)" for the whole project. You can do this in your CMake files as well.

    Also make sure you have CONFIG_DEBUG=y and CONFIG_DEBUG_INFO=y in your prj.conf
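    For reference, a minimal prj.conf fragment for this (CONFIG_DEBUG_OPTIMIZATIONS is the Kconfig way to select -Og in Zephyr; verify the exact option names against your SDK version):

    ```
    # Debug build: keep symbols and use -Og
    CONFIG_DEBUG=y
    CONFIG_DEBUG_INFO=y
    CONFIG_DEBUG_OPTIMIZATIONS=y
    ```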

  • I compile with CONFIG_DEBUG=y, CONFIG_DEBUG_INFO=y, and I see the -Og in the gcc options.

    Normally the hardfault handler unwinds the last stack frame for you and gives you detail of the instruction causing fault.

    Do you mean that you see a full call stack? Or just the instruction pointer for the function with the fault (which is not helpful if it's strlen or sprintf!)?

    I'm looking for something like:

    fault @ strlen, called by sprintf, called by my_code_fn, called by my_other_fn, called by task_my_thread.

    Even if the function names are not printed but only the addresses....

    BTW, addr2line gave me nothing:

    C:\work\dev\if-device-nrf53>arm-none-eabi-addr2line -e cc1-med/build/zephyr/zephyr.elf -a 0x3fda3
    0x0003fda3
    ??:?

    But a text search in cc1-med/build/zephyr/zephyr.map does find the function for the address... just not very scriptable...
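    (A scriptable alternative to grepping the map file, sketched under the assumption that you first dump a sorted symbol list with something like arm-none-eabi-nm -n zephyr.elf: find the last symbol at or below the fault address. The addresses and symbol names in the embedded dump below are made up for illustration.)

    ```c
    #include <stdio.h>
    #include <string.h>

    /* A few lines in `nm -n` format (address, symbol type, name),
     * addresses sorted ascending. Made-up values for illustration. */
    static const char *nm_dump =
        "00014200 T sprintf\n"
        "000143a0 T strlen\n"
        "000208e0 T my_code_fn\n";

    /* Find the last symbol whose address is <= addr and format it
     * as "name+0xoffset". Returns 1 if a symbol was found. */
    static int lookup(const char *dump, unsigned int addr,
                      char *name, size_t len)
    {
        char best[64] = "";
        unsigned int best_addr = 0;
        int found = 0;

        for (const char *p = dump; p && *p; ) {
            unsigned int a;
            char type;
            char sym[64];
            if (sscanf(p, "%x %c %63s", &a, &type, sym) == 3 &&
                a <= addr && (!found || a >= best_addr)) {
                best_addr = a;
                strncpy(best, sym, sizeof(best) - 1);
                found = 1;
            }
            p = strchr(p, '\n');
            if (p) {
                p++;
            }
        }
        if (found) {
            snprintf(name, len, "%s+0x%x", best, addr - best_addr);
        }
        return found;
    }

    int main(void)
    {
        char name[80];
        if (lookup(nm_dump, 0x000143a6, name, sizeof(name))) {
            printf("0x000143a6 = %s\n", name); /* strlen+0x6 */
        }
        return 0;
    }
    ```

    In a real script you would pipe the actual nm output in instead of embedding it, and run the lookup for each address printed by the fault handler.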

  • BrianW said:
    fault @ strlen, called by sprintf, called by my_code_fn, called by my_other_fn, called by task_my_thread.

    No, the fault handler does not give the full call trace like gdb does. For getting a function call trace in context, I start the debugger and set a breakpoint in the hard fault handler. If everything goes right and the stack memory is not corrupted, some IDE debuggers will show the call trace, or you can set up a GDB session if that is your preference.

    BrianW said:
    C:\work\dev\if-device-nrf53>arm-none-eabi-addr2line -e cc1-med/build/zephyr/zephyr.elf -a 0x3fda3
    0x0003fda3
    ??:?

    Hmm, try adding the "-f" option to get function names (and "-C" to demangle C++ names, if relevant).

  • No, the fault handler does not give the full call trace like gdb does. For getting a function call trace in context, I start the debugger and set a breakpoint in the hard fault handler. If everything goes right and the stack memory is not corrupted, some IDE debuggers will show the call trace, or you can set up a GDB session if that is your preference.

    Ok.

    Does this mean there is definitely no way to get a call trace in the firmware itself?
