Logging thread stack overflow

Question

Hello,

I am having an issue with Zephyr, it constantly reboots the system because of stack overflow.

00> [00:05:36.447,265] <err> os: r0/a1:  0x00000002  r1/a2:  0x20004d60  r2/a3:  0xf0f0f0f0
00> [00:05:36.447,296] <err> os: r3/a4:  0x2000e700 r12/ip:  0x2000a5e4 r14/lr:  0x00058045[0m
00> [00:05:36.447,326] <err> os:  xpsr:  0x41070000
00> [00:05:36.447,326] <err> os: r4/v1:  0x00000000  r5/v2:  0xffffffff  r6/v3:  0xffffffff
00> [00:05:36.447,357] <err> os: r7/v4:  0x00000000  r8/v5:  0x00000001  r9/v6:  0x00000000
00> [00:05:36.447,357] <err> os: r10/v7: 0x00000001  r11/v8: 0x20001a00    psp:  0x20026d28
00> [00:05:36.447,357] <err> os: EXC_RETURN: 0x0
00> [00:05:36.447,387] <err> os: Faulting instruction address (r15/pc): 0x00046704
00> [00:05:36.447,418] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
00> [00:05:36.447,418] <err> os: Current thread: 0x20004d60 (logging)
00> [00:05:36.780,670] <err> fatal_error: Resetting system

....

[03:04:16.903,503] <err> os: r0/a1:  0x00000002  r1/a2:  0xe000ed04  r2/a3:  0xf0f0f0f0
00> [03:04:16.903,533] <err> os: r3/a4:  0x2000e700 r12/ip:  0x00000000 r14/lr:  0x0003235d
00> [03:04:16.903,533] <err> os:  xpsr:  0x410f0021
00> [03:04:16.903,564] <err> os: r4/v1:  0x20003890  r5/v2:  0x0005a3a3  r6/v3:  0x00000007
00> [03:04:16.903,564] <err> os: r7/v4:  0x000000c6  r8/v5:  0x20003888  r9/v6:  0x00030f79
00> [03:04:16.903,594] <err> os: r10/v7: 0x00058b54  r11/v8: 0x00000000    psp:  0x20026c10
00> [03:04:16.903,594] <err> os: EXC_RETURN: 0x0
00> [03:04:16.903,594] <err> os: Faulting instruction address (r15/pc): 0x00046704
00> [03:04:16.903,625] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
00> [03:04:16.903,656] <err> os: Fault during interrupt handling
00> 
00> [03:04:16.903,656] <err> os: Current thread: 0x20004d60 (logging)
00> [03:04:17.257,568] <err> fatal_error: Resetting system[0m

Sometimes in these faults, it says "Fault during interrupt handling".

The code line for the faulting address is:

arm-none-eabi-addr2line.exe -e build\zephyr\zephyr.elf 0x00046704
ncs/v2.1.0/zephyr/kernel/thread.c:380

It is actually in reference to the SENTINEL mechanism:

I am using the following CONFIG options for logging, and even with absurd values of stack it still fails:

# -- Lora
# -- BLE
# -- MCUBoot
...

# -- Logs

CONFIG_UART_CONSOLE=n
CONFIG_USE_SEGGER_RTT=y
CONFIG_RTT_CONSOLE=y
CONFIG_LOG_BACKEND_RTT=y

CONFIG_LOG=y
CONFIG_LOG_PROCESS_THREAD=y
CONFIG_LOG_DEFAULT_LEVEL=4
CONFIG_LOG_BUFFER_SIZE=8192

CONFIG_PRINTK=n
CONFIG_LOG_PRINTK=n
CONFIG_NEWLIB_LIBC=y
CONFIG_CBPRINTF_FULL_INTEGRAL=y
CONFIG_CBPRINTF_FP_SUPPORT=y
CONFIG_NEWLIB_LIBC_FLOAT_SCANF=y
CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y
CONFIG_NEWLIB_LIBC_NANO=n

# -- Optimizations
CONFIG_NO_OPTIMIZATIONS=n

# -- Sentinel and stacks
CONFIG_STACK_SENTINEL=y

CONFIG_THREAD_STACK_INFO=y
CONFIG_THREAD_MONITOR=y
CONFIG_INIT_STACKS=y
CONFIG_THREAD_NAME=y
CONFIG_EXTRA_EXCEPTION_INFO=y
CONFIG_GPIO_LOG_LEVEL_DBG=y
CONFIG_LORA_LOG_LEVEL_DBG=y

CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_IDLE_STACK_SIZE=4096
CONFIG_PRIVILEGED_STACK_SIZE=4096
CONFIG_BT_CTLR_RX_PRIO_STACK_SIZE=1024
CONFIG_BT_HCI_TX_STACK_SIZE=4096
CONFIG_LOG_PROCESS_THREAD_STACK_SIZE=100000
CONFIG_LOG_BUFFER_SIZE=8192
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_ISR_STACK_SIZE=32768
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

# -- Heap
CONFIG_HEAP_MEM_POOL_SIZE=15240

Is the problem really the logging thread? Any suggestions?

thiago_hd · Accepted Answer

Hello @Simon, actually I got to solve that problem by trying and changing some stack sizes within tasks that are intrinsic to the system, and also my tasks' stack sizes. 
 Apparently, it is really necessary during the investigation to compile the code with CONFIG_NO_OPTIMIZATIONS=y so that the system can tell you correctly which thread and which line were executed right before going into fault. 
 I have miniaturised my application by removing some tasks, then, testing and enabling each task at a time, and retesting, and so on. 
 Also, apparently, you cannot enable STACK_SENTINEL in the same time as you have MPU_STACK_GUARD enabled as well (I have seen this kind of argument on github, though I cannot recall where I have seen it on github). Since the default config for nRF52840 is that it uses hardware stack guard (MPU_STACK_GUARD), I have not used STACK_SENTINEL, neither STACK_CANARIES. 
 I also "disabled" the LOGGING Thread by setting CONFIG_LOG_MODE_IMMEDIATE=y. I find this a better use of logging. This forces the system to do the logging immediately, inside the own task which called the logging function, though, as a side effect causing the need of more stack size for each task which uses logging. 
 The main configuration changes regarding FAULTS, HEAP size, STACK sizes, Thread Analyzer I have used for solving my problem is below: 
 # ----------------------------------------------------
# HEAP Mem, Stack, Falhas, Thread Analyzer
# ----------------------------------------------------
CONFIG_HEAP_MEM_POOL_SIZE=32768

CONFIG_REBOOT=y
# CONFIG_RESET_ON_FATAL_ERROR=y
CONFIG_FATAL_ERROR_LOG_LEVEL_DBG=y
CONFIG_BT_HCI_VS_FATAL_ERROR=y


CONFIG_EXTRA_EXCEPTION_INFO=y

CONFIG_THREAD_NAME=y
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_STACK_INFO=y
CONFIG_THREAD_MONITOR=y
CONFIG_THREAD_ANALYZER_ISR_STACK_USAGE=y
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5
# CONFIG_THREAD_ANALYZER_USE_PRINTK=n
CONFIG_THREAD_ANALYZER_AUTO=y
CONFIG_THREAD_ANALYZER_USE_LOG=y
CONFIG_THREAD_ANALYZER_AUTO_STACK_SIZE=65536

CONFIG_INIT_STACKS=y
CONFIG_STACK_USAGE=y
# Aparentemente stack sentinel e mpu stack guard nao podem ser usados simultaneamente (fonte: github, investigar mais a fundo)
#CONFIG_MPU_STACK_GUARD=y 
#CONFIG_STACK_SENTINEL=y


# --- Tamanhos Stacks ---
CONFIG_IDLE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=8196
CONFIG_ISR_STACK_SIZE=8196
CONFIG_MPSL_WORK_STACK_SIZE=8196
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=8196


CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_BT_HCI_TX_STACK_SIZE=4096

CONFIG_PRIVILEGED_STACK_SIZE=8196

Logging thread stack overflow

Top Replies