Enabling CONFIG_FPU causes stack overflow

Problem: stack overflow in BT threads when CONFIG_FPU is enabled.

[00:00:00.034,881] <err> os: esf_dump: r0/a1: 0x00000002 r1/a2: 0x200038dc r2/a3: 0xf0f0f0f0
[00:00:00.034,912] <err> os: esf_dump: r3/a4: 0x20004e00 r12/ip: 0x00000014 r14/lr: 0x00027aab
[00:00:00.034,942] <err> os: esf_dump: xpsr: 0x41000000
[00:00:00.034,942] <err> os: esf_dump: s[ 0]: 0x00000000 s[ 1]: 0x00000000 s[ 2]: 0x00000000 s[ 3]: 0x00000000
[00:00:00.034,973] <err> os: esf_dump: s[ 4]: 0x00000000 s[ 5]: 0x00000000 s[ 6]: 0x00000000 s[ 7]: 0x00000000
[00:00:00.034,973] <err> os: esf_dump: s[ 8]: 0x00000000 s[ 9]: 0x00000000 s[10]: 0x00000000 s[11]: 0x00000000
[00:00:00.035,003] <err> os: esf_dump: s[12]: 0x00000000 s[13]: 0x00000000 s[14]: 0x00000000 s[15]: 0x00000000
[00:00:00.035,003] <err> os: esf_dump: fpscr: 0x00000000
[00:00:00.035,034] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x0001f608
[00:00:00.035,064] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:00:00.035,125] <err> os: z_fatal_error: Current thread: 0x20001c68 (BT_LW_WQ)
*** Booting Zephyr OS build v3.2.99-ncs2-rc1 ***l_error_handler: Resetting system

I'm using the basic peripheral_hr sample from the SDK with CONFIG_FPU=y enabled in prj.conf. After that, the board crashes several times with a stack overflow in some BT thread until it manages to come up. Connecting to the peripheral then causes further crashes.

I'm using a PCA10056 devkit and the VS Code nRF Connect extension, with SDK and toolchain version 2.3.0-rc1.

My understanding is that it should be possible to use the FPU on the nRF52840. Why does this failure happen?

To reproduce the failure, create a new freestanding app using peripheral_hr and add the following to prj.conf:

CONFIG_FPU=y
CONFIG_STACK_SENTINEL=y
CONFIG_THREAD_NAME=y

  • The problem came down to having CONFIG_STACK_SENTINEL=y. Apparently it is not compatible with CONFIG_FPU. More info in

    https://github.com/zephyrproject-rtos/zephyr/issues/32261

    I wish KConfig prevented one from enabling CONFIG_STACK_SENTINEL and CONFIG_FPU together.

    I enabled CONFIG_STACK_SENTINEL because I thought the MPU (hardware stack protection) was not working. But I now realize that CONFIG_UART_INTERRUPT_DRIVEN=y prevented the crash messages from being printed.

    I only needed UART1 to be interrupt-driven (UART0 is the debug console and should be async/DMA-based). That can be done, see "Can UART ASYNC and UART INTERRUPT work together?", and with that I can see MPU crashes on the console and do not need CONFIG_STACK_SENTINEL.
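
A rough prj.conf sketch of the setup described in this reply, offered as an assumption rather than the poster's actual configuration. The per-instance symbols (CONFIG_UART_0_ASYNC, CONFIG_UART_1_INTERRUPT_DRIVEN and so on) are my reading of the nRF UART driver's Kconfig and should be checked against the SDK version in use. UART1 also has to be enabled in the devicetree, which is not shown here.

```conf
# FPU on, software stack sentinel off, MPU-based stack guard on
CONFIG_FPU=y
CONFIG_STACK_SENTINEL=n
CONFIG_HW_STACK_PROTECTION=y

# Enable both UART APIs globally, then choose one per instance
CONFIG_SERIAL=y
CONFIG_UART_ASYNC_API=y
CONFIG_UART_INTERRUPT_DRIVEN=y

# UART0 (debug console): async/DMA-based, not interrupt-driven
# (assumed symbol names, verify in the driver Kconfig)
CONFIG_UART_0_INTERRUPT_DRIVEN=n
CONFIG_UART_0_ASYNC=y

# UART1: interrupt-driven (assumed symbol names)
CONFIG_UART_1_INTERRUPT_DRIVEN=y
CONFIG_UART_1_ASYNC=n
```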

  • Hi,

    Thank you for the detailed report and the quick way to reproduce, as well as the update with your further findings and the links to related Zephyr issues on GitHub.

    olegr said:
    I wish KConfig prevented one from enabling CONFIG_STACK_SENTINEL and CONFIG_FPU together.

    From what I understand from the GitHub issue, as well as from the possibly related https://github.com/zephyrproject-rtos/zephyr/issues/37608, the main conflict is between CONFIG_STACK_SENTINEL and use of the MPU, not use of the FPU? It was suggested in the discussion there that the FPU error may be caused by increased stack usage. Has this been ruled out?

    Regards,
    Terje

  • It was suggested in the discussion there that the FPU error may be caused by increased stack usage.

    That seems reasonable. In my later debugging I've noticed that at least one BT thread (BT ECDH) appears to be using FPU when it is available, so it would need more stack on context switches?

    I did check that without CONFIG_FPU=y, the crash does not happen.
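
To put a number on the "more stack" suspicion, here is a small sketch of the Cortex-M4F exception frame layouts. It assumes the standard ARMv7-M stacking: an integer-only frame holds R0-R3, R12, LR, PC and xPSR, while a frame with floating-point context adds S0-S15, FPSCR and one reserved word (the same registers the crash dump above prints). The struct and function names are hypothetical, chosen only for illustration, and the sketch does not model anything Zephyr saves on top of the hardware frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer-only exception frame: 8 words pushed by hardware. */
struct basic_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* Frame pushed when floating-point context is active:
 * the basic frame plus S0-S15, FPSCR and one reserved word. */
struct extended_frame {
    struct basic_frame basic;
    uint32_t s[16];
    uint32_t fpscr;
    uint32_t reserved;
};

/* Layout checks: 8 words and 26 words respectively. */
static_assert(sizeof(struct basic_frame) == 32, "basic frame is 8 words");
static_assert(sizeof(struct extended_frame) == 104, "extended frame is 26 words");

/* Extra bytes one exception entry costs on a thread's stack
 * once that thread has floating-point context. */
static inline size_t fp_frame_overhead(void)
{
    return sizeof(struct extended_frame) - sizeof(struct basic_frame);
}
```

Under these assumptions, a thread that has used the FPU needs at least 72 more bytes of headroom for an interrupt arriving while it runs, which could be enough to tip a stack that was already nearly full.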

  • Hi Oleg, 

    You may want to run the thread analyzer after you enable CONFIG_FPU=y to see how much stack the application needs.


    You can find how we do that in our prj_minimal.conf in peripheral_hr: 


    # In order to correctly tune the stack sizes for the threads the following
    # Configurations can be enabled to print the current use:
    # CONFIG_THREAD_NAME=y
    # CONFIG_THREAD_ANALYZER=y
    # CONFIG_THREAD_ANALYZER_AUTO=y
    # CONFIG_THREAD_ANALYZER_RUN_UNLOCKED=y
    # CONFIG_THREAD_ANALYZER_USE_PRINTK=y
    # CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=20
    # CONFIG_CONSOLE=y
    # CONFIG_UART_CONSOLE=y
    # CONFIG_SERIAL=y
    # CONFIG_PRINTK=y
    
    # Example output of thread analyzer
    # BT RX               : STACK: unused 576 usage 448 / 1024 (43 %); CPU: 0 %
    # BT RX pri           : STACK: unused 260 usage 188 / 448 (41 %); CPU: 0 %
    # BT ECC              : STACK: unused 256 usage 888 / 1144 (77 %); CPU: 1 %
    # BT TX               : STACK: unused 296 usage 344 / 640 (53 %); CPU: 0 %
    # thread_analyzer     : STACK: unused 128 usage 384 / 512 (75 %); CPU: 1 %
    # sysworkq            : STACK: unused 856 usage 168 / 1024 (16 %); CPU: 0 %
    # logging             : STACK: unused 232 usage 536 / 768 (69 %); CPU: 0 %
    # idle 00             : STACK: unused 208 usage 48 / 256 (18 %); CPU: 97 %
    # main                : STACK: unused 576 usage 448 / 1024 (43 %); CPU: 0 %
    CONFIG_BT_RX_STACK_SIZE=1024
    CONFIG_BT_CTLR_RX_PRIO_STACK_SIZE=448
    CONFIG_BT_HCI_TX_STACK_SIZE_WITH_PROMPT=y
    CONFIG_BT_HCI_TX_STACK_SIZE=640
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=512
    CONFIG_IDLE_STACK_SIZE=128
    CONFIG_MAIN_STACK_SIZE=640
    CONFIG_ISR_STACK_SIZE=1024

  • Thanks, that was useful for seeing which threads are close to the limit. However, running with the thread analyzer results in sporadic crashes in the "thread_analyzer" thread, with errors like

    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/lib/os/ring_buffer.c:73
