Stack overflow

Hello,

I had been running this program without any problems. Then I added some code to read and write to flash using NVS, and the error shown below appeared.

I am running a single-threaded application (main() only) in Segger Embedded Studio v5.34a.

*** Booting Zephyr OS build v2.4.99-ncs1-rc1 ***
[00:00:13.434,600] <err> os: ***** USAGE FAULT *****
[00:00:13.434,600] <err> os: Stack overflow (context area not valid)
[00:00:13.434,600] <err> os: r0/a1: 0x00000000 r1/a2: 0x0000cbfd r2/a3: 0x00000000
[00:00:13.434,600] <err> os: r3/a4: 0x0000cbfd r12/ip: 0x207bc4ae r14/lr: 0x77fadffe
[00:00:13.434,600] <err> os: xpsr: 0xfdd9ca00
[00:00:13.434,600] <err> os: Faulting instruction address (r15/pc): 0x05221210
[00:00:13.434,631] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:00:13.434,631] <err> os: Current thread: 0x20000668 (unknown)
[00:00:13.696,472] <err> fatal_error: Resetting system

If the solution is to increase the stack size, how do I do it?

Can someone please help?

Kind regards

Mohamed

  • If the function that caused the stack overflow is called from main, increasing CONFIG_MAIN_STACK_SIZE would be the solution then.

    Best regards,

    Simon

  • Thank you Simon.

    What is the solution if the function that is causing the stack overflow is not called directly from main() but a few levels down the function call tree?

    Kind regards

    Mohamed

  • Hi Simon,

    I increased the stack size to 8192 in prj.conf

    CONFIG_MAIN_STACK_SIZE=8192

    But I am still seeing this error upon entering main and before executing any of the application instructions.

    In the SES debugger, the yellow arrow points to the first curly bracket '{' in main(). When I click Go (F5) to move the cursor to the first instruction in main(), the error appears.

    *** Booting Zephyr OS build v2.4.99-ncs1-rc1 ***
    [00:00:41.795,043] <err> os: ***** USAGE FAULT *****
    [00:00:41.795,043] <err> os: Stack overflow (context area not valid)
    [00:00:41.795,043] <err> os: r0/a1: 0x245bc4ae r1/a2: 0x77fb5ffe r2/a3: 0x05821210
    [00:00:41.795,074] <err> os: r3/a4: 0xbdd9cba6 r12/ip: 0x0405a049 r14/lr: 0x77faebb6
    [00:00:41.795,074] <err> os: xpsr: 0x9ed6fa00
    [00:00:41.795,074] <err> os: Faulting instruction address (r15/pc): 0x00aec010
    [00:00:41.795,074] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
    [00:00:41.795,074] <err> os: Current thread: 0x20000668 (unknown)
    [00:00:42.056,854] <err> fatal_error: Resetting system

    It looks like something is going wrong before main() starts running.

    Please help.

    Kind regards

    Mohamed

  • Hi Simon,

    I have had a look at the thread-analyzer link and it looks like it could help debug the stack overflow I am seeing. However, I am not sure how to use it. Can you please provide some example C code using thread_analyzer_run() and thread_analyzer_print()?

    Thank you.

    Kind regards

    Mohamed

  • Could you first try to do the following?

    • cd <application folder>/<build_folder>/zephyr
    • addr2line -e zephyr.elf 0xaec010

    I'm using addr2line that comes with MinGW. That command will provide you with the exact place that causes the fault. I got the address 0xaec010 from here: [00:00:41.795,074] <err> os: Faulting instruction address (r15/pc): 0x00aec010.

    Then start a debug session, and put a break point on the place that caused the fault, and you will find the corresponding thread at the bottom of the call stack, like in the call stack in https://devzone.nordicsemi.com/f/nordic-q-a/70100/ecdsa-signing-crashes-when-implemented-with-bluetooth/288468#288468.

    If the issue is due to a stack overflow, increasing the stack size of the thread you found should solve the problem.

    There may be some more efficient ways of resolving this, but this should work.

    Best regards,

    Simon

  • Thank you Simon for your help over the weekend. Your help is really appreciated.

    It had been a few days since I last rebooted my laptop. So, I rebooted it yesterday, enabled the stack analyzer module in prj.conf, and started to debug my application in SES. I found that stack usage is in fact not a problem at all: I have plenty of unused stack in the main thread, and the application runs without any problems. So, it looks like my laptop was causing the problem I was seeing in SES/zephyr/application. I am rather confused as to how a problem with my laptop could cause an embedded application running on a separate target to fail with a stack overflow error. Maybe you can enlighten me and tell me how this can be possible.

    Thank you.

    Kind regards

    Mohamed

  • Hmm.. Maybe it was the fact that you reopened SES that changed the behaviour? Be aware that changes in the overlay or Kconfig will not take effect until you restart SES (or re-run the CMake logic in some other way).

    Best regards,

    Simon

  • Thank you Simon.

    I think changes to any configuration settings (prj.conf, overlay, Kconfig...) require re-opening the nRF Connect SDK project via File... I did not think SES had to be restarted.

    Anyway, the problem has disappeared for now. Let's hope it will not show its ugly head again.

    Is there anything in the map file that could give me pointers when the stack size is not big enough?

    Kind regards

    Mohamed

  • Learner said:
    Is there anything in the map file that could give me pointers when the stack size is not big enough?

    I'm not totally sure about this. You could search for the faulting address in the map file to figure out which function is causing the stack overflow fault, set a break point at that location to see which thread it's running from, and then increase the stack size of that thread.

    The Thread analyzer is probably the best way of analyzing the stack usage of the threads.
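
    For readers wondering where the "unused / usage" figures come from: as far as I know, Zephyr fills each stack with the byte 0xAA at thread creation when stack initialisation is enabled, and the unused figure is the number of bytes that still hold that pattern. The following is a minimal host-side sketch of that idea in plain C, not Zephyr's actual implementation. The names stack_paint(), stack_unused_bytes() and stack_usage_percent() are hypothetical, and the fill value is an assumption of this sketch.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Fill pattern assumed for this sketch. */
    #define STACK_FILL 0xAAu

    /* Paint the whole stack buffer before the thread starts running. */
    static void stack_paint(uint8_t *stack, size_t size)
    {
        assert(stack != NULL);
        memset(stack, STACK_FILL, size);
    }

    /*
     * Count untouched bytes. The stack grows downward on Cortex-M, so the
     * bytes that were never written sit at the low end of the buffer.
     */
    static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
    {
        size_t unused = 0;

        assert(stack != NULL);
        while (unused < size && stack[unused] == STACK_FILL) {
            unused++;
        }
        return unused;
    }

    /* Usage as a whole percentage, rounded down. */
    static unsigned int stack_usage_percent(size_t unused, size_t size)
    {
        return (unsigned int)(((size - unused) * 100u) / size);
    }
    ```

    Painting a 1024-byte buffer and then overwriting its top 676 bytes gives 348 unused bytes and 66 % usage with this sketch. Note that such a figure is only a high-water mark of what the thread has touched so far, so it can understate the real requirement if the deepest call path has not run yet.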

    Learner said:
    I think changes to any configuration settings (prj.conf, overlay, Kconfig...) require re-opening the nRF Connect SDK project via File... I did not think SES had to be restarted.

    Yes, you are correct about this. But reopening SES means that you run File... again, and that might pick up new dts/overlay/Kconfig changes that weren't present before you reopened SES. However, something completely different might have triggered the fault, and let's hope it doesn't show up again.

    Best regards,

    Simon

  • Thank you Simon.

    Yes, the Thread analyzer has proved to be very useful.

    Let's just hope that it will not occur again.

    Kind regards

    Mohamed

  • Hi Simon,

    The stack overflow problem is showing its ugly head again.

    The fault is occurring in the function k_work_handler_t pid4_tasks( void ), which is set up in main() as follows:

    static struct k_work main_work_pid4;

    void main( void )
    {
        k_work_init( &main_work_pid4, pid4_tasks );
        ...

        while (1)
        {
            ...

            k_work_submit( &main_work_pid4 );

            ...
        }
    }

    The main() stack is configured in the overlay file and so is the THREAD_ANALYZER stack,

    CONFIG_MAIN_STACK_SIZE=4096

    CONFIG_THREAD_ANALYZER_AUTO_STACK_SIZE=2048

    The stack analyzer output is shown below in bold. Although I am configuring the main stack to be 4096, the stack analyzer is showing stack sizes of 2048, 1024, 320 and 4096. Why is this?

    The stack analyzer is not showing stack usage greater than 4096. In fact, I doubled the size of the main stack but the fault is still occurring. So, it could be that the stack overflow is a consequence of this error: "Stacking error (context area might be not valid)".

    I am attaching a picture of the debugger status at the point the fault occurs.

    I did not want to increase 

    Below is a trace capture of the Debug Terminal in SES IDE. 

    Please help.

    Kind regards

    Mohamed

    *** Booting Zephyr OS build v2.4.99-ncs1-rc1 ***
    Thread analyze:
    0x20002780 : STACK: unused 1448 usage 600 / 2048 (29 %); CPU: 0 %
    0x20002fc8 : STACK: unused 1636 usage 412 / 2048 (20 %); CPU: 0 %
    0x20002db8 : STACK: unused 628 usage 396 / 1024 (38 %); CPU: 0 %
    0x20002860 : STACK: unused 348 usage 676 / 1024 (66 %); CPU: 0 %
    0x20002f08 : STACK: unused 288 usage 32 / 320 (10 %); CPU: 0 %
    0x20002e60 : STACK: unused 3508 usage 588 / 4096 (14 %); CPU: 99 %
    Thread analyze:
    0x20002780 : STACK: unused 1208 usage 840 / 2048 (41 %); CPU: 0 %
    0x20002fc8 : STACK: unused 1636 usage 412 / 2048 (20 %); CPU: 0 %
    0x20002db8 : STACK: unused 628 usage 396 / 1024 (38 %); CPU: 0 %
    0x20002860 : STACK: unused 348 usage 676 / 1024 (66 %); CPU: 0 %
    0x20002f08 : STACK: unused 36 usage 284 / 320 (88 %); CPU: 19 %
    0x20002e60 : STACK: unused 3508 usage 588 / 4096 (14 %); CPU: 80 %
    LR1110 Driver Version: v2.0.1
    LR1110 Firmware Version: HW: 22 Type: 01, FW: 03.03
    System Errors = 0x20
    System Errors = 0x0
    Counter = 1
    LR1110 Modem Packet Type = 1
    Counter = 2

    Thread analyze:
    0x20002780 : STACK: unused 1208 usage 840 / 2048 (41 %); CPU: 0 %
    0x20002fc8 : STACK: unused 1636 usage 412 / 2048 (20 %MCU TEMP = -588.24 0.000660 21.95 °C
    ); CPU: 0 %
    --- 8 messages dropped ---
    delta_time_ms = 0
    Thread analyze:
    0x20002780 : STACK: unused 1208 usage 840 / 2048 (41 %); CPU: 0 %
    0x20002fc8 : STACK: unused 1036 usage 1012 / 2048 (49 %); CPU: 42 %
    0x20002db8 : STACK: unused 628 usage 396 / 1024 (38 %); CPU: 0 %
    0x20002860 : STACK: unused 300 usage 724 / 1024 (70 %); CPU: 0 %
    0x20002f08 : STACK: unused 36 usage 284 / 320 (88 %); CPU: 16 %
    0x20002e60 : STACK: unused 2988 usage 1108 / 4096 (27 %); CPU: 41 %
    MCU TEMP = -588.24 0.000660 21.95 °C
    [00:01:03.042,20[00:01:03.042,20delta_time_ms = 0


    [00:01:23.802,246] <err> os: ***** MPU FAULT *****
    [00:01:23.802,276] <err> os: Stacking error (context area might be not valid)
    [00:01:23.802,276] <err> os: Data Access Violation
    [00:01:23.802,307] <err> os: MMFAR Address: 0x200056dc
    [00:01:23.802,307] <err> os: r0/a1: 0x00000000 r1/a2: 0xaaaaaaaa r2/a3: 0xe288c458
    [00:01:23.802,337] <err> os: r3/a4: 0xdafdd530 r12/ip: 0x568e7d0e r14/lr: 0x3c74b9ef
    [00:01:23.802,337] <err> os: xpsr: 0xf48f4a00
    [00:01:23.802,368] <err> os: Faulting instruction address (r15/pc): 0x6052b7d0
    [00:01:23.802,368] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
    [00:01:23.802,398] <err> os: Current thread: 0x20002860 (unknown)
    [00:01:24.103,271] <err> fatal_error: Resetting system


    *** Booting Zephyr OS build v2.4.99-ncs1-rc1 ***
    Thread analyze:
    0x20002780 : STACK: unused 1448 usage 600 / 2048 (29 %); CPU: 0 %
    0x20002fc8 : STACK: unused 1636 usage 412 / 2048 (20 %); CPU: 1 %
    0x20002db8 : STACK: unused 628 usage 396 / 1024 (38 %); CPU: 1 %
    0x20002860 : STACK: unused 348 usage 676 / 1024 (66 %); CPU: 18 %
    0x20002f08 : STACK: unused 288 usage 32 / 320 (10 %); CPU: 0 %
    0x20002e60 : STACK: unused 3508 usage 588 / 4096 (14 %); CPU: 78 %
    Thread analyze:
    0x20002780 : STACK: unused 1208 usage 840 / 2048 (41 %); CPU: 0 %
    0x20002fc8 : STACK: unused 1636 usage 412 / 2048 (20 %); CPU: 0 %
    0x20002db8 : STACK: unused 628 usage 396 / 1024 (38 %); CPU: 0 %
    0x20002860 : STACK: unused 348 usage 676 / 1024 (66 %); CPU: 0 %
    0x20002f08 : STACK: unused 36 usage 284 / 320 (88 %); CPU: 99 %
    0x20002e60 : STACK: unused 3508 usage 588 / 4096 (14 %); CPU: 0 %
    LR1110 Driver Version: v2.0.1
    LR1110 Firmware Version: HW: 22 Type: 01, FW: 03.03
    System Errors = 0x20
    System Errors = 0x0
    Counter = 1


    LR1110 Modem Packet Type = 1
    Counter = 2


    Thread analyze:
    0x20002780 : STACK: unused 1208 usage 840 / 2048 (41 %); CPU: 0 %
    0x20002fc8 : STACK: unused 1636 usage

  • Hi Mohamed

    It may be helpful to set CONFIG_THREAD_NAME=y; then you will see the names of the threads instead of just a hex number.

    It seems like the thread 0x20002860 caused the MPU fault ("Current thread: 0x20002860 (unknown)"). I think an MPU fault may be a consequence of a stack overflow, and I can see that the stack size of that thread is 1024 bytes ("0x20002860 : STACK: unused 348 usage 676 / 1024..."). It seems like this is the main thread (based on your findings), so I'm not sure why the size is only 1024 when you set CONFIG_MAIN_STACK_SIZE=4096. You could take a look at the file <sample>/build/zephyr/.config and check what CONFIG_MAIN_STACK_SIZE is set to. With CONFIG_THREAD_NAME=y you should also be able to figure out exactly which thread is causing the issue.
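
    One more thing that may be worth checking. This is an assumption, not something confirmed in this thread: a handler submitted with k_work_submit() runs on the system workqueue thread rather than on main, and that thread's stack is sized by CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE, not by CONFIG_MAIN_STACK_SIZE. If the 1024-byte thread turns out to be the system workqueue, a prj.conf sketch along these lines might help. The value 2048 is only an illustrative guess.

    ```
    # Show thread names instead of "(unknown)" in the analyzer and fault output
    CONFIG_THREAD_NAME=y

    # Assumption: pid4_tasks runs on the system workqueue; 2048 is illustrative
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
    ```

    After changing these, it is worth confirming in <sample>/build/zephyr/.config that the new values were actually picked up.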

    If this doesn't help, could you upload the sample in zipped format? Then I'll take a look at it.

    Best regards,

    Simon
