Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

nRF52840 RAM questions regarding stack and heap

Hello,

The Nordic folks have explained this many times, and there is even a dedicated section in the SoftDevice spec. However, I can’t connect the dots for some reason. In the happy path it seems straightforward, but as complexity grows in the application, so does the logic around RAM usage. I added a quick diagram below to give context to the questions. This is mainly about RAM, and the relative position of the stack and heap.

The way I understand RAM usage within nRF52 is that, typically, the RAM_BASE is defined, growing upwards by the RAM_LENGTH. This can be seen in the S140 Softdevice spec below:

https://infocenter.nordicsemi.com/pdf/S140_SDS_v1.1.pdf

For example:

In terms of definitions, the address region for RAM is defined in the linker file, as seen below:

MEMORY
{
FLASH (rx) : ORIGIN = 0x27000, LENGTH = 0xd9000
RAM_SPIM3 (rwx) : ORIGIN = 0x20006000, LENGTH = 0x02000
RAM (rwx) : ORIGIN = 0x20008000, LENGTH = 0x38000
}


Based on this, the stack would then live at the RAM end address, meaning STACK_TOP would be defined as (RAM_BASE + RAM_LENGTH). Again, this is defined in the linker file as well:

.heap (COPY):
{
__HeapBase = .;
__end__ = .;
PROVIDE(end = .);
KEEP(*(.heap*))
__HeapLimit = .;
} > RAM

.stack_dummy (COPY):
{
KEEP(*(.stack*))
} > RAM

__StackTop = ORIGIN(RAM) + LENGTH(RAM);
__StackLimit = __StackTop - SIZEOF(.stack_dummy);




As for the complexity mentioned above, now throw an RTOS like FreeRTOS into the picture. Nordic has examples of this as well. But the point is that FreeRTOS has its own stack/heap designations as well. For example:

#define configTOTAL_HEAP_SIZE 131072 // In bytes
#define configTIMER_TASK_STACK_DEPTH 200 // In words (800 bytes)
#define configMINIMAL_STACK_SIZE 60 // In words (240 bytes)
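
As a quick check of the word-to-byte comments above, here is a minimal host-side sketch. It assumes a 32-bit StackType_t, as on the Cortex-M4 ports; stack_depth_to_bytes is a hypothetical helper for illustration, not a FreeRTOS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: StackType_t is 32 bits wide, as on the Cortex-M4 ports. */
typedef uint32_t StackType_t;

#define configTIMER_TASK_STACK_DEPTH 200 /* in words */
#define configMINIMAL_STACK_SIZE      60 /* in words */

/* Hypothetical helper: convert a FreeRTOS stack depth (words) to bytes. */
static inline size_t stack_depth_to_bytes(size_t depth_words)
{
    return depth_words * sizeof(StackType_t);
}
```

With these values, the timer task stack comes to 800 bytes and the minimal stack to 240 bytes, which matches the comments in the config.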

Similarly, based on how FreeRTOS is configured, you may even have specific implementations of the heap provided by the OS, such as heap_[1-5].


With all of that said, I’ve tried to organize my thoughts into three main questions, all of which are kind of tied together in some way.

  1. I can imagine a scenario where someone sets the nRF52 stack size to 2K, either through __STACK_SIZE, __STARTUP_CONFIG, directly in the assembly file, or externally, but then sets the FreeRTOS stack size to 8K. I’m assuming these two would contradict each other. Another scenario would be if the dev didn’t set the stack size at all, in which case the default from the nRF52 startup assembly (gcc_startup_nrf52840.S) would apply, which might be less than what they have set in the FreeRTOS config. So I guess the question is: how would you handle this scenario, or are the two stack/heap definitions mutually exclusive?

  2. In terms of the stack/heap allocations: am I correct in saying that, as I add more data to the statically allocated memory sections, such as .bss, .data, and .noinit, there’s a possibility that I will eventually run into the heap/stack sections? I say this because, based on the SoftDevice spec referenced above, it looks like the stack/heap typically grow downward toward the base address, whereas the statically allocated sections grow upward toward the stack/heap as the application adds data to them. I'm guessing that eventually they will collide? Or maybe that is the exact reason for Nordic's assertion below:

    /* Set stack top to end of RAM, and stack limit move down by
     * size of stack_dummy section */
    __StackTop = ORIGIN(RAM) + LENGTH(RAM);
    __StackLimit = __StackTop - SIZEOF(.stack_dummy);
    PROVIDE(__stack = __StackTop);
    
    /* Check if data + heap + stack exceeds RAM limit */
    ASSERT(__StackLimit >= __HeapLimit, "region RAM overflowed with stack")

  3. Lastly, and this is one I find very strange: when looking at my memory map, I see that my .heap section has exactly the same address as my .stack section. This seems like a bug introduced by me, especially since both sections have a size of ~16K. I also see a .bss.heap_end.0 section, which seems odd. I was assuming all heap allocations reside in the .heap section, unless that’s an artifact of FreeRTOS? I.e., looking back at question #1, maybe FreeRTOS reserves a section for its own stack and heap within .bss? So having both the nRF52 .stack/.heap definitions might be an issue on my end?

    .heap 0x20037718 0x4000
    0x20037718 __HeapBase = .
    0x20037718 __end__ = .
    0x20037718 PROVIDE (end = .)
    *(.heap*)
    .heap 0x20037718 0x4000 nrf52.a(gcc_startup_nrf52840.S.obj)
    0x2003b718 __HeapLimit = .
    
    .stack_dummy 0x20037718 0x4000
    *(.stack*)
    .stack 0x20037718 0x4000 nrf52.a(gcc_startup_nrf52840.S.obj)
    0x20040000 __StackTop = (ORIGIN (RAM) + LENGTH (RAM))
    0x2003c000 __StackLimit = (__StackTop - SIZEOF (.stack_dummy))
    0x20040000 PROVIDE (__stack = __StackTop)
    0x00000001 ASSERT ((__StackLimit >= __HeapLimit), region RAM overflowed with stack)
    0x00000178 DataInitFlashUsed = (__bss_start__ - __data_start__)
    0x00084330 CodeFlashUsed = (__etext - ORIGIN (FLASH))
    0x000844a8 TotalFlashUsed = (CodeFlashUsed + DataInitFlashUsed)
    0x00000001 ASSERT ((TotalFlashUsed <= LENGTH (FLASH)), region FLASH overflowed with .data and user data)
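
The numbers in this map excerpt can be checked by hand. The sketch below is a host-side model only, with every constant copied from the linker script and .map excerpt in this post; the helper names are hypothetical. It recomputes __StackTop, __StackLimit and __HeapLimit and mirrors the linker's ASSERT. It also suggests, assuming the stock nRF5 SDK linker script, that the address printed next to .stack_dummy is just a placeholder used to obtain the stack size, while the stack itself occupies the range from __StackLimit to __StackTop.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the linker script and .map excerpt in this post. */
#define RAM_ORIGIN 0x20008000u
#define RAM_LENGTH 0x00038000u
#define STACK_SIZE 0x00004000u /* SIZEOF(.stack_dummy) */
#define HEAP_BASE  0x20037718u /* __HeapBase */
#define HEAP_SIZE  0x00004000u /* size of .heap */

static inline uint32_t stack_top(void)   { return RAM_ORIGIN + RAM_LENGTH; }
static inline uint32_t stack_limit(void) { return stack_top() - STACK_SIZE; }
static inline uint32_t heap_limit(void)  { return HEAP_BASE + HEAP_SIZE; }

/* Mirrors ASSERT(__StackLimit >= __HeapLimit, ...): nonzero when RAM fits. */
static inline int ram_fits(void) { return stack_limit() >= heap_limit(); }

/* Unused bytes between the top of the heap and the bottom of the stack. */
static inline uint32_t gap_bytes(void) { return stack_limit() - heap_limit(); }
```

With the values above, the stack spans 0x2003C000 to 0x20040000, the heap ends at 0x2003B718, and 0x8E8 (2280) bytes remain unused between them, so the ASSERT passes.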

I know, it's a lot. But thanks for taking the time to look!

  • Hello!

     

    Birt said:
    The way I understand RAM usage within nRF52 is that, typically, the RAM_BASE is defined, growing upwards by the RAM_LENGTH. This can be seen in the S140 Softdevice spec below:

    You can set up your stacks and heaps any way you want, but normally you want the two areas to grow towards each other: the stack starts at the top and grows downwards, while .heap starts low and grows upwards.

    Any unused space between .heap and .stack then absorbs the overflow if either of them exceeds its limit.

    A more robust way to set this up is to use the MPU to enforce the boundaries (NCS/Zephyr can do this, and enables it by default).

     

    GCC uses linker scripts, where you can place these regions wherever you want, but the default in the nRF5 SDK is to place the heap at a lower address than the stack.

    Birt said:
    As for the complexity mentioned above, now throw an RTOS like FreeRTOS into the picture. Nordic has examples of this as well. But the point is that FreeRTOS has its own stack/heap designations as well. For example:

    FreeRTOS declares its heap in heap_x.c, as here in heap_1.c:

    static uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];

    It is simply a statically allocated array in RAM, which the linker places in .bss by default.

    You can add a "__attribute__((section(".heap_freertos")))" to place it in the .heap* region.
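
A minimal sketch of that placement, under stated assumptions: FreeRTOSConfig.h sets configAPPLICATION_ALLOCATED_HEAP to 1, so that heap_x.c expects the application to provide ucHeap, and the linker script collects input sections matching .heap* as in the excerpt above. The FREERTOS_HEAP_SECTION macro is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdint.h>

#define configTOTAL_HEAP_SIZE 131072 /* bytes, as in the config above */

/* Assumption: configAPPLICATION_ALLOCATED_HEAP is 1, so the application
 * (not heap_x.c) defines ucHeap. The section attribute is applied only
 * on GCC-compatible ELF toolchains. */
#if defined(__GNUC__) && defined(__ELF__)
#define FREERTOS_HEAP_SECTION __attribute__((section(".heap_freertos")))
#else
#define FREERTOS_HEAP_SECTION
#endif

/* Application-provided FreeRTOS heap, routed to a .heap* input section. */
uint8_t ucHeap[configTOTAL_HEAP_SIZE] FREERTOS_HEAP_SECTION;
```

If newlib's malloc is also in use, it is worth checking the .map file afterwards: in the linker script above, `end` marks the start of .heap, so the memory handed out by _sbrk may overlap anything placed in .heap*.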

    Birt said:
    I can imagine a scenario where someone sets the nRF52 stack size to 2K, either through __STACK_SIZE, __STARTUP_CONFIG, directly in the assembly file, or externally, but then sets the FreeRTOS stack size to 8K. I’m assuming these two would contradict each other. Another scenario would be if the dev didn’t set the stack size at all, in which case the default from the nRF52 startup assembly (gcc_startup_nrf52840.S) would apply, which might be less than what they have set in the FreeRTOS config. So I guess the question is: how would you handle this scenario, or are the two stack/heap definitions mutually exclusive?

    The stack that is set up in the startup files is the MSP (main stack pointer) stack.

    An OS will typically, at least on ARM architectures, run all ISRs in MSP context, while tasks/threads run in PSP (process stack pointer) context.

    I am not a FreeRTOS expert, unfortunately, so I cannot say for certain how it behaves in this scenario, but I can say that they are two different memory areas.

     

    Birt said:
    In terms of the stack/heap allocations: am I correct in saying that, as I add more data to the statically allocated memory sections, such as .bss, .data, and .noinit, there’s a possibility that I will eventually run into the heap/stack sections? I say this because, based on the SoftDevice spec referenced above, it looks like the stack/heap typically grow downward toward the base address, whereas the statically allocated sections grow upward toward the stack/heap as the application adds data to them. I'm guessing that eventually they will collide? Or maybe that is the exact reason for Nordic's assertion below:

    Evaluating stack depth is not always easy, because you have re-entrant functions, and also asynchronous interrupts that can occur at any point in time in your application.

    If you assume that only one stack (MSP) is used, your worst-case stack depth depends heavily on when an interrupt occurs and on how much stack that interrupt itself needs.

    What we try to do is to provide you with these depths for the SoftDevice in certain scenarios, as shown here in S132 spec, table 2:

    https://infocenter.nordicsemi.com/topic/sds_s132/SDS/s1xx/mem_usage/mem_resource_reqs.html?cp=5_7_3_0_13_0_0

    This stack usage then comes in addition to your application's overall stack usage (MSP).

     

    Regarding the memory regions running into each other: this is a real scenario, as I briefly touched on at the start of my answer. If you run into a bus fault or hard fault, checking these boundaries is crucial to finding such overflows.

     

    Birt said:
    Lastly, and this is one I find very strange: when looking at my memory map, I see that my .heap section has exactly the same address as my .stack section. This seems like a bug introduced by me, especially since both sections have a size of ~16K. I also see a .bss.heap_end.0 section, which seems odd. I was assuming all heap allocations reside in the .heap section, unless that’s an artifact of FreeRTOS? I.e., looking back at question #1, maybe FreeRTOS reserves a section for its own stack and heap within .bss? So having both the nRF52 .stack/.heap definitions might be an issue on my end?

    This is a bit of linker magic to ensure that the .stack actually fits in the RAM without overlapping with .heap. 

     

    .bss.heap_end.0 seems to come from newlib nano:

    arm-none-eabi/lib/thumb/v7e-m+fp/hard/libnosys.a(sbrk.o)

    It looks like this one (or a variant of it, as I am not 100% sure which newlib-nano my gcc toolchain is actually compiled with):

    https://github.com/lupyuen/newlib/blob/master/libgloss/libnosys/sbrk.c#L11

     

    I wish you happy holidays!

    Cheers,

    Håkon

  • First off, thank you very much for the reply! It answered my questions but took me a minute to digest what you were saying.

    I think I am *almost all set now, but I have three lingering questions related to the stack and heap.

    1. I am seeing some strangeness in terms of the start and end address for the stack, and you may have alluded to it as part of your comment:

      “This is a bit of linker magic to ensure that the .stack actually fits in the RAM without overlapping with .heap.”

      On my end, I am viewing my .map file, and I see the following, with a specific focus on the end of .bss, and the start of .heap and .stack:



      The application runs *fine, but based on that memory map viewer above, it appears that .stack_dummy would run into .heap? Maybe I am interpreting that wrong, and the fact that there are two .stack_dummy sections might be related to the magic you mentioned above. This one stumps me a bit.


    2. In your opinion, would you recommend utilizing the MPU to catch things like stack overflows, or can something simple like a hand-rolled stack guard do the same thing with less overhead? Based on your comment, I think the best solution would be to utilize the MPU for this, as I see it’s available in SDK 17.1, but I just wanted to verify.

    3. This is probably tightly coupled with #1 but, based on your response above, maybe the memory map looks that way because the .heap and .stack_dummy sections are growing towards each other? In which case, having a small buffer in between is where you would put your stack guard or whatever strategy is best based on question #2 above?

    Thanks again for your time on this!

  • Hi,

     

    Birt said:
    I am seeing some strangeness in terms of the start and end address for the stack, and you may have alluded to it as part of your comment:

    “This is a bit of linker magic to ensure that the .stack actually fits in the RAM without overlapping with .heap.”

    On my end, I am viewing my .map file, and I see the following, with a specific focus on the end of .bss, and the start of .heap and .stack:

    To test the linker check, take for instance nRF5_SDK_17.1.0_ddde560/examples/ble_peripheral/ble_app_blinky/pca10056/s140/armgcc, open the Makefile, and alter these lines:

    nrf52840_xxaa: ASMFLAGS += -D__HEAP_SIZE=0x38000
    nrf52840_xxaa: ASMFLAGS += -D__STACK_SIZE=32768

     

    This overflows .heap into .stack, and the link fails with this error:

    Linking target: _build/nrf52840_xxaa.out
    /opt/zephyr-sdk/zephyr-sdk-0.16.3/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/../../../../arm-zephyr-eabi/bin/ld: region RAM overflowed with stack
    collect2: error: ld returned 1 exit status
    make: *** [../../../../../../components/toolchain/gcc/Makefile.common:294: _build/nrf52840_xxaa.out] Error 1
    

    And checking the .map file (grep for "heap" and "stack") shows that they do indeed overlap with each other:

     

    In your screenshot it is __HeapLimit (i.e. the top of .heap) that is nearing the start of the .stack area. The start of .heap is marked by the symbol "__HeapBase".
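
The reason the __HEAP_SIZE and __STACK_SIZE values in that experiment must trip the ASSERT can be shown with simple arithmetic. The sketch below is a host-side model, not part of the SDK; ram_left_after is a hypothetical helper, and the only hardware fact it relies on is that the nRF52840 has 256 KiB of RAM.

```c
#include <assert.h>
#include <stdint.h>

#define NRF52840_RAM_BYTES (256u * 1024u) /* total on-chip RAM */

/* Hypothetical helper: bytes left for the SoftDevice, .data and .bss after
 * reserving heap and stack. Zero or negative means the request cannot fit
 * once anything else needs RAM. */
static inline int64_t ram_left_after(uint32_t heap_size, uint32_t stack_size)
{
    return (int64_t)NRF52840_RAM_BYTES - (int64_t)heap_size
         - (int64_t)stack_size;
}
```

With __HEAP_SIZE=0x38000 and __STACK_SIZE=32768 the result is exactly 0: heap plus stack alone claim all 256 KiB, so once the SoftDevice RAM and the application's .data/.bss are added, the region cannot fit and the linker reports the overflow.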

    Birt said:
    In your opinion, would you recommend utilizing the MPU to catch things like stack overflows, or can something simple like a hand-rolled stack guard do the same thing with less overhead? Based on your comment, I think the best solution would be to utilize the MPU for this, as I see it’s available in SDK 17.1, but I just wanted to verify.

    Yes. MPU can be used for this purpose.

    You can also use GCC's built-in stack canary option, "-fstack-protector", which effectively places a canary variable in the stack frame of each protected function and checks it when the function exits.
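
For the hand-rolled option, one common approach is stack painting: fill the stack region with a known pattern at startup and later count how many words are still untouched. The sketch below is a minimal host-side illustration on an ordinary array; the function names and the fill pattern are hypothetical, and on target the region would be the range between __StackLimit and __StackTop (an assumption based on the linker script above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu /* arbitrary paint pattern (assumption) */

/* Paint every word of a stack region with a known pattern before use. */
static inline void stack_paint(uint32_t *limit, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        limit[i] = STACK_FILL;
    }
}

/* Count untouched words from the low end (limit) upward. A descending
 * stack consumes the region from the top, so this is the headroom left. */
static inline size_t stack_unused_words(const uint32_t *limit, size_t words)
{
    size_t n = 0;
    while (n < words && limit[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

This only detects an overflow after the fact, whereas the MPU can fault at the moment of the overflow. For task stacks, FreeRTOS already offers uxTaskGetStackHighWaterMark() and configCHECK_FOR_STACK_OVERFLOW, which work along similar lines.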

    Birt said:
    This is probably tightly coupled with #1 but, based on your response above, maybe the memory map looks that way because the .heap and .stack_dummy sections are growing towards each other? In which case, having a small buffer in between is where you would put your stack guard or whatever strategy is best based on question #2 above?

    .heap goes upwards in address, while stack goes downwards in address. There are still some bytes left unused between the sections.

     

    Kind regards,

    Håkon
