stack guard module raises Hard Fault not MMU Fault

hello,

We are trying to use the stack guard module. We want it to give well-behaved fault handling when the stack grows into the guard area.

What we found when testing is that an explicit write to the guard area raises a Memory Management Fault, as expected. However, when we trigger a real stack overflow (with a recursive function that never returns), the processor locks up.

I spent a bit of time to extract a "minimal example" from our prototype code, and I found similar behaviour - in the example code below, you will find that calling the recursive function raises a Hard Fault.

Here is my minimal example:

// minimal implementation to show stack guard issue
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include "nrf.h"  // CMSIS core definitions (SCB, SCB_SHCSR_MEMFAULTENA_Msk)
#include "nrf_mpu_lib.h"
#include "nrf_stack_guard.h"

static uint32_t count = 0;

void trigger_stack_guard(void)
{
    uint32_t * stack = (uint32_t *)0x2000E1FE;
    *stack = 0xDEADBEEF;
}

void evil_recursive(void)
{
    count++;
    evil_recursive();
}

void HardFault_Handler(void)
{
    while(true);
}

void MemoryManagement_Handler(void)
{
    while(true);
}

int main(void)
{
    nrf_mpu_lib_init();
    nrf_stack_guard_init();
    // enable Mem Fault Handler
    SCB->SHCSR |= SCB_SHCSR_MEMFAULTENA_Msk;

    // just write to the guard area
    trigger_stack_guard();

    // really cause a stack overflow
    evil_recursive();

    while(true);
}

If you run this as is and attach gdb, execution ends up in MemoryManagement_Handler as expected. However, if you comment out the trigger_stack_guard() call so that only the recursive function runs, execution ends up in HardFault_Handler instead.

Not sure why this is - we would like to know why it doesn't work as expected.

  • Hello,

    I did not manage to replicate this with the code you posted. Here is the stack trace I got using your code:

    And the register readout (SP is at the very beginning of my stack guard block):

    The only explanation I can think of is that the memory access violation is repeated when you enter MemoryManagement_Handler(). That would cause the CPU to raise a HardFault exception, which preempts your memory fault handler. You may place a breakpoint in MemoryManagement_Handler() to confirm this.
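
    Another way to check for this kind of escalation is to look at the fault status registers once you land in the HardFault handler. The sketch below decodes raw HFSR and CFSR values using the ARMv7-M bit positions. The function and enum names are made up for illustration, and it takes plain integers so it can be tried off-target.

```c
#include <stdint.h>

/* Bit positions from the ARMv7-M fault status registers. */
#define HFSR_FORCED    (1UL << 30) /* HardFault was escalated from another fault */
#define MMFSR_DACCVIOL (1UL << 1)  /* data access violated an MPU region */
#define MMFSR_MSTKERR  (1UL << 4)  /* exception-entry stacking violated an MPU region */

/* Hypothetical classification, not part of the SDK. */
typedef enum {
    FAULT_OTHER,
    FAULT_ESCALATED_MPU_ACCESS,
    FAULT_ESCALATED_MPU_STACKING
} fault_kind_t;

/* Classify a HardFault from raw HFSR and CFSR register values. */
fault_kind_t classify_hardfault(uint32_t hfsr, uint32_t cfsr)
{
    if ((hfsr & HFSR_FORCED) == 0) {
        return FAULT_OTHER; /* not an escalated fault */
    }
    if (cfsr & MMFSR_MSTKERR) {
        return FAULT_ESCALATED_MPU_STACKING;
    }
    if (cfsr & MMFSR_DACCVIOL) {
        return FAULT_ESCALATED_MPU_ACCESS;
    }
    return FAULT_OTHER;
}
```

    On target this would be called as classify_hardfault(SCB->HFSR, SCB->CFSR), or the same two registers can simply be read from gdb.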

    I expect you would get the same issue with the trigger_stack_guard() call if you updated your stack pointer inside it:

    void trigger_stack_guard(void)
    {
      /* Simulate stack overflow by setting the SP to point to the guard block */
      __set_MSP(<stack guard start address >); // e.g.  __set_MSP(0x2000e400);
      static uint32_t * stack = (uint32_t *)0x2000E1FE;
      *stack = 0xDEADBEEF;
    }

    Best regards,

    Vidar

  • I did indeed put a breakpoint in Memory Management Handler, it doesn't get there. It goes straight to hard fault. But that could be something to do with the way the debugger is working I guess.

    If I objdump the handler, it looks like this:

    00026302 <MemoryManagement_Handler>:
       26302:    b480          push    {r7}
       26304:    af00          add    r7, sp, #0
       26306:    e7fe          b.n    26306 <MemoryManagement_Handler+0x4>

    So it tries to push on the stack - which is likely what causes the hard fault.

    What this means is that really the stack guard module doesn't actually do what it should, at least if you implement the handler in C. There are two ways to work around this:

    1. implement the handler in assembler.
    2. check the stack pointer in the hard fault handler, where you might want to turn off stack guard in order to be able to take corrective actions.
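
    For the second workaround, the check itself is only a range comparison. A minimal sketch, assuming a hypothetical guard base and size (the real values depend on the linker script and the configured guard size); sp_in_stack_guard is not an SDK function:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if sp lies inside [guard_base, guard_base + guard_size).
 * The unsigned subtraction wraps to a large value when sp is below
 * guard_base, so a single comparison covers both bounds.
 */
bool sp_in_stack_guard(uint32_t sp, uint32_t guard_base, uint32_t guard_size)
{
    return (uint32_t)(sp - guard_base) < guard_size;
}
```

    On target the sp argument could come from the CMSIS intrinsic __get_MSP(), read at the start of the hard fault handler.
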
  • OK, I see the problem now. Unlike my FW built with SEGGER Embedded Studio, yours pushes the frame pointer onto the stack when it enters the exception handler, which causes another write to the MPU-protected stack region. You can build your app with -fomit-frame-pointer to avoid this.

    danmcb said:
    What this means is that really the stack guard module doesn't actually do what it should, at least if you implement the handler in C.

    Sorry, but I'm not sure I understand what this has to do with the stack guard module. We already have a HardFault handling library which includes stack checking and parsing of MPU-related faults.

  • The whole thing is about the stack guard module. In fact it is working as advertised; there are just a few subtleties in making it work well when you have a real stack overflow. I didn't know about that gcc flag, thanks, we will try it. I'm not sure where the hard fault library comes into it: I am not using it in this example, and the stack checking it does seems to be to do with which stack is in use when the fault occurs, which is not an issue here.

  • Fair enough. I saw this as more of a problem with how the fault handling was implemented. It seems to me like it would have been easier to just use the SDK provided hardfault handler, or is there a specific need that requires you to enable the MemManage handler?

    danmcb said:
    and the stack checking it does seems to be to do with which stack is in use when the fault occurs, which is not an issue here.

    That too. And in addition, if MSP is used, it will check if it is within the allocated stack section. If it isn't, then the SP will be reset back to its initial value.
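
    The logic described here can be sketched roughly as follows. This is not the SDK's HardFault library code, only an illustration: bit 2 of the EXC_RETURN value in LR tells whether the faulting context used MSP or PSP, and an MSP outside the stack section is replaced by the initial stack top. The function name and the stack_limit/stack_top parameters are hypothetical.

```c
#include <stdint.h>

/* EXC_RETURN bit 2: 0 means MSP was in use, 1 means PSP was in use. */
#define EXC_RETURN_SPSEL (1UL << 2)

/*
 * Pick the stack pointer of the faulting context. If MSP was in use and
 * lies outside [stack_limit, stack_top], fall back to stack_top.
 */
uint32_t fault_stack_pointer(uint32_t exc_return, uint32_t msp, uint32_t psp,
                             uint32_t stack_limit, uint32_t stack_top)
{
    if (exc_return & EXC_RETURN_SPSEL) {
        return psp; /* process stack was in use, leave it untouched */
    }
    if (msp < stack_limit || msp > stack_top) {
        return stack_top; /* MSP left the stack section, reset it */
    }
    return msp;
}
```

    For example, an EXC_RETURN of 0xFFFFFFF9 (thread mode on MSP) with an MSP below stack_limit yields stack_top, while 0xFFFFFFFD (thread mode on PSP) returns the PSP unchanged.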
