stack overflow and STACK GUARD

    I am developing on the nRF52832 with SDK 14.2, and the system now throws some strange exceptions. I suspect a stack overflow. Is there any way to confirm whether the stack overflows, and where the overflow happens?
    I also came across the STACK GUARD library. What does this library do? I only found an initialization function, with no exception notification or handler functions. Can it help me confirm whether, and where, the stack overflows?

  • The stack guard library uses a feature of the Cortex-M4 processor called the Memory Protection Unit (MPU). You can use the MPU to set attributes and access rights on regions of memory. The stack guard library creates a 256-byte region at the bottom of the stack with access permissions set to read-only (for both privileged and unprivileged accesses). This means that any instruction that tries to write past the bottom of the stack will trigger a memory management fault (as opposed to just silently corrupting the stack memory).
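    To make the register side of this concrete, here is a host-side sketch (not the stack guard library's own code) that computes the ARMv7-M MPU_RASR value for a read-only guard region and checks the base alignment the MPU requires. The helper names `mpu_size_field`, `guard_rasr` and `guard_base_ok` are hypothetical; the field layout (AP in bits 26:24, SIZE in bits 5:1 meaning 2^(SIZE+1) bytes, ENABLE in bit 0) is the architectural one. Nothing here touches hardware; on a target the resulting values would be written to MPU->RBAR and MPU->RASR.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions and the RO/RO access code. */
#define RASR_ENABLE    (1u << 0)
#define RASR_SIZE_POS  1u
#define RASR_AP_POS    24u
#define AP_RO_RO       0x7u   /* privileged RO, unprivileged RO */

/* SIZE field for a region of 'bytes' bytes (region = 2^(SIZE+1) bytes).
 * Returns -1 if the MPU cannot express the size: not a power of two,
 * or smaller than the 32-byte minimum. */
static int mpu_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u)
        return -1;
    int shift = 0;
    while ((1u << shift) != bytes)
        shift++;
    return shift - 1;
}

/* Hypothetical helper: RASR value for an enabled read-only guard region.
 * Returns 0 (region disabled) if the size is not encodable. */
static uint32_t guard_rasr(uint32_t bytes)
{
    int size = mpu_size_field(bytes);
    if (size < 0)
        return 0u;
    return (AP_RO_RO << RASR_AP_POS) |
           ((uint32_t)size << RASR_SIZE_POS) |
           RASR_ENABLE;
}

/* The MPU requires the region base to be aligned to the region size. */
static bool guard_base_ok(uint32_t base, uint32_t bytes)
{
    return (base & (bytes - 1u)) == 0u;
}
```

    For the 256-byte RO/RO region described above, `guard_rasr(256)` gives 0x0700000F (AP = 0x7, SIZE = 7, enabled).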

    Be aware that this only applies to the application stack, and I think this will only work if you use the "monolithic" SDK examples. If for example you use the FreeRTOS examples, then each thread will have its own stack.

    If you don't explicitly enable memory management faults in the System Handler Control and State Register (SHCSR), you'll just get a hard fault instead. So you should include both the stack guard library and the hard fault handler, so that you get an informative message on the serial port telling you what happened.

    Note that you can use this same technique to also trap NULL pointer bugs. You just need to create a protected region at address 0x0. Since address 0x0 corresponds to the start of flash, reading from there is allowed and writing just fails silently, which means a NULL pointer bug could go unnoticed for some time. With the MPU it will show up immediately.

    If it were me, instead of using RO/RO (0x7) as the protection attributes, I would use NA/NA (0x0). This will trap both reads and writes. Reading past the end of the stack could also be a sign of a bug that is best fixed. I have found that you can get away with doing this even when setting a protection region at address 0x0. When I first tried this I was concerned it might have some impact on exception handling (since the exception vector table is also at address 0x0), but while reads and writes are successfully trapped, exception handling seems unaffected.
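    The difference between the two attribute choices lives entirely in the 3-bit AP field of MPU_RASR. A small host-side decoder makes it visible; the `mpu_access_t` type and `mpu_decode_ap` helper are hypothetical names for illustration. With 0x7, privileged and unprivileged reads are still allowed, so only writes into the region fault; with 0x0, any access from either privilege level faults. Encoding 0x4 is reserved in ARMv7-M, and this sketch simply reports it as no access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access rights implied by an ARMv7-M MPU_RASR AP field (bits 26:24). */
typedef struct {
    bool priv_read, priv_write, unpriv_read, unpriv_write;
} mpu_access_t;

/* Decode a 3-bit AP value. 0x4 is reserved (UNPREDICTABLE on hardware);
 * it is reported here as no access. */
static mpu_access_t mpu_decode_ap(uint32_t ap)
{
    mpu_access_t a = { false, false, false, false };
    switch (ap & 0x7u) {
    case 0x1: /* privileged RW, unprivileged none */
        a.priv_read = a.priv_write = true;
        break;
    case 0x2: /* privileged RW, unprivileged RO */
        a.priv_read = a.priv_write = a.unpriv_read = true;
        break;
    case 0x3: /* full access */
        a.priv_read = a.priv_write = a.unpriv_read = a.unpriv_write = true;
        break;
    case 0x5: /* privileged RO, unprivileged none */
        a.priv_read = true;
        break;
    case 0x6: /* privileged RO, unprivileged RO */
    case 0x7: /* same as 0x6 */
        a.priv_read = a.unpriv_read = true;
        break;
    default:  /* 0x0 (NA/NA) and 0x4 (reserved) */
        break;
    }
    return a;
}
```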

    You can read more about the MPU in this document:

    http://infocenter.arm.com/help/topic/com.arm.doc.dui0553b/DUI0553.pdf

    -Bill

  • wpaul wrote:

    If it were me, instead of using RO/RO (0x7) as the protection attributes, I would use NA/NA (0x0). This will trap both reads and writes. Reading past the end of the stack could also be a sign of a bug that is best fixed. I have found that you can get away with doing this even when setting a protection region at address 0x0. When I first tried this I was concerned it might have some impact on exception handling (since the exception vector table is also at address 0x0), but while reads and writes are successfully trapped, exception handling seems unaffected.

    I tried configuring an MPU region at address zero with both read and write access disabled (0x0 in the AP field). My experiments show that it still allows reads and only blocks writes, so I'm unable to detect reads through NULL pointers. Have you been able to get it to trigger on reads from zero as well?

    - Tony

  • "My experiments"

    You need to describe your experiments, otherwise I have no idea what you actually did.

    "Have you been able to get it to trigger on reads from zero as well?"

    Yes, I did. That's why I brought it up.

    The code that I used to enable the MPU is here:

    https://github.com/netik/dc27_badge/blob/4bba0e58a304671b48755b99113b3d489259760a/software/firmware/badge/nullprot_lld.c#L72

    This is done with the MPU APIs in ChibiOS, but all of the code for that is in the repo, so you should be able to resolve all the macros and see what values are actually being used.

    This also works for the Cortex-M7 CPU that I'm using now. I use test code like this:

         uint8_t * blah = (uint8_t *)0;
         printf ("moo: %x\n", blah[0]);

    The result I get with my trap handler support is this:

    ********** MEMMANAGE FAULT **********
    Data access violation
    Memory fault address: 0x00000000
    Fault while in thread mode
    Floating point context saved on stack
    Interrupt is pending
    Exception pending: 53
    Exception active: 4
    PC: 0x0021158C LR: 0x00204FF5 SP: 0x200007E8 SR: 0x61000000
    R0: 0x20000F38 R1: 0x20001FE4 R2: 0x00000001 R3: 0x00000000 R12: 0x00000820

    The PC value above is at exactly the instruction that does the load:

    21158c: 7823 ldrb r3, [r4, #0]
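    The "Data access violation" and "Memory fault address" lines in a dump like the one above correspond to the MemManage Fault Status bits (the low byte of SCB->CFSR) and to SCB->MMFAR. The sketch below is a host-side decoder for those bits; `describe_mmfsr` is a hypothetical helper, not the code from badge_fault.S or from the Nordic SDK. On a target you would pass in SCB->CFSR and SCB->MMFAR read inside the MemManage handler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* MemManage Fault Status Register bits (CFSR bits 7:0), ARMv7-M. */
#define MMFSR_IACCVIOL  (1u << 0)  /* instruction fetch violated MPU rules */
#define MMFSR_DACCVIOL  (1u << 1)  /* data load/store violated MPU rules */
#define MMFSR_MUNSTKERR (1u << 3)  /* fault while unstacking on exception return */
#define MMFSR_MSTKERR   (1u << 4)  /* fault while stacking on exception entry */
#define MMFSR_MLSPERR   (1u << 5)  /* fault during lazy FP state preservation */
#define MMFSR_MMARVALID (1u << 7)  /* MMFAR holds the faulting address */

/* Write a one-line description of the MemManage fault into buf.
 * Returns the snprintf() result (characters that would be written). */
static int describe_mmfsr(uint32_t cfsr, uint32_t mmfar, char *buf, size_t len)
{
    uint32_t mmfsr = cfsr & 0xFFu;
    const char *kind =
        (mmfsr & MMFSR_DACCVIOL)  ? "Data access violation" :
        (mmfsr & MMFSR_IACCVIOL)  ? "Instruction access violation" :
        (mmfsr & MMFSR_MSTKERR)   ? "Stacking error" :
        (mmfsr & MMFSR_MUNSTKERR) ? "Unstacking error" :
        (mmfsr & MMFSR_MLSPERR)   ? "Lazy FP stacking error" :
                                    "No MemManage fault recorded";

    /* Only trust MMFAR when the hardware says it is valid. */
    if (mmfsr & MMFSR_MMARVALID)
        return snprintf(buf, len, "%s at 0x%08lX", kind, (unsigned long)mmfar);
    return snprintf(buf, len, "%s", kind);
}
```

    For a NULL-pointer load like the one above, a CFSR low byte of 0x82 (DACCVIOL plus MMARVALID) with MMFAR = 0 would be reported as "Data access violation at 0x00000000".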

    The Nordic SDK has some similar code in it. You can find mine in the badge_vectors.c and badge_fault.S modules in the GitHub repo above. Also, in main.c, I do this:

    /*
     * Enable memory management, usage and bus fault exceptions, so that
     * we don't always end up diverting through the hard fault handler.
     * Note: the memory management fault only applies if the MPU is
     * enabled, which it currently is (for stack guard pages).
     */
    SCB->SHCSR |= SCB_SHCSR_USGFAULTENA_Msk |
                  SCB_SHCSR_BUSFAULTENA_Msk |
                  SCB_SHCSR_MEMFAULTENA_Msk;

    -Bill

  • Hi Bill,

    I appreciate your timely and detailed response.

    I've taken a look at your code and tried to replicate the exact settings, but I'm still only seeing a memory fault on writes to address 0; reads do not fault.

    Your code is essentially:

    #define mpuConfigureRegion(region, addr, attribs) {                         \
      MPU->RNR  = ((uint32_t)region);                                           \
      MPU->RBAR = ((uint32_t)addr);                                             \
      MPU->RASR = ((uint32_t)attribs);                                          \
    }
    
    mpuConfigureRegion (MPU_REGION_6, 0x0, 
        MPU_RASR_ATTR_AP_NA_NA | 
        MPU_RASR_ATTR_NON_CACHEABLE | 
        MPU_RASR_SIZE_1K | 
        MPU_RASR_ENABLE
    );
    

    Which, if I've followed the macro values correctly, becomes:

      MPU->RNR  = 6; 
      MPU->RBAR = 0;
      MPU->RASR = 0x00080013;
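    That expansion can be double-checked by building the value from the RASR fields. This sketch assumes the ChibiOS macros encode NON_CACHEABLE as TEX = 0b001 (bit 19, with C = B = 0, which ARMv7-M defines as Normal non-cacheable memory) and SIZE_1K as a SIZE field of 9, i.e. 2^(9+1) = 1024 bytes; the macro names below are local stand-ins, not the ChibiOS ones. It also spells out the `MPU->CTRL = 0x05` value Tony uses as ENABLE plus PRIVDEFENA.

```c
#include <stdint.h>

/* Local stand-ins for the MPU_RASR fields (ARMv7-M layout). */
#define RASR_ENABLE     (1u << 0)
#define RASR_SIZE(n)    ((uint32_t)(n) << 1)   /* region = 2^(n+1) bytes */
#define RASR_TEX(n)     ((uint32_t)(n) << 19)
#define RASR_AP(n)      ((uint32_t)(n) << 24)

#define AP_NA_NA        0u   /* no access, privileged or unprivileged */
#define SIZE_1K         9u   /* 2^(9+1) = 1024 bytes */
#define TEX_NORMAL_NC   1u   /* TEX=001, C=0, B=0: Normal, non-cacheable */

/* MPU_CTRL bits. */
#define CTRL_ENABLE     (1u << 0)  /* turn the MPU on */
#define CTRL_HFNMIENA   (1u << 1)  /* keep MPU active in HardFault/NMI (not set here) */
#define CTRL_PRIVDEFENA (1u << 2)  /* default map for privileged code outside regions */

/* The 1 KB no-access region at address 0 from the snippet above. */
static const uint32_t null_region_rasr =
    RASR_AP(AP_NA_NA) | RASR_TEX(TEX_NORMAL_NC) |
    RASR_SIZE(SIZE_1K) | RASR_ENABLE;

/* ENABLE | PRIVDEFENA; HFNMIENA left clear. */
static const uint32_t mpu_ctrl_value = CTRL_ENABLE | CTRL_PRIVDEFENA;
```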

    When I execute the above code (followed by MPU->CTRL = 0x05), it protects from writes only. I can execute your code snippet just fine:

    uint8_t * blah = (uint8_t *)0;
    printf ("moo: %x\n", blah[0]);

    It's only when I execute a write that the memory fault triggers. I'm also enabling the individual faults:

        SCB->SHCSR |= (
          SCB_SHCSR_USGFAULTENA_Msk |
          SCB_SHCSR_BUSFAULTENA_Msk |
          SCB_SHCSR_MEMFAULTENA_Msk
          );

    And I do get a memory management-specific fault when I write, but never on a read.

    I've experimented with slight variations of the attribute settings (taken from a previous project on an STM32F437 MCU, where reads at 0 did trap), but the results remain the same: I'm never able to trigger a fault by reading through a NULL pointer.

    I'm using an nRF52840 with s140_nrf52_6.1.0_softdevice.hex, and I'm doing this MPU configuration before the SoftDevice is enabled (as you seem to do). So I remain puzzled about what is going on.

    Wait a minute... I just realized that I've always stepped through this code in the debugger. If I let the processor free-run through the fault-generating code (even under the debugger), it does generate the memory fault on the read! I wish I had thought of that earlier. I'm not sure exactly what aspect of single-stepping suppresses the read fault, but that is what made me think it wasn't working.

  • One other observation, regarding why exception vectors don't trip the MPU:

    "When the MPU is enabled, accesses to the System Control Space and vector table are always permitted. Other areas are accessible based on regions and whether PRIVDEFENA is set to 1."

    Source: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/BABDJJGF.html

  • "Not exactly sure what aspect of single-stepping is suppressing the read memory fault, but that seems to be what made me think it wasn't working."

    Hm, interesting. I'm not sure what to tell you about that. In my tests, I've set a breakpoint right on the instruction that would generate the fault, and then done a "stepi" with GDB, to advance just one instruction further, and in that case I can see the PC jump to the expected exception vector.

    It may have to do with exactly which debugger setup you're using. My development environment is a Nordic nRF52840 DK board (rev 1.0.0) with OpenOCD and GDB, using a stock GCC toolchain. The production environment is a custom nRF52840 board debugged with an Olimex ARM-USB-OCD-H (since our board didn't have an on-board J-Link), but the rest of the setup is the same. Either way, the MPU protection worked as expected for both reads and writes.

    I was going to suggest that you check the other MPU regions. According to the ARM documentation, if two or more enabled MPU regions overlap, the one with the higher region number takes precedence. So if another region that only blocks writes has a higher number than your custom region, it could override it.
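    The precedence rule can be illustrated with a small host-side model: search the regions from the highest number down and stop at the first enabled one that contains the address. `region_t` and `mpu_match` are hypothetical, and the model is deliberately simplified (it ignores subregion disable bits and the PRIVDEFENA background map).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one MPU region (host-side only). */
typedef struct {
    bool     enabled;
    uint32_t base;  /* assumed aligned to size */
    uint32_t size;  /* bytes, power of two, at most 2^31 in this model */
    uint32_t ap;    /* 3-bit AP field */
} region_t;

/* Index of the region that governs 'addr', or -1 if none matches.
 * Overlaps are resolved in favor of the highest-numbered region, so the
 * search runs from the top down. Unsigned wraparound makes
 * 'addr - base' huge when addr < base, so one compare covers both ends. */
static int mpu_match(const region_t *regions, size_t count, uint32_t addr)
{
    for (size_t i = count; i-- > 0;) {
        const region_t *r = &regions[i];
        if (r->enabled && addr - r->base < r->size)
            return (int)i;
    }
    return -1;
}
```

    With a no-access 1 KB region 1 and a read-only 4 KB region 6 both based at 0x0, an access to address 0 is governed by region 6, so reads would succeed even though region 1 forbids them.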

    BTW, you should be able to compile the code from my repo using a standard arm-none-eabi GNU toolchain (just make sure it's in your path and type 'make') and it will boot up and run on an nRF52840 DK board, though not very well, since there will be no graphics or sound. I was going to suggest you try that just to compare against your own code, but it sounds like that's not necessary now.

    -Bill

  • Hi Bill,

    Yes, I had used stepi at times as well (also with GDB), and it stepped right through the read from zero, fetching the vector value successfully. One reason I didn't suspect the debugger is that a different protected region, set up to detect impending stack overflow (Nordic's STACK GUARD, which I modified to trap both reads and writes), works as expected even when stepping: both reads and writes immediately cause a memory fault.

    I'm confident that there are no other MPU regions. My code runs early in initialization, immediately after nrf_mpu_init(), which clears all MPU regions. And immediately after setting up this particular region, I test it, before any other code could possibly intervene.

    A bit later, after setting up both regions, the MPU regions read as follows:

    00> MPU->RNR = 0, RASR = 0x00080013, RBAR = 0x00000000
    00> MPU->RNR = 1, RASR = 0x1029000D, RBAR = 0x2003B201
    00> MPU->RNR = 2, RASR = 0x00000000, RBAR = 0x00000002
    00> MPU->RNR = 3, RASR = 0x00000000, RBAR = 0x00000003
    00> MPU->RNR = 4, RASR = 0x00000000, RBAR = 0x00000004
    00> MPU->RNR = 5, RASR = 0x00000000, RBAR = 0x00000005
    00> MPU->RNR = 6, RASR = 0x00000000, RBAR = 0x00000006
    00> MPU->RNR = 7, RASR = 0x00000000, RBAR = 0x00000007

    Region 0 is the NULL pointer region and region 1 is the stack guard. The other regions are empty/disabled (the RBAR returns the region number on read--the values were set to zero on write by nrf_mpu_init()).
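    For reading register dumps like this one, here is a host-side sketch that splits MPU_RBAR and MPU_RASR into their ARMv7-M fields. `mpu_region_info_t` and `mpu_decode` are hypothetical names; the SRD (subregion disable) bits are not decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one ARMv7-M MPU region (hypothetical helper type). */
typedef struct {
    uint32_t base;    /* RBAR with the VALID and REGION bits cleared */
    uint32_t region;  /* RBAR[3:0]: reads back the selected region number */
    bool     enabled; /* RASR[0] */
    uint32_t size;    /* 2^(SIZE+1) bytes; 0 stands for the 4 GB encoding */
    uint32_t ap;      /* RASR[26:24], access permissions */
    bool     xn;      /* RASR[28], execute never */
    uint32_t tex;     /* RASR[21:19] */
    uint32_t s, c, b; /* RASR[18], RASR[17], RASR[16] */
} mpu_region_info_t;

static mpu_region_info_t mpu_decode(uint32_t rbar, uint32_t rasr)
{
    mpu_region_info_t r;
    uint32_t size_field = (rasr >> 1) & 0x1Fu;

    r.base    = rbar & ~0x1Fu;
    r.region  = rbar & 0xFu;
    r.enabled = (rasr & 1u) != 0u;
    /* SIZE = 31 means 4 GB, which does not fit in 32 bits. */
    r.size    = (size_field >= 31u) ? 0u : (1u << (size_field + 1u));
    r.ap      = (rasr >> 24) & 0x7u;
    r.xn      = ((rasr >> 28) & 1u) != 0u;
    r.tex     = (rasr >> 19) & 0x7u;
    r.s       = (rasr >> 18) & 1u;
    r.c       = (rasr >> 17) & 1u;
    r.b       = (rasr >> 16) & 1u;
    return r;
}
```

    Applied to the dump above, region 0 decodes as a 1 KB no-access region at 0x00000000 with TEX = 001, and region 1 as an enabled 128-byte, no-access, execute-never region based at 0x2003B200.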

    I'm using a SEGGER J-Link debugger, and the project is built with SEGGER Embedded Studio, which uses GCC under the hood. The target is custom hardware developed by a client.

    Again, thanks for your time and thoughts on this. I have a working system for what I'm trying to accomplish, even if I don't understand why stepping through reads at 0 behaves differently from stepping through reads in the stack guard region.
