
Stack Guard and MPU

Hi, I'm trying to get the nrf_stack_guard and nrf_mpu libraries to catch writes past the end of the stack. My stack is 8kB in size (0x2000E000-0x20010000). This is the log output after initializing the Stack Guard module:

<debug> nrf_mpu: MPU region creating (location: 0x2000E000-0x2000E07F)
<debug> nrf_mpu: MPU region 0 created (location: 0x2000E000-0x2000E07F, access: RO/RO, type: Normal, flags: XN).
<info> stack_guard: Stack Guard: 0x2000E000-0x2000E07F (usable stack area: 8064 bytes)
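The numbers in the Stack Guard log line follow from simple arithmetic. The sketch below assumes a descending stack occupying 0x2000E000-0x20010000 with a 128-byte guard at its lowest addresses. It also uses the ARMv7-M rule that an MPU region is a power of two in size (at least 32 bytes) and aligned to its own size. The type and function names are hypothetical, not part of the nRF5 SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a guard placed at the bottom of a
 * descending stack (not an nRF5 SDK type). */
typedef struct {
    uint32_t guard_start;   /* lowest address covered by the guard */
    uint32_t guard_end;     /* last address covered by the guard (inclusive) */
    uint32_t usable_bytes;  /* stack bytes left above the guard */
} stack_guard_layout_t;

/* ARMv7-M MPU regions: power-of-two size, >= 32 bytes, base aligned to size. */
static bool mpu_region_is_valid(uint32_t base, uint32_t size)
{
    bool pow2 = size >= 32u && (size & (size - 1u)) == 0u;
    return pow2 && (base & (size - 1u)) == 0u;
}

/* The guard occupies the first guard_size bytes of the stack area,
 * i.e. the addresses the stack reaches last when it grows downward. */
static stack_guard_layout_t stack_guard_layout(uint32_t stack_base,
                                               uint32_t stack_size,
                                               uint32_t guard_size)
{
    stack_guard_layout_t l;
    l.guard_start  = stack_base;
    l.guard_end    = stack_base + guard_size - 1u;
    l.usable_bytes = stack_size - guard_size;
    return l;
}
```

With the values from the log, `stack_guard_layout(0x2000E000u, 8192u, 128u)` gives a guard of 0x2000E000-0x2000E07F and 8064 usable bytes, and the base passes the alignment check.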

I'm having a little trouble understanding how the MPU works. I'd expect a write to 0x2000E000 to trigger the HardFault_Handler, but nothing is triggered: the write just happens and silently corrupts RAM below the stack.

I'm using the nrf gcc hardfault library implementation and am able to catch NULL dereferences and other faults, so that should be correctly set up.

What am I missing?

  • Hi all, thank you for your responses. I'm a bit dumbfounded as to why I can't get the Stack Guard / MPU to work. I must be fundamentally misunderstanding something.

    In an attempt to isolate the problem, I've taken the ble_app_cli example from SDK 15.2.0 (which seems to be the only example that uses the stack guard) and added a write to the base of the stack.

    static void core_init(void)
    {
        APP_ERROR_CHECK(NRF_LOG_INIT(app_timer_cnt_get));
    
        if (CoreDebug->DHCSR & CoreDebug_DHCSR_C_DEBUGEN_Msk)
        {
            APP_ERROR_CHECK(nrf_cli_init(&m_cli, NULL, true, true, NRF_LOG_SEVERITY_INFO));
        }
    
        nrf_drv_uart_config_t uart_config = NRF_DRV_UART_DEFAULT_CONFIG;
        uart_config.pseltxd = TX_PIN_NUMBER;
        uart_config.pselrxd = RX_PIN_NUMBER;
        uart_config.hwfc    = NRF_UART_HWFC_DISABLED;
        APP_ERROR_CHECK(nrf_cli_init(&m_cli_uart, &uart_config, true, true, NRF_LOG_SEVERITY_INFO));
    
        APP_ERROR_CHECK(nrf_drv_clock_init());
    
        nrf_drv_clock_lfclk_request(NULL);
    
        APP_ERROR_CHECK(app_timer_init());
    
        APP_ERROR_CHECK(nrf_stack_guard_init());
    
        *(volatile uint32_t*) STACK_BASE = 0xba5eba11;
    
        NRF_LOG_INFO("Written to stack base (%p): %x", STACK_BASE, *((uint32_t*)STACK_BASE));
    
        APP_ERROR_CHECK(nrf_pwr_mgmt_init());
    
        if (CoreDebug->DHCSR & CoreDebug_DHCSR_C_DEBUGEN_Msk)
        {
            APP_ERROR_CHECK(nrf_cli_task_create(&m_cli));
        }
    
        APP_ERROR_CHECK(nrf_cli_task_create(&m_cli_uart));
    }

    This code runs past the "illegal" write and continues normally. From my understanding, this should trigger a HardFault.

    Can somebody tell me what I'm missing?

  • You made me curious enough that I actually hauled out my nRF52840 DK reference board and re-flashed it to run the SDK demos.

    The problem here is not that there's something wrong with you; the problem is that there's something wrong with the universe:

    this example is buggy.

    Note that this is not the only example that uses the stack guard feature. The examples/peripheral/cli/main.c code also uses it, and that code does it right.

    What's not immediately obvious is that, because this is a CLI example, the CLI has a built-in mpu command. If you run it, the problem becomes clear. Here's what I get:

    uart_cli:~$ mpu info
    MPU State: Disabled, 8 unified regions aviable.
    
    Region 0: Enabled
            - Location:     0x2003E000-0x2003E07F (size: 128 bytes)
            - Access:       RO/RO
            - Type:         Normal
            - Caching:      WBWA/WBWA
            - Flags:        XN
    
    Region 1: Enabled
            - Location:     0x20004301-0x20004400 (size: 256 bytes)
            - Access:       RO/RO
            - Type:         Normal
            - Caching:      WBWA/WBWA
            - Flags:        XN
    
    Region 2: Disabled
    Region 3: Disabled
    Region 4: Disabled
    Region 5: Disabled
    Region 6: Disabled
    Region 7: Disabled
    [00:02:38.000,213] <info> app: Battery level update: 97
    uart_cli:~$ mpu dump
    MPU_TYPE:       0x00000800
    MPU_CTRL:       0x00000000
    
    MPU_RBAR[0]:    0x2003E000
    MPU_RASR[0]:    0x1729000D
    
    MPU_RBAR[1]:    0x20004301
    MPU_RASR[1]:    0x1729000F
    
    MPU_RBAR[2]:    0x00000002
    MPU_RASR[2]:    0x00000000
    
    MPU_RBAR[3]:    0x00000003
    MPU_RASR[3]:    0x00000000
    
    MPU_RBAR[4]:    0x00000004
    MPU_RASR[4]:    0x00000000
    
    MPU_RBAR[5]:    0x00000005
    MPU_RASR[5]:    0x00000000
    
    MPU_RBAR[6]:    0x00000006
    MPU_RASR[6]:    0x00000000
    
    MPU_RBAR[7]:    0x00000007
    MPU_RASR[7]:    0x00000000
    uart_cli:~$ 

    The most important part is where it says: MPU State: Disabled, 8 unified regions aviable.

    Yes, Disabled.

    If you download the book that I linked in my first reply, its MPU section documents the MPU control register (MPU_CTRL). You must set bit 0 (ENABLE) in that register to actually turn the MPU on, but here we can see that hasn't been done:

    MPU_CTRL: 0x00000000
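As a sketch of how the MPU_TYPE and MPU_CTRL values in the dump decode, the helpers below assume the ARMv7-M field layout. That layout puts MPU_TYPE.DREGION in bits 15:8 and MPU_CTRL.ENABLE, HFNMIENA and PRIVDEFENA in bits 0, 1 and 2. The function and macro names are made up for illustration; on target the same registers are reachable through the CMSIS `MPU->TYPE` and `MPU->CTRL` members.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_CTRL bits */
#define MPU_CTRL_ENABLE_BIT     (1u << 0) /* master enable for the whole MPU */
#define MPU_CTRL_HFNMIENA_BIT   (1u << 1) /* MPU also active in HardFault/NMI handlers */
#define MPU_CTRL_PRIVDEFENA_BIT (1u << 2) /* privileged accesses outside any region use the default map */

static bool mpu_is_enabled(uint32_t ctrl)
{
    return (ctrl & MPU_CTRL_ENABLE_BIT) != 0u;
}

/* MPU_TYPE.DREGION (bits 15:8): number of supported regions */
static unsigned mpu_region_count(uint32_t type)
{
    return (type >> 8) & 0xFFu;
}
```

Fed the dumped values, `mpu_region_count(0x00000800u)` reports the 8 regions, and `mpu_is_enabled(0x00000000u)` is false, matching the "Disabled" state.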

    You can also inspect the MPU registers from the debugger directly. They start at address 0xE000ED90 (MPU_TYPE), with MPU_CTRL at 0xE000ED94. You can refer to table 4-38 in the manual above for the complete map.
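To make the dump easier to read, here is a sketch that lists the ARMv7-M MPU register addresses and decodes an MPU_RASR value. It assumes the architectural layout: ENABLE in bit 0, SIZE in bits 5:1 meaning 2^(SIZE+1) bytes, AP in bits 26:24, and XN in bit 28. The names are illustrative, not from the SDK or CMSIS. This layout also accounts for the MPU_RBAR values of the unused regions (0x00000002, 0x00000003, ...): on reads, the low four bits of RBAR return the currently selected region number.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU register addresses (System Control Space) */
#define MPU_TYPE_ADDR 0xE000ED90u
#define MPU_CTRL_ADDR 0xE000ED94u
#define MPU_RNR_ADDR  0xE000ED98u /* selects which region RBAR/RASR refer to */
#define MPU_RBAR_ADDR 0xE000ED9Cu
#define MPU_RASR_ADDR 0xE000EDA0u

typedef struct {
    bool     enabled; /* RASR.ENABLE, bit 0 */
    uint64_t size;    /* 2^(SIZE + 1) bytes; 64-bit so SIZE = 31 (4 GB) fits */
    unsigned ap;      /* RASR.AP, bits 26:24 (0b111 = read-only for all) */
    bool     xn;      /* RASR.XN, bit 28: execute never */
} mpu_rasr_fields_t;

static mpu_rasr_fields_t mpu_decode_rasr(uint32_t rasr)
{
    mpu_rasr_fields_t f;
    unsigned size_field = (rasr >> 1) & 0x1Fu;

    f.enabled = (rasr & 1u) != 0u;
    f.size    = (uint64_t)1 << (size_field + 1u);
    f.ap      = (rasr >> 24) & 0x7u;
    f.xn      = ((rasr >> 28) & 1u) != 0u;
    return f;
}
```

Decoding MPU_RASR[0] = 0x1729000D this way yields an enabled 128-byte region with AP = 7 (RO/RO) and XN set, and 0x1729000F yields 256 bytes, which agrees with the `mpu info` output.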

    The reason this is happening is that it's not enough to call nrf_stack_guard_init(). You *also* have to call nrf_mpu_init(). This example _DOESN'T_ do that, which is incredibly dumb.

    Go to main.c in the example and do this:

        APP_ERROR_CHECK(nrf_mpu_init());  /* Add me! */
        APP_ERROR_CHECK(nrf_stack_guard_init());
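For illustration only, here is a minimal sketch of what "turning the MPU on" means at the register level on an ARMv7-M core. It is not the SDK's nrf_mpu_init() implementation; in an nRF5 SDK project, call nrf_mpu_init() as shown above. The helper `mpu_enable()` is hypothetical and takes the register as a pointer so the logic can be exercised off-target; on the device you would pass the MPU_CTRL address, 0xE000ED94.

```c
#include <stdint.h>

#define MPU_CTRL_ENABLE     (1u << 0) /* master enable */
#define MPU_CTRL_PRIVDEFENA (1u << 2) /* privileged accesses outside any region use the default map */

/* Barriers so the new MPU configuration takes effect before the next
 * memory access and instruction fetch. Compiled out for host builds. */
static inline void mpu_barriers(void)
{
#if defined(__arm__) || defined(__thumb__)
    __asm volatile ("dsb 0xF" ::: "memory");
    __asm volatile ("isb 0xF" ::: "memory");
#endif
}

/* Hypothetical helper: set the master enable together with PRIVDEFENA.
 * Regions are assumed to be configured already (e.g. the stack guard). */
static void mpu_enable(volatile uint32_t *mpu_ctrl)
{
    *mpu_ctrl = MPU_CTRL_PRIVDEFENA | MPU_CTRL_ENABLE;
    mpu_barriers();
}

/* On target (assumption: ARMv7-M, e.g. nRF52):
 *     mpu_enable((volatile uint32_t *)0xE000ED94u);
 */
```

Setting PRIVDEFENA alongside ENABLE matters here: with only a guard region defined and PRIVDEFENA clear, every other access outside the defined regions would fault too.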

    Now re-run your test. When I do it, I get the following endless hard fault cycle:

    [00:00:00.000,000] <info> stack_guard: Stack Guard (128 bytes): 0x2003E000-0x2003E07F (total stack size: 8192 bytes, usable stack area: 8064 bytes)
    [00:00:00.000,000] <error> hardfault: HARD FAULT at 0x0003460E
    [00:00:00.000,000] <error> hardfault:   R0:  0x00000000  R1:  0x0F81B159  R2:  0xBA5EBA11  R3:  0x2003E000
    [00:00:00.000,000] <error> hardfault:   R12: 0x2000360C  LR:  0x00027EC9  PSR: 0x61000000
    [00:00:00.000,000] <error> hardfault: Cause: The processor attempted a load or store at a location that does not permit the operation.
    [00:00:00.000,000] <error> hardfault: MemManage Fault Address: 0x2003E000
    [00:00:00.000,000] <info> stack_guard: Stack Guard (128 bytes): 0x2003E000-0x2003E07F (total stack size: 8192 bytes, usable stack area: 8064 bytes)
    [00:00:00.000,000] <error> hardfault: HARD FAULT at 0x0003460E
    [00:00:00.000,000] <error> hardfault:   R0:  0x00000000  R1:  0x0F81B159  R2:  0xBA5EBA11  R3:  0x2003E000
    [00:00:00.000,000] <error> hardfault:   R12: 0x2000360C  LR:  0x00027EC9  PSR: 0x61000000
    [00:00:00.000,000] <info> stack_guard: Stack Guard (128 bytes): 0x2003E000-0x2003E07F (total stack size: 8192 bytes, usable stack area: 8064 bytes)
    [00:00:00.000,000] <error> hardfault: HARD FAULT at 0x0003460E
    [00:00:00.000,000] <error> hardfault:   R0:  0x00000000  R1:  0x0F81B159  R2:  0xBA5EBA11  R3:  0x2003E000
    [00:00:00.000,000] <error> hardfault:   R12: 0x2000360C  LR:  0x00027EC9  PSR: 0x61000000
    [00:00:00.000,000] <error> hardfault: Cause: The processor attempted a load or store at a location that does not permit the operation.
    [00:00:00.000,000] <error> hardfault: MemManage Fault Address: 0x2003E000
    [...]
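The "Cause" and "MemManage Fault Address" lines appear to correspond to fields of the ARMv7-M Configurable Fault Status Register (CFSR) and the MemManage Fault Address Register (MMFAR). The MemManage exception is disabled out of reset (SHCSR.MEMFAULTENA = 0), in which case an MPU violation escalates to HardFault; that would explain why it is reported as a HARD FAULT. Below is a sketch of how a handler might classify such a fault, assuming the ARMv7-M bit layout. The helper names are hypothetical, and so is the status value 0x82 used in the example, since the actual CFSR value isn't shown in the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M fault status registers (SCB) */
#define SCB_CFSR_ADDR  0xE000ED28u /* MMFSR is the low byte of CFSR */
#define SCB_MMFAR_ADDR 0xE000ED34u /* faulting address, valid if MMARVALID */

#define MMFSR_IACCVIOL  (1u << 0) /* instruction fetch violation */
#define MMFSR_DACCVIOL  (1u << 1) /* data load/store not permitted by the MPU */
#define MMFSR_MMARVALID (1u << 7) /* MMFAR holds the faulting address */

static bool is_mpu_data_violation(uint32_t cfsr)
{
    return (cfsr & MMFSR_DACCVIOL) != 0u;
}

static bool fault_address_valid(uint32_t cfsr)
{
    return (cfsr & MMFSR_MMARVALID) != 0u;
}

/* True if the fault was a data access inside the guard range (inclusive). */
static bool fault_hit_guard(uint32_t cfsr, uint32_t mmfar,
                            uint32_t guard_start, uint32_t guard_end)
{
    return is_mpu_data_violation(cfsr) && fault_address_valid(cfsr)
        && mmfar >= guard_start && mmfar <= guard_end;
}
```

On target, the inputs would come from the CMSIS `SCB->CFSR` and `SCB->MMFAR` registers. With the guard from this log (0x2003E000-0x2003E07F) and a fault address of 0x2003E000, `fault_hit_guard()` returns true.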

    The examples/peripheral/cli/main.c code actually does call nrf_mpu_init() before nrf_stack_guard_init(). I don't know why this one doesn't.

    -Bill

  • Hi Bill,

    Thank you for investigating and reporting this. I have reported the issue with the missing call to nrf_mpu_init() in ble_app_cli to the SDK developers.

  • Thank you!

    Finally, this all makes sense. I actually had a look at the peripheral/cli example first, but I couldn't see that it was including or initializing the Stack Guard / MPU at all. Maybe that's been updated in the latest SDK release.

    But anyway, it's hardfaulting now when I try to write to the bottom of the stack.

    I would have thought it to be the nrf_stack_guard module's responsibility to make sure the MPU is enabled?

  • I actually had a look at the peripheral/cli example first, but I couldn't see that it was including or initializing the Stack Guard / MPU at all.

    If you look in examples/peripheral/cli/main.c, you should see this:

    static inline void stack_guard_init(void)
    {
        APP_ERROR_CHECK(nrf_mpu_init());
        APP_ERROR_CHECK(nrf_stack_guard_init());
    }

    When I searched the whole SDK for references to that function, that's what popped up.

    Also I realized I forgot to say that you also need to add #include "nrf_mpu.h" in addition to adding the call to nrf_mpu_init(). Sorry about that. (But you obviously must have figured that out already. :) )

    I would have thought it to be the nrf_stack_guard module's responsibility to make sure the MPU is enabled?

    I would have thought so too. It's not entirely unreasonable though, if you consider that the MPU can be used for various other things besides just stack protection. In that case you might want to initialize the MPU separately for some other purpose (and having nrf_stack_guard_init() potentially do it again might therefore be redundant). But that's just a guess.

    One of the other things I like to do with the MPU is to put a guard block at address 0x0. On many Cortex-M cores, this is the start of flash, where the vector table is, and as such a read there will always succeed. It turns out that a write will also succeed, in the sense that it won't trigger a trap (though obviously it won't have any effect, since the flash is normally read-only). This means that if you have a bug in your program where you're using a NULL pointer by mistake, you might not notice it right away. If you define a small MPU region right at address 0 with the user/supervisor mode access rights set to NA/NA, then the CPU will throw a memory fault if you attempt any access there, which will alert you to the problem right away. (I was worried at one point that this might have some negative effects on the CPU accessing the vector table when handling a trap, but apparently it doesn't.)
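Here is a sketch of the register values such a NULL guard might use, assuming an ARMv7-M MPU and a free region slot (region 2 in the example, since the dump shows only regions 0 and 1 in use). The region size is a free choice. This sketch assumes the guard covers only the start of the vector table, because marking code that executes from low flash (for example a bootloader or MBR) as no-access and execute-never would break it. All helper names are hypothetical.

```c
#include <stdint.h>

/* ARMv7-M MPU_RBAR / MPU_RASR fields */
#define RBAR_VALID    (1u << 4)  /* use RBAR.REGION instead of MPU_RNR */
#define RASR_ENABLE   (1u << 0)
#define RASR_AP_NA_NA (0u << 24) /* AP = 0b000: no access, privileged or not */
#define RASR_XN       (1u << 28) /* never execute from this region */

/* SIZE field for a power-of-two region (>= 32 bytes): log2(bytes) - 1 */
static uint32_t rasr_size_field(uint32_t bytes)
{
    uint32_t log2 = 0u;
    while (bytes > 1u) {
        bytes >>= 1;
        log2++;
    }
    return log2 - 1u;
}

/* Hypothetical helper: RBAR value for a guard region based at address 0 */
static uint32_t null_guard_rbar(uint32_t region)
{
    const uint32_t base = 0x00000000u; /* trivially aligned to any size */
    return base | RBAR_VALID | (region & 0xFu);
}

/* Hypothetical helper: RASR value for a no-access, execute-never region.
 * TEX/S/C/B are left at 0, on the assumption that memory attributes
 * matter little for a region that cannot be accessed at all. */
static uint32_t null_guard_rasr(uint32_t bytes)
{
    return RASR_XN | RASR_AP_NA_NA | (rasr_size_field(bytes) << 1) | RASR_ENABLE;
}
```

For a 32-byte guard in region 2, this gives RBAR = 0x00000012 and RASR = 0x10000009. Like the stack guard, it only takes effect once the MPU master enable is set.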

    Note that the enable bit in the control register is the master enable for the whole MPU module. There's also an enable bit for each region, but the master enable also has to be set in order for any of the region definitions to have any effect.
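The rule above can be expressed as a predicate over raw register values. This is a hypothetical helper that assumes the ARMv7-M bit positions: MPU_CTRL.ENABLE and MPU_RASR.ENABLE are both bit 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define MPU_CTRL_ENABLE_BIT (1u << 0) /* master enable (MPU_CTRL bit 0) */
#define MPU_RASR_ENABLE_BIT (1u << 0) /* per-region enable (MPU_RASR bit 0) */

/* A region only constrains accesses when BOTH enables are set. */
static bool mpu_region_in_effect(uint32_t mpu_ctrl, uint32_t region_rasr)
{
    return (mpu_ctrl & MPU_CTRL_ENABLE_BIT) != 0u
        && (region_rasr & MPU_RASR_ENABLE_BIT) != 0u;
}
```

With the values from the earlier dump (MPU_CTRL = 0x00000000, MPU_RASR[0] = 0x1729000D), the guard region is marked enabled, but `mpu_region_in_effect()` returns false, which is exactly why the write went through.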

    I suppose this might be easy to overlook given that the example is not expected to force a crash, so normally you would not see a fault.

    Also, I just noticed that in the "mpu info" output, the word "available" is spelled wrong.

    Anyway I'm glad it works for you now.

    -Bill
