
Stack Guard and MPU

Hi, I'm trying to get the nrf_stack_guard and nrf_mpu libraries to catch writes past the end of the stack. My stack is 8kB in size (0x2000E000-0x20010000). This is the log output after initializing the Stack Guard module:

<debug> nrf_mpu: MPU region creating (location: 0x2000E000-0x2000E07F)
<debug> nrf_mpu: MPU region 0 created (location: 0x2000E000-0x2000E07F, access: RO/RO, type: Normal, flags: XN).
<info> stack_guard: Stack Guard: 0x2000E000-0x2000E07F (usable stack area: 8064 bytes)
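The numbers in that log look self-consistent to me. As a rough cross-check (a sketch only: `guard_layout_t`, `guard_layout()` and `mpu_region_is_valid()` are names I made up for illustration, not nrf_stack_guard or nrf_mpu APIs), the guard range and usable size follow from the stack base, the stack size and a 128-byte guard, together with the ARMv7-M rule that an MPU region must be a power of two in size, at least 32 bytes, and aligned to its own size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a guard region at the bottom of the stack. */
typedef struct {
    uint32_t first;   /* first guarded address */
    uint32_t last;    /* last guarded address (inclusive) */
    uint32_t usable;  /* stack bytes left above the guard */
} guard_layout_t;

/* ARMv7-M MPU regions must be a power of two in size, at least 32 bytes,
 * and aligned to their own size. */
static bool mpu_region_is_valid(uint32_t base, uint32_t size)
{
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);
    return power_of_two && (size >= 32u) && ((base & (size - 1u)) == 0u);
}

/* Places the guard at the lowest addresses of a stack that grows downwards. */
static guard_layout_t guard_layout(uint32_t stack_base, uint32_t stack_size,
                                   uint32_t guard_size)
{
    assert(guard_size <= stack_size);
    assert(mpu_region_is_valid(stack_base, guard_size));

    guard_layout_t layout = {
        .first  = stack_base,
        .last   = stack_base + guard_size - 1u,
        .usable = stack_size - guard_size,
    };
    return layout;
}
```

With a stack base of 0x2000E000, a stack size of 0x2000 and a 128-byte guard, this yields 0x2000E000-0x2000E07F and 8064 usable bytes, the same figures the log reports. So the region itself seems to be set up where I expect it.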

I'm having a little trouble understanding how the MPU works. I'd expect a write to 0x2000E000 to trigger the HardFault_Handler, but in reality nothing is triggered. The write just happens and silently corrupts RAM below the stack.

I'm using the nrf gcc hardfault library implementation and am able to catch NULL dereferences and other faults, so that should be correctly set up.

What am I missing?

  • Hi all, thank you for your responses. I'm a bit dumbfounded as to why I can't get the Stack Guard / MPU to work. I must be fundamentally misunderstanding something.

    In an attempt to isolate the problem, I've taken the ble_app_cli example from SDK 15.2.0, which seems to be the only example using the stack guard, and added a write to the base of the stack.

    static void core_init(void)
    {
        APP_ERROR_CHECK(NRF_LOG_INIT(app_timer_cnt_get));
    
        if (CoreDebug->DHCSR & CoreDebug_DHCSR_C_DEBUGEN_Msk)
        {
            APP_ERROR_CHECK(nrf_cli_init(&m_cli, NULL, true, true, NRF_LOG_SEVERITY_INFO));
        }
    
        nrf_drv_uart_config_t uart_config = NRF_DRV_UART_DEFAULT_CONFIG;
        uart_config.pseltxd = TX_PIN_NUMBER;
        uart_config.pselrxd = RX_PIN_NUMBER;
        uart_config.hwfc    = NRF_UART_HWFC_DISABLED;
        APP_ERROR_CHECK(nrf_cli_init(&m_cli_uart, &uart_config, true, true, NRF_LOG_SEVERITY_INFO));
    
        APP_ERROR_CHECK(nrf_drv_clock_init());
    
        nrf_drv_clock_lfclk_request(NULL);
    
        APP_ERROR_CHECK(app_timer_init());
    
        APP_ERROR_CHECK(nrf_stack_guard_init());
    
        *(volatile uint32_t*) STACK_BASE = 0xba5eba11;
    
        NRF_LOG_INFO("Written to stack base (%p): %x", STACK_BASE, *((uint32_t*)STACK_BASE));
    
        APP_ERROR_CHECK(nrf_pwr_mgmt_init());
    
        if (CoreDebug->DHCSR & CoreDebug_DHCSR_C_DEBUGEN_Msk)
        {
            APP_ERROR_CHECK(nrf_cli_task_create(&m_cli));
        }
    
        APP_ERROR_CHECK(nrf_cli_task_create(&m_cli_uart));
    }

    This code runs past the "illegal" write and continues normally. From my understanding, this should trigger a HardFault.

    Can somebody tell me what I'm missing?

  • You made me curious enough that I actually hauled out my nRF52840 DK reference board and re-flashed it to run the SDK demos.

    The problem here is not that there's something wrong with you, the problem is there's something wrong with the universe:

    this example is buggy.

    Note that this is not the only example that uses the stack guard feature. The examples/peripheral/cli/main.c code also uses it, and that code does it right.

    What's not immediately obvious is that, since this is a CLI example, there's actually an MPU command built into the CLI. If you run it, the problem becomes clear. Here's what I get:

    uart_cli:~$ mpu info
    MPU State: Disabled, 8 unified regions aviable.
    
    Region 0: Enabled
            - Location:     0x2003E000-0x2003E07F (size: 128 bytes)
            - Access:       RO/RO
            - Type:         Normal
            - Caching:      WBWA/WBWA
            - Flags:        XN
    
    Region 1: Enabled
            - Location:     0x20004301-0x20004400 (size: 256 bytes)
            - Access:       RO/RO
            - Type:         Normal
            - Caching:      WBWA/WBWA
            - Flags:        XN
    
    Region 2: Disabled
    Region 3: Disabled
    Region 4: Disabled
    Region 5: Disabled
    Region 6: Disabled
    Region 7: Disabled
    [00:02:38.000,213] <info> app: Battery level update: 97
    uart_cli:~$ mpu dump
    MPU_TYPE:       0x00000800
    MPU_CTRL:       0x00000000
    
    MPU_RBAR[0]:    0x2003E000
    MPU_RASR[0]:    0x1729000D
    
    MPU_RBAR[1]:    0x20004301
    MPU_RASR[1]:    0x1729000F
    
    MPU_RBAR[2]:    0x00000002
    MPU_RASR[2]:    0x00000000
    
    MPU_RBAR[3]:    0x00000003
    MPU_RASR[3]:    0x00000000
    
    MPU_RBAR[4]:    0x00000004
    MPU_RASR[4]:    0x00000000
    
    MPU_RBAR[5]:    0x00000005
    MPU_RASR[5]:    0x00000000
    
    MPU_RBAR[6]:    0x00000006
    MPU_RASR[6]:    0x00000000
    
    MPU_RBAR[7]:    0x00000007
    MPU_RASR[7]:    0x00000000
    uart_cli:~$ 

    The most important part is where it says: MPU State: Disabled, 8 unified regions aviable.

    Yes, Disabled.

    If you download the book that I linked in my first reply, in the MPU section it documents the operation of the MPU control register. You must set bit 0 in that register to actually turn the MPU on. But here we can see that hasn't been done:

    MPU_CTRL: 0x00000000

    You can also inspect the MPU registers from the debugger directly. They start at address 0xe000ed90. You can refer to table 4-38 in the manual above for the complete map.
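    If you'd rather not decode the dump by hand, here is a minimal sketch of how the raw values can be read. It assumes the standard ARMv7-M register layout (MPU_CTRL bit 0 is the global enable, MPU_RASR bit 0 is the per-region enable, MPU_RASR bits [5:1] hold SIZE, and the region covers 2^(SIZE+1) bytes). The `DUMP_*` macros and the helper functions are names invented for this illustration; they are not part of the SDK or CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the ARMv7-M MPU register layout.
 * The DUMP_* names are hypothetical, chosen for this sketch only. */
#define DUMP_MPU_CTRL_ENABLE   (1u << 0)   /* MPU_CTRL bit 0: MPU on/off     */
#define DUMP_MPU_RASR_ENABLE   (1u << 0)   /* MPU_RASR bit 0: region on/off  */
#define DUMP_MPU_RASR_SIZE_POS 1u          /* MPU_RASR bits [5:1]: SIZE      */
#define DUMP_MPU_RASR_SIZE_MSK 0x1Fu
#define DUMP_MPU_RASR_XN       (1u << 28)  /* MPU_RASR bit 28: execute never */

/* True if the MPU as a whole is switched on. */
static bool dump_mpu_enabled(uint32_t ctrl)
{
    return (ctrl & DUMP_MPU_CTRL_ENABLE) != 0u;
}

/* True if this particular region is switched on. */
static bool dump_region_enabled(uint32_t rasr)
{
    return (rasr & DUMP_MPU_RASR_ENABLE) != 0u;
}

/* True if instruction fetches from the region are forbidden. */
static bool dump_region_xn(uint32_t rasr)
{
    return (rasr & DUMP_MPU_RASR_XN) != 0u;
}

/* Region size in bytes: 2^(SIZE + 1). */
static uint32_t dump_region_size_bytes(uint32_t rasr)
{
    uint32_t size_field = (rasr >> DUMP_MPU_RASR_SIZE_POS) & DUMP_MPU_RASR_SIZE_MSK;
    assert(size_field < 31u);  /* SIZE = 31 means 4 GB, which does not fit in 32 bits */
    return 1u << (size_field + 1u);
}
```

    Fed the values from the dump, MPU_RASR[0] = 0x1729000D decodes to an enabled, execute-never region of 128 bytes, and MPU_RASR[1] = 0x1729000F to 256 bytes, which agrees with what "mpu info" prints. MPU_CTRL = 0x00000000 decodes to "MPU off", so the regions are configured but nothing is enforcing them.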

    The reason this is happening is that it's not enough to call nrf_stack_guard_init(). You *also* have to call nrf_mpu_init(). This example *doesn't* do that, which is incredibly dumb.

    Go to main.c in the example and do this:

        APP_ERROR_CHECK(nrf_mpu_init());  /* Add me! */
        APP_ERROR_CHECK(nrf_stack_guard_init());

    Now re-run your test. When I do it, I get the following endless hard fault cycle:

    [00:00:00.000,000] <info> stack_guard: Stack Guard (128 bytes): 0x2003E000-0x2003E07F (total stack size: 8192 bytes, usable stack area: 8064 bytes)
    [00:00:00.000,000] <error> hardfault: HARD FAULT at 0x0003460E
    [00:00:00.000,000] <error> hardfault:   R0:  0x00000000  R1:  0x0F81B159  R2:  0xBA5EBA11  R3:  0x2003E000
    [00:00:00.000,000] <error> hardfault:   R12: 0x2000360C  LR:  0x00027EC9  PSR: 0x61000000
    [00:00:00.000,000] <error> hardfault: Cause: The processor attempted a load or store at a location that does not permit the operation.
    [00:00:00.000,000] <error> hardfault: MemManage Fault Address: 0x2003E000
    [00:00:00.000,000] <info> stack_guard: Stack Guard (128 bytes): 0x2003E000-0x2003E07F (total stack size: 8192 bytes, usable stack area: 8064 bytes)
    [00:00:00.000,000] <error> hardfault: HARD FAULT at 0x0003460E
    [00:00:00.000,000] <error> hardfault:   R0:  0x00000000  R1:  0x0F81B159  R2:  0xBA5EBA11  R3:  0x2003E000
    [00:00:00.000,000] <error> hardfault:   R12: 0x2000360C  LR:  0x00027EC9  PSR: 0x61000000
    [00:00:00.000,000] <info> stack_guard: Stack Guard (128 bytes): 0x2003E000-0x2003E07F (total stack size: 8192 bytes, usable stack area: 8064 bytes)
    [00:00:00.000,000] <error> hardfault: HARD FAULT at 0x0003460E
    [00:00:00.000,000] <error> hardfault:   R0:  0x00000000  R1:  0x0F81B159  R2:  0xBA5EBA11  R3:  0x2003E000
    [00:00:00.000,000] <error> hardfault:   R12: 0x2000360C  LR:  0x00027EC9  PSR: 0x61000000
    [00:00:00.000,000] <error> hardfault: Cause: The processor attempted a load or store at a location that does not permit the operation.
    [00:00:00.000,000] <error> hardfault: MemManage Fault Address: 0x2003E000
    [...]

    The examples/peripheral/cli/main.c code actually does call nrf_mpu_init() before nrf_stack_guard_init(). I don't know why this one doesn't.

    -Bill

  • Hi Bill,

    Thank you for investigating and reporting this. I have reported the missing call to nrf_mpu_init() in the ble_app_cli example to the SDK developers.
