nRF51822: Getting instruction address from the fault handlers

Hello, guys!

We are using the nRF51822 SoC with nRF5 SDK v12.3.0 and the s130 SoftDevice. For devices in the wild, we would like to implement the following health monitoring strategy:

  1. Detect when fault events happen (e.g. HardFaults, App Error Faults, Watchdog resets).
  2. Store the info about the fault event.
  3. Report the info about the fault event(s) to the app over the BLE link once a connection is established.
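To make steps 2 and 3 concrete, here is a minimal sketch of a fault record that is written just before the reset and read back after boot. All names (`fault_record_t`, `fault_record_store()`, `fault_record_take()`, the magic value) are hypothetical. It assumes the record is placed in RAM that the startup code does not clear and that RAM contents survive the reset in question; both need to be checked for the actual toolchain and reset source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FAULT_RECORD_MAGIC 0xFA17C0DEu  /* hypothetical "record is valid" marker */

typedef enum {
    FAULT_KIND_HARDFAULT = 1,
    FAULT_KIND_APP_ERROR = 2,
    FAULT_KIND_WATCHDOG  = 3
} fault_kind_t;

typedef struct {
    uint32_t magic;    /* FAULT_RECORD_MAGIC when the record holds data */
    uint32_t kind;     /* one of fault_kind_t */
    uint32_t address;  /* faulting instruction address, if known */
    uint32_t detail;   /* e.g. fault id or error code */
    uint32_t check;    /* XOR of the four fields above */
} fault_record_t;

static_assert(sizeof(fault_record_t) == 20, "record is five 32-bit words");

/* Assumption: on target this variable sits in a section the startup code
 * leaves untouched (for GCC, something like
 * __attribute__((section(".noinit")))). On a host build it is plain zeroed
 * static storage. */
static fault_record_t m_fault_record;

static uint32_t fault_record_check(const fault_record_t *r)
{
    return r->magic ^ r->kind ^ r->address ^ r->detail;
}

/* Step 2: called from a fault handler right before the reset. */
void fault_record_store(fault_kind_t kind, uint32_t address, uint32_t detail)
{
    m_fault_record.magic   = FAULT_RECORD_MAGIC;
    m_fault_record.kind    = (uint32_t)kind;
    m_fault_record.address = address;
    m_fault_record.detail  = detail;
    m_fault_record.check   = fault_record_check(&m_fault_record);
}

/* Step 3: called after boot. Copies out a valid record and clears the slot,
 * so the same fault is reported only once. Returns false when the slot holds
 * no valid record (first power-up, or RAM content was lost). */
bool fault_record_take(fault_record_t *out)
{
    bool valid = (m_fault_record.magic == FAULT_RECORD_MAGIC) &&
                 (m_fault_record.check == fault_record_check(&m_fault_record));
    if (valid) {
        *out = m_fault_record;
    }
    memset(&m_fault_record, 0, sizeof m_fault_record);
    return valid;
}
```

The magic word plus the XOR check is only there to tell a real record apart from random power-up RAM content. A stronger checksum or a flash-backed store would be a reasonable swap if the data must survive power loss.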

We have overridden the weak implementations of the HardFault_Handler() and app_error_fault_handler() functions the following way (the implementations were taken from here):

void HardFault_Handler(void)
{
    uint32_t *sp = (uint32_t *) __get_MSP(); // Get stack pointer
    uint32_t ia = sp[12]; // Get instruction address from stack

    // Store instruction address (ia) into the memory
    update_hardfault_metrics(ia);
    NVIC_SystemReset();
}

void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info){
    // Store fault identifier (id) and program counter of the instruction that caused the fault
    update_error_fault_metrics(id, pc);
    NVIC_SystemReset();    
}

As you can see, inside the HardFault_Handler() we take and store the instruction address (ia) that caused the HardFault whereas in the case of app_error_fault_handler() we take and store the fault identifier (id) as well as the program counter (pc) of the instruction that caused the fault.

To test our approach, we used the following helper functions that we call from within the code to artificially provoke the faults:

To provoke HardFault event:

static void illegal_instruction_execution(void) {
    // Read from an invalid, unaligned address to provoke a HardFault
    uint32_t dummy = *(volatile uint32_t *) 0xFFFFFFFF;
    (void)dummy;
}

To provoke App Error Fault event:

APP_ERROR_CHECK(1);

With those helper functions we were able to detect the fault events, but in both cases the instruction address (ia) and the program counter (pc) values are 0x00.

Do you have any idea why we don't get the proper address values?

Thanks in advance.

Sincerely,
Bojan.

  • You forgot to declare HardFault_Handler() as "naked" (yes, that is an attribute in GCC). Otherwise the compiler generates the normal prologue of any function, like pushing LR for example, which moves the stack pointer before you read it.

    Note that the SoftDevice already handles hard faults and will mess up the stack; at the very least it tries to read from it (which can cause a lockup reset).

    It's better to use vector catch in the debugger. For devices in the wild it's better to just call NVIC_SystemReset() in the fault handler; you have to be lucky to get a function call working properly.
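For reference, a minimal sketch of the "naked" approach described above: a small assembly trampoline picks the stack that was active when the fault hit and hands the stacked register frame to an ordinary C function. The names `exception_frame_t`, `hardfault_c_handler()` and `last_fault_pc()` are hypothetical. The frame layout (r0-r3, r12, lr, pc, xpsr) is the standard Cortex-M exception entry frame. The trampoline is only compiled for ARMv6-M targets and has not been tried on hardware here, and the caveats above about the SoftDevice and a possibly corrupted stack still apply.

```c
#include <assert.h>
#include <stdint.h>

/* Registers the core pushes on exception entry, lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* return address of the interrupted function */
    uint32_t pc;    /* address of the instruction that faulted */
    uint32_t xpsr;
} exception_frame_t;

static_assert(sizeof(exception_frame_t) == 32, "frame is eight 32-bit words");

static volatile uint32_t m_last_fault_pc;  /* hypothetical storage slot */

/* Plain C part: receives a pointer to the stacked frame in r0. On target
 * this is where the address would be persisted before NVIC_SystemReset(). */
void hardfault_c_handler(const exception_frame_t *frame)
{
    m_last_fault_pc = frame->pc;
}

uint32_t last_fault_pc(void)
{
    return m_last_fault_pc;
}

#if defined(__ARM_ARCH_6M__)
/* Naked: no compiler prologue, so MSP/PSP still point at the stacked frame.
 * Bit 2 of EXC_RETURN (in LR) tells which stack was in use. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "  .syntax unified              \n"
        "  movs r0, #4                  \n"
        "  mov  r1, lr                  \n"
        "  tst  r0, r1                  \n"
        "  beq  1f                      \n"
        "  mrs  r0, psp                 \n"
        "  b    2f                      \n"
        "1:                             \n"
        "  mrs  r0, msp                 \n"
        "2:                             \n"
        "  ldr  r1, =hardfault_c_handler\n"
        "  bx   r1                      \n"
        "  .ltorg                       \n"
    );
}
#endif
```

With this layout the faulting address is word 6 of the frame, not word 12, and only when the pointer really refers to the start of the frame.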

  • Thanks for the fast feedback,  

    We have declared both functions as "naked" with:

    void HardFault_Handler(void) __attribute__(( naked ));
    void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info) __attribute__(( naked ));

    So, for devices in the wild, would the best approach be to just count the number of fault events (HardFault, App Error Fault)?

    It would be much more useful for us if we could get some more details about the fault event, so that we can fix the bug causing it.

  • The app_error_fault_handler() is a normal function, but the call site (on the SDK side) contains tons of compiler magic and #ifdefs. It is possible to deactivate the info functionality in order to save flash memory.

    Not sure if the "pc" argument actually ever worked; I always used the stuff in "info" for app errors.

  • Thanks!

    Where can I find something more about the meaning of fault identifier (id) and info fields?

  • bojan said:
    Where can I find something more about the meaning of fault identifier (id) and info fields?

    Take a look at how app_error_save_and_stop() parses the id and info fields.

    Snippet:

    void app_error_save_and_stop(uint32_t id, uint32_t pc, uint32_t info)
    {
        /* static error variables - in order to prevent removal by optimizers */
        static volatile struct
        {
            uint32_t        fault_id;
            uint32_t        pc;
            uint32_t        error_info;
            assert_info_t * p_assert_info;
            error_info_t  * p_error_info;
            ret_code_t      err_code;
            uint32_t        line_num;
            const uint8_t * p_file_name;
        } m_error_data = {0};
    
        // The following variable helps Keil keep the call stack visible, in addition, it can be set to
        // 0 in the debugger to continue executing code after the error check.
        volatile bool loop = true;
        UNUSED_VARIABLE(loop);
    
        m_error_data.fault_id   = id;
        m_error_data.pc         = pc;
        m_error_data.error_info = info;
    
        switch (id)
        {
            case NRF_FAULT_ID_SDK_ASSERT:
                m_error_data.p_assert_info = (assert_info_t *)info;
                m_error_data.line_num      = m_error_data.p_assert_info->line_num;
                m_error_data.p_file_name   = m_error_data.p_assert_info->p_file_name;
                break;
    
            case NRF_FAULT_ID_SDK_ERROR:
                m_error_data.p_error_info = (error_info_t *)info;
                m_error_data.err_code     = m_error_data.p_error_info->err_code;
                m_error_data.line_num     = m_error_data.p_error_info->line_num;
                m_error_data.p_file_name  = m_error_data.p_error_info->p_file_name;
                break;
        }
    
        UNUSED_VARIABLE(m_error_data);
    
        // If printing is disrupted, remove the irq calls, or set the loop variable to 0 in the debugger.
        __disable_irq();
        while (loop);
    
        __enable_irq();
    }
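Building on the snippet above, a minimal sketch of how a custom app_error_fault_handler() could pull the useful fields out of "info" before resetting. The names `fault_summary_t`, `fault_summarize()`, `sketch_error_info_t` and `sketch_assert_info_t` are hypothetical. The two structs only mirror the field names used in the snippet so the sketch compiles on its own; on target the SDK's own error_info_t, assert_info_t and fault ID macros from app_error.h should be used instead, and the fallback ID values below are stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in values, used only when the SDK header is not included. */
#ifndef NRF_FAULT_ID_SDK_ERROR
#define NRF_FAULT_ID_SDK_ERROR  (0x00004000u + 1u)
#endif
#ifndef NRF_FAULT_ID_SDK_ASSERT
#define NRF_FAULT_ID_SDK_ASSERT (0x00004000u + 2u)
#endif

/* Stand-ins for the SDK's error_info_t / assert_info_t (layout assumed). */
typedef struct {
    uint16_t       line_num;
    const uint8_t *p_file_name;
    uint32_t       err_code;
} sketch_error_info_t;

typedef struct {
    uint16_t       line_num;
    const uint8_t *p_file_name;
} sketch_assert_info_t;

/* Flat copy of what is worth keeping across the reset. */
typedef struct {
    uint32_t       id;
    uint32_t       err_code;     /* 0 for asserts */
    uint32_t       line_num;
    const uint8_t *p_file_name;  /* may be NULL if file info is compiled out */
} fault_summary_t;

/* Decodes (id, info) the same way app_error_save_and_stop() does.
 * info is taken as uintptr_t so the sketch also builds on a 64-bit host;
 * on the nRF51 the handler's uint32_t info can be passed straight through.
 * Returns false for unknown ids or a null info pointer. */
bool fault_summarize(uint32_t id, uintptr_t info, fault_summary_t *out)
{
    out->id          = id;
    out->err_code    = 0;
    out->line_num    = 0;
    out->p_file_name = NULL;

    if (info == 0) {
        return false;
    }

    switch (id) {
    case NRF_FAULT_ID_SDK_ERROR: {
        const sketch_error_info_t *p = (const sketch_error_info_t *)info;
        out->err_code    = p->err_code;
        out->line_num    = p->line_num;
        out->p_file_name = p->p_file_name;
        return true;
    }
    case NRF_FAULT_ID_SDK_ASSERT: {
        const sketch_assert_info_t *p = (const sketch_assert_info_t *)info;
        out->line_num    = p->line_num;
        out->p_file_name = p->p_file_name;
        return true;
    }
    default:
        return false;
    }
}
```

The values are copied out rather than keeping the "info" pointer itself, since whatever it points to is not expected to be valid after the reset.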
