SoftDevice faulting after indeterminate amount of time - PC = 0x0001089E

Hi there,

I am developing an application using an nRF52832, nRF5_SDK_12.1.0_0d23e2a and SoftDevice s132 v3.0.0. The BLE part of the application only sniffs advertising packets and never makes a connection.

I have found that after an indeterminate amount of time (sometimes a day, sometimes longer than a week), the app crashes into app_error_save_and_stop(). This has happened several times using different hardware.

void app_error_save_and_stop(uint32_t id, uint32_t pc, uint32_t info)
{
    /* static error variables - in order to prevent removal by optimizers */
    static volatile struct
    {
        uint32_t        fault_id;
        uint32_t        pc;
        uint32_t        error_info;
        assert_info_t * p_assert_info;
        error_info_t  * p_error_info;
        ret_code_t      err_code;
        uint32_t        line_num;
        const uint8_t * p_file_name;
    } m_error_data = {0};

    // The following variable helps Keil keep the call stack visible, in addition, it can be set to
    // 0 in the debugger to continue executing code after the error check.
    volatile bool loop = true;
    UNUSED_VARIABLE(loop);

    m_error_data.fault_id   = id;
    m_error_data.pc         = pc;
    m_error_data.error_info = info;

    switch (id)
    {
        case NRF_FAULT_ID_SDK_ASSERT:
            m_error_data.p_assert_info = (assert_info_t *)info;
            m_error_data.line_num      = m_error_data.p_assert_info->line_num;
            m_error_data.p_file_name   = m_error_data.p_assert_info->p_file_name;
            break;

        case NRF_FAULT_ID_SDK_ERROR:
            m_error_data.p_error_info = (error_info_t *)info;
            m_error_data.err_code     = m_error_data.p_error_info->err_code;
            m_error_data.line_num     = m_error_data.p_error_info->line_num;
            m_error_data.p_file_name  = m_error_data.p_error_info->p_file_name;
            break;
    }

    UNUSED_VARIABLE(m_error_data);

    // If printing is disrupted, remove the irq calls, or set the loop variable to 0 in the debugger.
    __disable_irq();
    while (loop);

    __enable_irq();
}

Since I haven't been able to catch this error while a debugger is connected, I've had to read out what little info I can by connecting a debugger to a running (but crashed) device, halting the core, and reading off the PC value and the relevant memory.

From this I have tracked down that the current PC value is 0x0002B2A4, which from the map file I know is inside app_error_save_and_stop(). Also from the map file I have found the memory location for the variable m_error_data, which is 0x2000334C. Reading off the data at this location gives the following:

J-Link>mem32 0x2000334C, 8
2000334C = 00000001 0001089E 00000000 00000000
2000335C = 00000000 00000000 00000000 00000000

From this I have determined that the fault_id is 1 (NRF_FAULT_ID_SD_ASSERT) and that the PC right before the crash was 0x0001089E. Because the fault came from inside the SoftDevice, neither case in the switch statement ran, so no other info (line number, file name etc.) was saved and those fields are still zero.
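For anyone who wants to repeat this decoding on another dump, here is a minimal host-side sketch of how I mapped the eight words onto the fields of m_error_data. It assumes the 32-bit target layout, where every field (including the pointers) is 4 bytes and the fields sit in declaration order with no padding. The names error_dump_t, decode_error_dump, is_softdevice_assert and pc_below_app_start are my own helpers, not part of the SDK, and the application flash start address is passed in as a parameter because it has to come from your own linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_WORDS         8u
#define FAULT_ID_SD_ASSERT 1u   /* fault_id value read from the dump in this post */

/* Host-side mirror of m_error_data as laid out on the 32-bit target.
   Pointers are kept as raw 32-bit addresses so the layout does not
   depend on the pointer size of the host machine. */
typedef struct
{
    uint32_t fault_id;
    uint32_t pc;
    uint32_t error_info;
    uint32_t p_assert_info;
    uint32_t p_error_info;
    uint32_t err_code;
    uint32_t line_num;
    uint32_t p_file_name;
} error_dump_t;

/* Copy DUMP_WORDS words from a mem32 dump into named fields, in declaration order. */
error_dump_t decode_error_dump(const uint32_t *words)
{
    error_dump_t d;

    assert(words != NULL);

    d.fault_id      = words[0];
    d.pc            = words[1];
    d.error_info    = words[2];
    d.p_assert_info = words[3];
    d.p_error_info  = words[4];
    d.err_code      = words[5];
    d.line_num      = words[6];
    d.p_file_name   = words[7];

    return d;
}

/* Non-zero if the saved fault id matches the SoftDevice assert id seen above. */
int is_softdevice_assert(const error_dump_t *d)
{
    assert(d != NULL);
    return d->fault_id == FAULT_ID_SD_ASSERT;
}

/* Non-zero if the saved PC lies below the application's flash start address,
   i.e. outside the application image. app_flash_start is an assumption the
   caller supplies from the linker script. */
int pc_below_app_start(const error_dump_t *d, uint32_t app_flash_start)
{
    assert(d != NULL);
    return d->pc < app_flash_start;
}
```

Feeding it the eight words from the J-Link output above gives fault_id = 1 and pc = 0x0001089E, with every other field zero, which matches what I read off by hand.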

Using the fact that the PC = 0x0001089E right before it crashed, is someone able to provide some insight into where within the SoftDevice this error occurred, and what may have caused it?

Thanks,
