
nRF51822: AppError after some DFUs

Hello everyone,

We are using an nRF51822 SoC with nRF5 SDK v12.3.0 and s130 SoftDevice.
The device is a Bluetooth peripheral that connects to a Mobile Application [App].
We are using buttonless BLE DFU triggered by a BLE command from the App.

The issue is that, after some number of DFUs, the device throws an AppError. It keeps throwing this error even after a reset, which effectively soft-bricks the device. Power-cycling the device doesn't change this behavior.
This can happen after just a couple of DFUs or after 20+ successful DFUs. I haven't found a consistent way to reproduce the issue.

We have implemented app_error_fault_handler() based on app_error_save_and_stop(), according to this ticket.
It has been modified to print the relevant information and to show a certain pattern on the display.
 

void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info){
    update_error_fault_metrics(id, info);
    NRF_LOG_INFO("****AppErrorFault! id: 0x%x, info: 0x%x\n", id, info);
    display_set_pattern(CHARGING_PATTERN);

    /* static error variables - in order to prevent removal by optimizers */
    static volatile struct
    {
        uint32_t        fault_id;
        uint32_t        pc;
        uint32_t        error_info;
        assert_info_t * p_assert_info;
        error_info_t  * p_error_info;
        ret_code_t      err_code;
        uint32_t        line_num;
        const uint8_t * p_file_name;
    } m_error_data = {0};

    // The following variable helps Keil keep the call stack visible, in addition, it can be set to
    // 0 in the debugger to continue executing code after the error check.
    volatile bool loop = true;
    UNUSED_VARIABLE(loop);

    m_error_data.fault_id   = id;
    m_error_data.pc         = pc;
    m_error_data.error_info = info;

    switch (id)
    {
        case NRF_FAULT_ID_SDK_ASSERT:
            NRF_LOG_INFO("****AppErrorFault! NRF_FAULT_ID_SDK_ASSERT\n");
            m_error_data.p_assert_info = (assert_info_t *)info;
            m_error_data.line_num      = m_error_data.p_assert_info->line_num;
            m_error_data.p_file_name   = m_error_data.p_assert_info->p_file_name;
            break;

        case NRF_FAULT_ID_SDK_ERROR:
            NRF_LOG_INFO("****AppErrorFault! NRF_FAULT_ID_SDK_ERROR\n");
            m_error_data.p_error_info = (error_info_t *)info;
            m_error_data.err_code     = m_error_data.p_error_info->err_code;
            m_error_data.line_num     = m_error_data.p_error_info->line_num;
            m_error_data.p_file_name  = m_error_data.p_error_info->p_file_name;
            break;
    }
    NRF_LOG_INFO("Line Number: %u\r\n", m_error_data.line_num);
    NRF_LOG_INFO("File Name:   %s\r\n", (const uint8_t *)(m_error_data.p_file_name));

    UNUSED_VARIABLE(m_error_data);

    // If printing is disrupted, remove the irq calls, or set the loop variable to 0 in the debugger.
    __disable_irq();
    while (loop){
        feed_wdt();
        nrf_delay_ms(1000);
    }

    __enable_irq();

    NVIC_SystemReset();
}


This implementation works when we force an App Error Fault. We've implemented a BLE command that simulates an App Error, like so:

    case APP_CMD_SIMULATE_ERROR_FAULT:            
        APP_ERROR_CHECK(1);
        break; 


In RTT Viewer you can see that the error handler works in this case:

[Screenshot: app_error_handler-successful-print]

But when the actual error occurs, the file name and line number aren't printed.
Instead, the log shows a series of AppErrorFault! lines in which the id decreases by 0x68 on each line:
[Screenshot: AppError-debug-loop]
This keeps happening until the watchdog times out and resets the device (as can be seen at the start of the log).
Then it happens again during initialization of the device.

To me it looks like the error handler itself triggers an App Error and then catches it, over and over.
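
One way to test this hypothesis would be a re-entrancy guard at the very top of the fault handler, before any metrics or logging calls. The following is only a minimal sketch: fault_handler_enter(), m_in_fault_handler and m_nested_fault_count are hypothetical names that are not part of the SDK, and it assumes that a nested fault should be counted rather than handled again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state, for illustration only (not part of the nRF5 SDK). */
static volatile bool     m_in_fault_handler   = false;
static volatile uint32_t m_nested_fault_count = 0;

/* Returns true on the first entry and false on any nested entry.
 * A nested entry means the handler itself raised another fault. */
static bool fault_handler_enter(void)
{
    if (m_in_fault_handler)
    {
        m_nested_fault_count++;
        return false;
    }
    m_in_fault_handler = true;
    return true;
}

/* Intended use, as the first statement of app_error_fault_handler():
 *
 *     if (!fault_handler_enter())
 *     {
 *         for (;;) { }   // park here and inspect m_nested_fault_count
 *     }
 */
```

If m_nested_fault_count is non-zero when the device is halted in the debugger, something called from inside the handler (for example the metrics update or the logging) would be a likely suspect.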

You can find an excerpt of the RTT Viewer debug log here.

My best guess is that the application is somehow corrupted during DFU.
But shouldn't the bootloader check the contents of the application after DFU before rebooting into it?
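
As a diagnostic, the application could also compute a checksum over its own flash region at startup and log it, so that a working unit and a soft-bricked unit can be compared. A minimal sketch, assuming a standard CRC-32 (reflected polynomial 0xEDB88320); image_crc32() and image_matches() are hypothetical names, and the start address and size of the image are left out because they depend on the memory layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t image_crc32(const uint8_t *p_data, size_t size)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < size; i++)
    {
        crc ^= p_data[i];
        for (int bit = 0; bit < 8; bit++)
        {
            uint32_t mask = (crc & 1u) ? 0xEDB88320u : 0u;
            crc = (crc >> 1) ^ mask;
        }
    }
    return ~crc;
}

/* Hypothetical helper: compares the computed CRC with a known-good value. */
static bool image_matches(const uint8_t *p_image, size_t size, uint32_t expected_crc)
{
    return image_crc32(p_image, size) == expected_crc;
}
```

A mismatch between two units running the same firmware version would point towards corruption of the image, while identical values would suggest looking at persistent data or settings instead.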

Has this ever happened to anyone before?

What further steps can we take to diagnose this?

 

Please let me know if there's any more information I should provide.


Thank you in advance for your time and help,
 - BBFonseca
