Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

nRF51822: AppError after some DFUs

Hello everyone,

We are using an nRF51822 SoC with nRF5 SDK v12.3.0 and s130 SoftDevice.
The device is a Bluetooth peripheral that connects to a Mobile Application [App].
We are using buttonless BLE DFU triggered by a BLE command from the App.

The issue at hand is that, after some number of DFUs, the device throws an AppError. Even after a reset it keeps throwing this error, effectively resulting in a soft-brick. Disconnecting the power and connecting it again doesn't change this behavior.
This can happen after just a couple of DFUs or after 20+ successful DFUs. I haven't been able to reproduce the issue consistently.

We have implemented app_error_fault_handler() based on app_error_save_and_stop(), according to this ticket.
It has been modified to print the relevant information and turn on the display with a certain pattern.

void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info){
    update_error_fault_metrics(id, info);
    NRF_LOG_INFO("****AppErrorFault! id: 0x%x, info: 0x%x\n", id, info);
    display_set_pattern(CHARGING_PATTERN);

    /* static error variables - in order to prevent removal by optimizers */
    static volatile struct
    {
        uint32_t        fault_id;
        uint32_t        pc;
        uint32_t        error_info;
        assert_info_t * p_assert_info;
        error_info_t  * p_error_info;
        ret_code_t      err_code;
        uint32_t        line_num;
        const uint8_t * p_file_name;
    } m_error_data = {0};

    // The following variable helps Keil keep the call stack visible, in addition, it can be set to
    // 0 in the debugger to continue executing code after the error check.
    volatile bool loop = true;
    UNUSED_VARIABLE(loop);

    m_error_data.fault_id   = id;
    m_error_data.pc         = pc;
    m_error_data.error_info = info;

    switch (id)
    {
        case NRF_FAULT_ID_SDK_ASSERT:
            NRF_LOG_INFO("****AppErrorFault! NRF_FAULT_ID_SDK_ASSERT\n");
            m_error_data.p_assert_info = (assert_info_t *)info;
            m_error_data.line_num      = m_error_data.p_assert_info->line_num;
            m_error_data.p_file_name   = m_error_data.p_assert_info->p_file_name;
            break;

        case NRF_FAULT_ID_SDK_ERROR:
            NRF_LOG_INFO("****AppErrorFault! NRF_FAULT_ID_SDK_ERROR\n");
            m_error_data.p_error_info = (error_info_t *)info;
            m_error_data.err_code     = m_error_data.p_error_info->err_code;
            m_error_data.line_num     = m_error_data.p_error_info->line_num;
            m_error_data.p_file_name  = m_error_data.p_error_info->p_file_name;
            break;
    }
    NRF_LOG_INFO("Line Number: %u\r\n", m_error_data.line_num);
    NRF_LOG_INFO("File Name:   %s\r\n", (const uint8_t *)(m_error_data.p_file_name));

    UNUSED_VARIABLE(m_error_data);

    // If printing is disrupted, remove the irq calls, or set the loop variable to 0 in the debugger.
    __disable_irq();
    while (loop){
        feed_wdt();
        nrf_delay_ms(1000);
    }

    __enable_irq();

    NVIC_SystemReset();
}

This implementation works when we force an App Error Fault. We've implemented a BLE command that simulates an App Error, like so:

    case APP_CMD_SIMULATE_ERROR_FAULT:            
        APP_ERROR_CHECK(1);
        break;

In the RTT Viewer it is possible to see that the app_error_handler() implementation works:

[Image: app_error_handler-successful-print]

But when the actual error occurs, the file name and line number aren't printed.
Instead, the debugger prints a series of AppErrorFault! lines, with the id decreasing by 0x68 on each line:
[Image: AppError-debug-loop]
This keeps happening until the watchdog times out and resets the device (as can be seen at the start of the print).
Then it happens again during initialization of the device.

To me it looks like the app_error_handler() has an App Error in it and is catching itself.
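
One way to check this hypothesis is a re-entrancy guard in the fault handler. The following is only a minimal host-side sketch, not the SDK implementation: fault_handler() and risky_metrics_update() are hypothetical stand-ins for app_error_fault_handler() and for any helper called from it (such as update_error_fault_metrics()) that could itself fail an error check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counters used only to observe the sketch's behaviour. */
static unsigned m_metrics_calls;  /* how often the "risky" helper ran      */
static unsigned m_nested_faults;  /* how often a nested fault was detected */

void fault_handler(uint32_t id, uint32_t pc, uint32_t info);

/* Hypothetical helper that fails its own error check and therefore
 * re-enters the fault handler, as suspected in the post. */
void risky_metrics_update(uint32_t id, uint32_t info)
{
    m_metrics_calls++;
    fault_handler(id, 0u, info);  /* nested fault raised inside the handler */
}

void fault_handler(uint32_t id, uint32_t pc, uint32_t info)
{
    static volatile bool s_in_handler = false;

    (void)pc;

    if (s_in_handler)
    {
        /* Already handling a fault: count it and return instead of recursing. */
        m_nested_faults++;
        return;
    }
    s_in_handler = true;

    risky_metrics_update(id, info);

    /* On target the handler would now log and then loop or reset.
     * The flag is cleared here only so the sketch can be called again. */
    s_in_handler = false;
}

unsigned metrics_call_count(void) { return m_metrics_calls; }
unsigned nested_fault_count(void) { return m_nested_faults; }
```

If the nested counter becomes non-zero on the device, the repeated AppErrorFault! lines would be coming from a call made inside the handler rather than from the original fault.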

You can find an excerpt of the RTT Viewer debug log here.

To me, this looks like some sort of corruption of the application during DFU.
But shouldn't the bootloader check the contents of the application after DFU before rebooting into it?

Has this ever happened to anyone before?

What more steps can we take to diagnose this further?

Please let me know if there's any more information I should provide.

Thank you in advance for your time and help,
- BBFonseca

0 Sigurd over 2 years ago

Hi,

Are you able to print the pc value?

void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info){
    update_error_fault_metrics(id, info);
    NRF_LOG_INFO("****AppErrorFault! id: 0x%x, info: 0x%x, pc: 0x%x\n", id, info, pc);
    display_set_pattern(CHARGING_PATTERN);

If yes, use the pc value to find the line that triggered the fault. Example:

addr2line -e filename.elf 0x23680

0 BFFonseca over 2 years ago in reply to Sigurd

Printing the pc as you instructed gives a 0 value when simulating the Fault with APP_ERROR_CHECK(1).

Would this be expected?

Also, how can I obtain this .elf file? I'm not using Segger, I'm building with the CLI using make.
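
A pc of 0 here is probably expected: as far as I can tell, app_error_handler() in this SDK version passes a constant 0 as pc when it reports NRF_FAULT_ID_SDK_ERROR, so APP_ERROR_CHECK() failures identify their origin through the file name and line number instead. If an address is still wanted for addr2line, one option is to record the caller's return address in a wrapper. The sketch below assumes GCC or Clang (for __builtin_return_address and the noinline attribute); error_check_with_caller() and last_fault_caller() are hypothetical names, not SDK functions.

```c
#include <stdint.h>

/* Return address of the most recent failing check, 0 if none so far. */
static volatile uintptr_t m_last_fault_caller;

/* Hypothetical wrapper around an error check. It is kept out of line so
 * that the return address points back into the calling function.
 * On Thumb targets the low bit of the value may be set and may need to be
 * masked before the address is handed to addr2line. */
__attribute__((noinline))
void error_check_with_caller(uint32_t err_code)
{
    if (err_code != 0u)
    {
        m_last_fault_caller = (uintptr_t)__builtin_return_address(0);
        /* On target: hand over to the normal error handler here. */
    }
}

uintptr_t last_fault_caller(void)
{
    return m_last_fault_caller;
}
```

The recorded value could then be printed with 0x%x in the fault handler and looked up in the ELF output of the build.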

0 Sigurd over 2 years ago in reply to BFFonseca

Could you try calling

app_error_print(id, pc, info);

in the beginning of the app_error_fault_handler() function ?

0 BFFonseca over 2 years ago in reply to Sigurd

Adding this app_error_print(id, pc, info) call gives me a 0x07 error from the fds_record_write() function on startup.

This is a FDS_ERR_NO_SPACE_IN_FLASH.

Could lack of space in flash explain these seemingly random AppErrors?
I find this weird because they don't always happen (as I said, I've gotten over 20 successful DFUs before this happens).

Do you have any explanation as to why this happens sometimes and sometimes it doesn't?
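
One possible explanation for the intermittency, offered as an assumption rather than a confirmed diagnosis: in FDS, updated or deleted records keep occupying flash until garbage collection runs, so an application that writes records on every boot or DFU can work many times before the pages fill up. The host-side sketch below models that behaviour with a hypothetical simulated store (all sim_* names are invented for illustration; only the 0x07 value comes from this thread) and shows a collect-and-retry pattern.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_FDS_SUCCESS               0x00u
#define SIM_FDS_ERR_NO_SPACE_IN_FLASH 0x07u  /* error value reported in the thread */

/* Simulated store: sizes are in words. Deleted records stay in "used"
 * until garbage collection reclaims them. */
static uint32_t m_capacity, m_used, m_dirty;

void sim_store_reset(uint32_t capacity_words)
{
    m_capacity = capacity_words;
    m_used     = 0u;
    m_dirty    = 0u;
}

uint32_t sim_record_write(uint32_t words)
{
    if (m_used + words > m_capacity)
    {
        return SIM_FDS_ERR_NO_SPACE_IN_FLASH;
    }
    m_used += words;
    return SIM_FDS_SUCCESS;
}

void sim_record_delete(uint32_t words)
{
    assert(m_dirty + words <= m_used);  /* cannot delete more than was written */
    m_dirty += words;                   /* space is marked dirty, not freed   */
}

void sim_gc(void)
{
    m_used -= m_dirty;                  /* reclaim the dirty space */
    m_dirty = 0u;
}

/* Write once; on "no space", collect garbage and try exactly once more. */
uint32_t write_with_gc_retry(uint32_t words)
{
    uint32_t rc = sim_record_write(words);
    if (rc == SIM_FDS_ERR_NO_SPACE_IN_FLASH)
    {
        sim_gc();
        rc = sim_record_write(words);
    }
    return rc;
}
```

On the real library, fds_gc() only queues the operation and completion is reported through an FDS_EVT_GC event, so the retry would belong in the FDS event handler rather than directly after the call.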

0 Sigurd over 2 years ago in reply to BFFonseca

Hi,

Looking at the "What are SDK 12.x.0 known issues" list here,

What are SDK 12.x.0 known issues

there is at least one issue related to FDS, described in this post:

FDS garbage collector, what does it do?

Maybe you can try applying that fix.

0 BFFonseca over 2 years ago in reply to Sigurd

I tried applying the fds.c patch provided in the Verified Answer of the FDS garbage collector, what does it do? thread, but I'm getting the following build error:

...
Compiling file: fds.c
../../../../../../components/libraries/fds/fds.c: In function 'pages_init':
../../../../../../components/libraries/fds/fds.c:709:25: error: 'SWAP_EMPTY' undeclared (first use in this function)
  709 |                         SWAP_EMPTY : SWAP_DIRTY;
      |                         ^~~~~~~~~~
../../../../../../components/libraries/fds/fds.c:709:25: note: each undeclared identifier is reported only once for each function it appears in
../../../../../../components/libraries/fds/fds.c:709:38: error: 'SWAP_DIRTY' undeclared (first use in this function); did you mean 'PAGE_SWAP_DIRTY'?
  709 |                         SWAP_EMPTY : SWAP_DIRTY;
      |                                      ^~~~~~~~~~
      |                                      PAGE_SWAP_DIRTY
make[3]: *** [../../../../../../components/toolchain/gcc/Makefile.common:135: _build/nrf51822_xxac_fds.c.o] Error 1
...

Am I supposed to update any more files? Like fds.h or any other header that contains these definitions?

Maybe this patch only works for SDK 12.1 (used by the thread's creator)?

0 Sigurd over 2 years ago in reply to BFFonseca

BFFonseca said:
Maybe this patch only works for SDK 12.1 (used by the thread's creator)?

It looks like that might have been an SDK 12.1 issue that is not present in SDK 12.3.

There have been several FDS fixes in later SDK versions. You might want to backport the FDS module from a newer version to SDK 12.3; see:

FDS init with 2 pages labeled as swap

FDS driver issue