Older DFU application valid issue

Hello,

There have been some questions on the forum regarding the DFU and its zero value for the application CRC. However, none of them seem to address this question, so I thought I would float it out here and see if anyone else has encountered this or even has a solution.

The product I am writing about is running on an older SDK (5.2.0!), but it's been working and we haven't needed to upgrade it. Recently we had a problem come up with the application section of the FW on the nRF51822. After creating a fix for the issue, we had 400 units in inventory that needed to be upgraded prior to shipping. Rather than tear them apart to get at the programming connector, we decided to OTA bootload the new application FW into the devices. After it was all said and done, 4 of the 400 units were "bricked".

The folks in production gave me one of the units to see if I could determine what happened. Note that we are using a dual bank DFU based on the SDK mentioned above. The only modifications I made to the example bootloader were to remove the LED and button handling, and to replace it with a flag in the GPREGRET register to enter bootload mode from the application.

What I found was that the application bank was erased (all 0xFF) and there was a valid binary image in the second bank. However, in the application settings page at the end of memory, the application was marked as valid with a CRC of 0 (just as it would look when the whole image is programmed in the factory), and the second bank was marked as erased.

After examining the bootloader init code, I determined that on every reset the bootloader would take control, see the app valid flag and the zero CRC, and jump to the erased application, resulting in a hard fault. The device was indeed bricked.
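To make the failure mode concrete, here is a minimal host-side sketch of the check as I understand it, plus one possible guard. The struct, the names, and the status value are stand-ins I made up for illustration; they are not copied from the SDK. The guard assumes a Cortex-M image, where the first word of the application is the initial stack pointer and therefore cannot be 0xFFFFFFFF in a programmed image.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the bootloader settings; names and values are
 * illustrative and not taken from the SDK sources. */
#define BANK_STATUS_VALID 0x01u
#define EMPTY_FLASH_WORD  0xFFFFFFFFu

typedef struct {
    uint8_t  bank_0;      /* status flag for the application bank */
    uint16_t bank_0_crc;  /* 0 means "skip the CRC check" */
} settings_sketch_t;

/* The behaviour described above: valid flag plus zero CRC is enough to boot,
 * regardless of what is actually in the application bank. */
bool app_valid_flag_only(const settings_sketch_t *s, uint16_t computed_crc)
{
    if (s->bank_0 != BANK_STATUS_VALID) {
        return false;
    }
    if (s->bank_0_crc == 0) {
        return true;  /* CRC check skipped */
    }
    return computed_crc == s->bank_0_crc;
}

/* Same check with one extra guard: refuse to boot if the first word of the
 * application (the initial stack pointer) still reads as erased flash. */
bool app_valid_guarded(const settings_sketch_t *s, const uint8_t *app,
                       size_t app_len, uint16_t computed_crc)
{
    if (app_len < 4) {
        return false;
    }
    uint32_t initial_sp = (uint32_t)app[0]
                        | ((uint32_t)app[1] << 8)
                        | ((uint32_t)app[2] << 16)
                        | ((uint32_t)app[3] << 24);
    if (initial_sp == EMPTY_FLASH_WORD) {
        return false;
    }
    return app_valid_flag_only(s, computed_crc);
}
```

With the flag set and a stored CRC of 0, the first function accepts an erased bank, while the guarded version rejects it and would let the bootloader stay in DFU mode instead.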

I did some more digging and I think I found the problem. In the activate function of dfu_dual_bank.c there is the following code that erases the application and writes the new binary in its place:

        // Stop the DFU Timer because the peer activity need not be monitored any longer.
        err_code = app_timer_stop(m_dfu_timer_id);
        APP_ERROR_CHECK(err_code);

        // Erase BANK 0.
        err_code = pstorage_raw_clear(&m_storage_handle_app, m_image_size);
        APP_ERROR_CHECK(err_code);

If the second pstorage operation fails and returns anything other than NRF_SUCCESS, the assert in the callback handler resets the device and leaves it in this state. If the code first updated the application status flag to erased, before touching bank 0, this problem could not occur.
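The ordering I have in mind can be sketched on the host with a fake device in RAM. Everything here is hypothetical (the struct, the status values, the `copy_fails` switch that models a failed flash operation followed by a reset); on the real part each step is an asynchronous pstorage operation, so this only illustrates the sequence, not the actual API calls.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status values and a RAM model of the two banks plus the
 * settings flags; none of this is taken from the SDK. */
enum { BANK_VALID = 0x01, BANK_ERASED = 0xFE };

typedef struct {
    uint8_t bank0[4];
    uint8_t bank1[4];
    uint8_t bank0_status;
    uint8_t bank1_status;
} fake_device_t;

/* Returns false when the copy step "fails", modelling a reset mid-activation. */
bool activate_sketch(fake_device_t *dev, bool copy_fails)
{
    /* Step 1: record that bank 0 no longer holds a valid application
     * BEFORE erasing it. */
    dev->bank0_status = BANK_ERASED;

    /* Step 2: erase bank 0. */
    memset(dev->bank0, 0xFF, sizeof dev->bank0);

    /* Step 3: copy the new image. A failure here leaves bank 0 erased,
     * but the settings already say so and bank 1 is untouched. */
    if (copy_fails) {
        return false;
    }
    memcpy(dev->bank0, dev->bank1, sizeof dev->bank0);

    /* Step 4: only now mark bank 0 valid and release bank 1. */
    dev->bank0_status = BANK_VALID;
    dev->bank1_status = BANK_ERASED;
    return true;
}
```

If the copy fails, the next boot sees bank 0 marked as erased rather than valid, so the bootloader has no reason to jump into empty flash, and the image in bank 1 is still flagged as present for a retry.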

  1. Is this a valid hypothesis?
  2. Does anyone have a cute way of putting the application CRC into the hex file as part of building a complete image (Soft Device, application, bootloader, application valid all in the same hex file)?

Right now I do get the application valid flag into the complete hex file, but I don't have a mechanism to add the CRC to it. If I can add the CRC of the application, it would also prevent this problem.
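For the CRC itself, a small host-side tool run as a post-build step could compute the value over the application binary and patch it into the settings record. Below is a minimal sketch of the computation only, assuming the bootloader uses CRC-16-CCITT with an initial value of 0xFFFF and polynomial 0x1021. That is my assumption about what the SDK's CRC routine does and should be checked against the bootloader source before relying on it. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16-CCITT (polynomial 0x1021, initial value 0xFFFF, no
 * reflection, no final XOR). Assumed, not confirmed, to match the CRC the
 * bootloader computes over the application image. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        /* Fold the next byte into the top of the CRC register. */
        crc ^= (uint16_t)((uint16_t)data[i] << 8);

        /* Shift out eight bits, applying the polynomial on each carry. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

One edge case to keep in mind: since a stored CRC of 0 is treated as "no CRC", an image whose CRC happens to compute to 0 would silently skip the check.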
