
OTA-DFU sometimes erases the application data unexpectedly

Hi,

We are facing an issue where the application data stored using FDS is erased once in a while after performing OTA-DFU multiple times.

This issue does not occur every time, but it shows up in roughly 5% of attempts.

We are developing a customized application based on the ble_peripheral/ble_app_uart example with the following environment:

  • Target Chip: nRF52832
  • SDK: v16.0.0
  • SoftDevice: S132
  • Compiler: IAR

For the DFU, I used the secure_bootloader example and combined these two applications into one .zip file with nrfutil, which I upload via the nRF Tools app (either iOS or Android).

I have made sure that the following parameters match:

// In nrf_dfu_types.h
#define DFU_APP_DATA_RESERVED CODE_PAGE_SIZE * 3

// In sdk_config.h
#define FDS_VIRTUAL_PAGES 3
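One way to keep these two settings from drifting apart is a compile-time check. The sketch below is a host-side illustration, not SDK code. The macro values are stand-ins that assume a 4096-byte flash page and an FDS virtual page size of 1024 four-byte words, and `dfu_reserve_covers_fds` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration; in a real project these come from the
 * bootloader's nrf_dfu_types.h and the application's sdk_config.h. */
#define CODE_PAGE_SIZE         0x1000u               /* assumed: 4096-byte flash page */
#define DFU_APP_DATA_RESERVED  (CODE_PAGE_SIZE * 3u) /* bytes the bootloader preserves */
#define FDS_VIRTUAL_PAGES      3u                    /* number of pages used by FDS */
#define FDS_VIRTUAL_PAGE_SIZE  1024u                 /* assumed: page size in 4-byte words */

/* Hypothetical helper: true when the bootloader reserves at least as many
 * bytes as the FDS area occupies. */
static bool dfu_reserve_covers_fds(uint32_t dfu_reserved_bytes,
                                   uint32_t fds_pages,
                                   uint32_t fds_page_words)
{
    uint32_t fds_bytes = fds_pages * fds_page_words * (uint32_t)sizeof(uint32_t);
    return dfu_reserved_bytes >= fds_bytes;
}

/* Fails the build if the reserved region is smaller than the FDS area. */
_Static_assert(DFU_APP_DATA_RESERVED >=
                   FDS_VIRTUAL_PAGES * FDS_VIRTUAL_PAGE_SIZE * 4u,
               "DFU_APP_DATA_RESERVED is smaller than the FDS area");
```

Because the bootloader and the application are built as separate projects, a check like this only helps if both values are visible in the same translation unit, for example through a small shared header.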

This problem occurs regardless of the size of the application being uploaded, i.e. it doesn't matter whether it is a single-bank or dual-bank update.

What could cause this issue to happen only intermittently?

If anyone can suggest a way to resolve the issue, or to reproduce it 100% of the time, it would be very helpful!

Thank you.

  • Hi Terje,

    Thank you very much for your detailed insight.

    To share what I have encountered, here is what my program was doing:

    (0) Initialize FDS

    (1) Read 60 bytes of data from Flash, where several parameters are stored in sequence

    (2) Check whether any of the parameters have invalid values

    (3)-(a) If the values look OK, do nothing and go to (5)

    (3)-(b) If any parameter has an invalid value, overwrite it with a default value

    (4) Delete the flash page where the parameters are stored and write the updated values in its place

    (5) Continue the initialization and the rest of the program
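Steps (2) to (4) can be sketched so that flash is only touched when validation actually changes something. This is a host-side illustration under assumptions: the `app_params_t` fields, their valid ranges, the defaults, and the `params_sanitize` name are all hypothetical and stand in for the real 60-byte parameter block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block; the real one holds 60 bytes of settings. */
typedef struct {
    uint16_t interval_ms;   /* assumed valid range: 10..1000 */
    uint8_t  tx_power_idx;  /* assumed valid range: 0..8 */
} app_params_t;

static const app_params_t DEFAULT_PARAMS = { 100u, 4u };

/* Steps (2) and (3): replace out-of-range fields with their defaults.
 * Returns true only if a field was changed, i.e. only then is step (4),
 * the flash rewrite, needed. */
static bool params_sanitize(app_params_t *p)
{
    bool changed = false;
    assert(p != NULL);

    if (p->interval_ms < 10u || p->interval_ms > 1000u) {
        p->interval_ms = DEFAULT_PARAMS.interval_ms;
        changed = true;
    }
    if (p->tx_power_idx > 8u) {
        p->tx_power_idx = DEFAULT_PARAMS.tx_power_idx;
        changed = true;
    }
    return changed;
}
```

The caller would issue the flash update only when `params_sanitize` returns true, so a normal boot with valid data performs no erase or write at start-up.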

    Before the fix was made, the condition that checks the parameters was wrong, so the program always went through (3)-(b).

    After fixing the condition, the program normally goes through (3)-(a), which apparently reduces the probability of the issue occurring.

    As the ultimate fix, as you have suggested, I should remove step (4) to avoid running garbage collection at start-up. I will consider how we can do it...

    Another fact to share regarding this issue is that a simple reset or power cycle did not trigger it. It only happened on the reset following a flash read by nRF Connect Programmer and/or a DFU. This may suggest that the issue is related to both the reset and flash access.

    Pattern #2 remains a mystery to me. As you suggested, it does look like the data is stored normally, but the program could not read it... As neither #1 nor #2 happens any more after the fix, the cause may have been the same...

    Regards,

    Keni

  • Hi,

    I had another look at the pattern 2 case, and the "File ID" field for the record is 0xFFFF instead of 0x1111. In other words, it looks like that field was never written. (Flash is erased into a state of all 0xFF, and then the 0 bits can be written. To FDS, a File ID of 0xFFFF means "not a valid record, and by the way the rest of the flash page is empty.") I do not recall having witnessed this particular error situation before. It may of course be related, but I would like to look a bit more into it before I let it pass.

    Do you, by any chance, use the power-fail comparator? When the supply voltage falls below VPOF, NVMC operations (e.g. flash writes) fail silently. Flash is written one word at a time, and the File ID together with the CRC (which is 0xFFFF because it is not used) constitutes one word, which is word-aligned in flash. If the power-fail comparator prevents that particular word from getting written, you would end up in that state.
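The flash behaviour described above can be illustrated with a small host-side simulation. This is a sketch only: the function names are hypothetical, and placing the File ID in the low 16 bits of the word with the CRC in the high 16 bits is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ERASED_WORD     0xFFFFFFFFu  /* state of a flash word after erase */
#define ERASED_FILE_ID  0xFFFFu      /* File ID value of a never-written header */

/* Simulated NOR flash programming: a write can only clear bits (1 -> 0),
 * so the result is the AND of the old content and the new value. */
static uint32_t flash_program_word(uint32_t current, uint32_t value)
{
    return current & value;
}

/* Assumed layout for illustration: File ID in the low half-word. */
static uint16_t header_file_id(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}

/* True when the header word still looks erased, as in the pattern 2 dump. */
static bool file_id_looks_unwritten(uint32_t word)
{
    return header_file_id(word) == ERASED_FILE_ID;
}
```

If the write of that one word is skipped, the word keeps its erased value and `file_id_looks_unwritten` stays true even though the neighbouring words of the record were programmed.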

    Pattern 1, however, I have seen before. The page does start with the "FDS page" marker word, but the rest of the page is completely 0xFF. I do believe we have a patch for that, and I find it a bit strange that it is still an issue in nRF5 SDK v16. It arises from garbage collection getting interrupted by a reset at a particular point in the process. It should be handled by FDS recognizing the page as an FDS page and treating it the same as if it were completely empty. (That is, on initialization, promoting it to a Swap page if no other Swap page exists, or to a Data page if a Swap page already exists.) I am a bit surprised this was not patched in SDK v16. I will look into that as well.
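The recovery rule described for pattern 1 can be sketched as a small classification function. This is not the FDS implementation: the single marker word, its value, and all names below are hypothetical, chosen only to show the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MARKER  0xA5A5A5A5u  /* hypothetical stand-in for the "FDS page" marker word */
#define ERASED_WORD  0xFFFFFFFFu

typedef enum { PAGE_ROLE_SWAP, PAGE_ROLE_DATA, PAGE_ROLE_UNKNOWN } page_role_t;

/* True when the page starts with the marker and every later word is erased. */
static bool page_is_tagged_but_empty(const uint32_t *page, size_t words)
{
    if (words == 0 || page[0] != PAGE_MARKER) {
        return false;
    }
    for (size_t i = 1; i < words; i++) {
        if (page[i] != ERASED_WORD) {
            return false;
        }
    }
    return true;
}

/* Treat a tagged-but-empty page like an erased one: it becomes the Swap page
 * if none exists yet, otherwise a Data page. */
static page_role_t role_for_recovered_page(const uint32_t *page, size_t words,
                                           bool swap_exists)
{
    if (!page_is_tagged_but_empty(page, words)) {
        return PAGE_ROLE_UNKNOWN;
    }
    return swap_exists ? PAGE_ROLE_DATA : PAGE_ROLE_SWAP;
}
```

A page that fails the check is left as `PAGE_ROLE_UNKNOWN` here, so that pages holding records go through the normal scan instead of being reclassified.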

    Regards,
    Terje
