
OTA-DFU sometimes erases the application data unexpectedly

Hi,

We are facing an issue where application data stored using FDS is erased once in a while after performing OTA-DFU multiple times.

This issue does not occur every time, but it appears as often as 5% of the time.

We are developing a customized application based on the ble_peripheral/ble_app_uart example with the following environment:

  • Target Chip: nRF52832
  • SDK: v16.0.0
  • SoftDevice: s132
  • Compiler: IAR

For the DFU, I used the secure_bootloader example and combined the two applications into one .zip file with nrfutil, to upload via the nRF Tools app (either iOS or Android).

I have made sure that the following parameters match:

// In nrf_dfu_types.h
DFU_APP_DATA_RESERVED CODE_PAGE_SIZE * 3

// In sdk_config.h
FDS_VIRTUAL_PAGES 3
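
The two settings describe the same flash region from two sides: the bootloader reserves bytes, while FDS counts pages of words. A minimal sketch of a compile-time consistency check follows. It assumes the nRF52832 flash page size of 4096 bytes and an FDS virtual page size of 1024 words; the macro values are restated here for illustration rather than taken from the SDK headers, and `app_data_is_protected()` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed values, restated for illustration (normally from the SDK headers). */
#define CODE_PAGE_SIZE          4096u                  /* flash page size, bytes */
#define DFU_APP_DATA_RESERVED   (CODE_PAGE_SIZE * 3u)  /* bootloader side, bytes */
#define FDS_VIRTUAL_PAGES       3u                     /* application side, pages */
#define FDS_VIRTUAL_PAGE_SIZE   1024u                  /* words per virtual page */

/* Bytes of flash that FDS claims for its pages. */
#define FDS_FLASH_BYTES \
    (FDS_VIRTUAL_PAGES * FDS_VIRTUAL_PAGE_SIZE * sizeof(uint32_t))

/* Fails the build if the bootloader preserves less than FDS uses. */
_Static_assert(DFU_APP_DATA_RESERVED >= FDS_FLASH_BYTES,
               "DFU_APP_DATA_RESERVED is smaller than the FDS area");

/* Hypothetical helper: runtime view of the same check. */
static int app_data_is_protected(void)
{
    return DFU_APP_DATA_RESERVED >= FDS_FLASH_BYTES;
}
```

With the values from this post, both sides come to 12288 bytes, so the check passes.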

This problem occurs regardless of the application size to be uploaded, i.e. it doesn't matter if it's single-bank or dual-bank update.

Is there any factor that could cause this issue to happen only sometimes?

If anyone can suggest a way to resolve the issue, or to reproduce it 100% of the time, it would be very helpful!

Thank you.

  • Hi Terje,

    Thank you very much for your detailed insight.

    To explain what I encountered, let me describe what my program was doing:

    (0) Initialize FDS

    (1) Read 60 bytes of data from Flash, where several parameters are stored in sequence

    (2) Check whether any parameter has an invalid value

    (3)-(a) If the values look OK, do nothing and go to (5)

    (3)-(b) If any parameter has an invalid value, over-write it with a default value

    (4) Delete the flash page where the parameters are stored and over-write with the updated values

    (5) Continue the initialization and the rest of the program
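
The start-up sequence above can be sketched as a validation function that reports whether anything changed, so that flash is rewritten only when a default had to be substituted. This is a minimal sketch, not the original program: the parameter layout (`app_params_t`), the valid ranges, and the defaults are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter block; the real one in this thread is 60 bytes. */
typedef struct {
    uint16_t interval_ms;   /* valid range assumed: 10..1000 */
    uint8_t  tx_power_idx;  /* valid range assumed: 0..8 */
} app_params_t;

#define DEFAULT_INTERVAL_MS 100u
#define DEFAULT_TX_POWER    4u

/* Steps (2)-(3): replace invalid fields with defaults.
 * Returns true only if at least one field was changed, i.e. only
 * then does step (4), the delete and rewrite, need to run. */
static bool params_sanitize(app_params_t *p)
{
    bool changed = false;

    if (p->interval_ms < 10u || p->interval_ms > 1000u) {
        p->interval_ms = DEFAULT_INTERVAL_MS;
        changed = true;
    }
    if (p->tx_power_idx > 8u) {
        p->tx_power_idx = DEFAULT_TX_POWER;
        changed = true;
    }
    return changed;
}
```

A caller would issue the FDS delete and write calls only when `params_sanitize()` returns true, and skip straight to step (5) otherwise.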

    Before the fix, the condition used to check the parameters was wrong, so the program always went through (3)-(b).

    After fixing the condition, the program normally goes through (3)-(a), which apparently reduces the probability of the issue occurring.

    As the ultimate fix, as you have suggested, I should remove step (4) to avoid running garbage collection at start-up. I will consider how we can do it...

    Another observation regarding this issue: a simple reset or power cycle didn't trigger it. It only happened on the reset following a flash read by nRF Connect Programmer and/or a DFU. This may suggest that the issue is related to both the reset and flash access.

    Pattern #2 remains a mystery to me. As you suggested, it does look like the data is stored normally, but the program couldn't read it... As neither #1 nor #2 happens any more after the fix, the cause of both may have been the same...

    Regards,

    Keni

  • Hi,

    I had another look at the pattern 2 case, and the "File ID" field for the record is 0xFFFF instead of 0x1111. In other words, it looks like that field was not written. (Flash is erased into a state of all 0xFF, and then the 0 bits can be written. To FDS, a File ID of 0xFFFF means "not a valid record, and by the way the rest of the flash page is empty.") I do not recall having witnessed this particular error situation before. It may of course be related, but I would like to look a bit more into it before I let it pass.
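
The flash behaviour described here can be modelled in a few lines. This is a sketch on a plain variable rather than real NVMC access. The packing of File ID and CRC into one word follows the description in this thread, with the File ID assumed to sit in the low half; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERASED_WORD     0xFFFFFFFFu
#define FILE_ID_INVALID 0xFFFFu

/* A flash write can only clear bits (1 -> 0); it never sets them. */
static void flash_write_word(uint32_t *cell, uint32_t value)
{
    *cell &= value;
}

/* File ID in the low half of the word, CRC in the high half
 * (assumed packing of the two 16-bit fields). */
static uint32_t pack_id_crc(uint16_t file_id, uint16_t crc)
{
    return ((uint32_t)crc << 16) | file_id;
}

static uint16_t file_id_of(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}

/* A record whose File ID still reads 0xFFFF looks unwritten. */
static bool record_looks_valid(uint32_t id_crc_word)
{
    return file_id_of(id_crc_word) != FILE_ID_INVALID;
}
```

If the write of that one word is skipped, the cell stays `0xFFFFFFFF` and the File ID reads back as `0xFFFF`, which matches pattern 2 even when the rest of the record was written.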

    Do you, by any chance, use the Power fail comparator? When supply voltage falls below VPOF, NVMC operations (e.g. flash writing) will fail, silently. Flash is written one word at a time, and File ID together with CRC (which is 0xFFFF because it is not used) constitutes one word which is word aligned in flash. If the power fail comparator prevents that particular word from getting written, you would end up in that state.

    Pattern 1, however, I have seen before. The page does start with the "FDS page" marker word, but the rest of the page is completely 0xFF. I do believe we have a patch for that, and I find it a bit strange that it is still an issue in nRF5 SDK v16. It arises from garbage collection getting interrupted by a reset at a particular point in the process. It should be handled by FDS recognizing the page as an FDS page and treating it the same as if it was completely empty. (That is, on initialization, to promote it to a Swap page if no other Swap page exists, or to a Data page if a Swap page exists already.) I am a bit surprised this was not patched in SDK v16. I will look into that as well.
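
The recovery rule described here can be written as a small decision function. This is a sketch of the rule as stated in this reply, not the SDK's actual initialization code; the marker value and all names are placeholders.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MARKER 0xDEADC0DEu   /* placeholder for the "FDS page" marker word */
#define ERASED_WORD 0xFFFFFFFFu

typedef enum {
    PAGE_FOREIGN,     /* no marker: not an FDS page */
    PAGE_IN_USE,      /* marker plus other content */
    PAGE_MAKE_SWAP,   /* blank after marker, no swap page yet */
    PAGE_MAKE_DATA    /* blank after marker, swap page already exists */
} page_action_t;

/* True if every word after the marker is still erased. */
static bool page_is_blank_after_marker(const uint32_t *page, size_t words)
{
    for (size_t i = 1; i < words; i++) {
        if (page[i] != ERASED_WORD) {
            return false;
        }
    }
    return true;
}

/* Pattern 1: marker present, rest erased. Treat it like an empty page
 * and promote it according to whether a swap page already exists. */
static page_action_t classify_page(const uint32_t *page, size_t words,
                                   bool swap_exists)
{
    if (words == 0 || page[0] != PAGE_MARKER) {
        return PAGE_FOREIGN;
    }
    if (!page_is_blank_after_marker(page, words)) {
        return PAGE_IN_USE;
    }
    return swap_exists ? PAGE_MAKE_DATA : PAGE_MAKE_SWAP;
}
```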

    Regards,
    Terje

  • Hi Terje,

    Sorry about the delay.

    About your question regarding the power fail comparator, I asked my PCB designer and got the following answer:

    We are not using any feature for the abnormal power detection.

    The module supplies the nRF52832 with 2.1 V, generated from a 3 V battery by a linear regulator.

    When this issue occurred, both the 2.1 V supply and the 3 V battery voltage were maintained, including during OTA-DFU.

    It was observed that the supply voltage was never below VPOF (1.7 V).

    I hope this answers your question.

    Thank you.

  • Hi,

    Sorry for the very long delay from my side.

    Yes, it answered the question, and it means the issues that you see are more likely to stem from unidentified, incorrectly handled corner cases in FDS.

    Since my last reply, we have identified a potential issue that may leave the FDS pages without any swap page. It may be related, and I'll share the fix once we have a tested and verified solution.

    After having reread the previous entries in this thread, I noticed the following step:

    keni3 said:
    (4) Delete the flash page where the parameters are stored and over-write with the updated values

    Does that mean you delete the flash page through other means than deleting the records through the FDS API? Deleting FDS flash pages manually may leave FDS in an invalid state. If you used FDS to store records, you should use the proper FDS API calls to delete or update those records as well. You should not write to the FDS flash pages in any other way than through the FDS API, or else there is no guarantee of data integrity.

    Alternatively, if it means this is a separate flash page from what is used for FDS, and both writing and erasing happens directly (not through FDS), then any issues there should not be related to FDS.

    Regards,
    Terje

  • Hi Terje,

    Thank you for your update.

    I will wait for the fix you are working on! Glad to hear that there will be a cure to this issue.

    About your question, I DO use the FDS API whenever accessing the flash.

    What I wrote may have been misleading because I used the word "flash page" instead of "flash record".

    More specifically, I use fds_record_delete() to delete and clean the record and then use fds_record_write() to re-write the updated parameters. When deleting, fds_record_find() is used with the same FILE_ID and REC_KEY as set in the descriptor for fds_record_write().
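
Why a delete-then-write on every boot leads to garbage collection at start-up can be illustrated with a toy append-only store. In FDS, deleting a record only invalidates it, and the space is reclaimed later by garbage collection. The model below is a simplification with hypothetical names and sizes, not the FDS implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define STORE_WORDS 64u   /* toy capacity, far smaller than a real page */

typedef struct {
    uint32_t used_words;   /* words consumed, valid or dirty */
    uint32_t dirty_words;  /* words held by deleted records */
    uint32_t live_words;   /* size of the single live record, 0 if none */
    uint32_t gc_runs;      /* how many times collection was needed */
} toy_store_t;

/* Deleting marks the live record dirty; no space is freed. */
static void toy_delete(toy_store_t *s)
{
    s->dirty_words += s->live_words;
    s->live_words = 0;
}

/* Collection drops dirty records and keeps the live one. */
static void toy_gc(toy_store_t *s)
{
    s->used_words = s->live_words;
    s->dirty_words = 0;
    s->gc_runs++;
}

/* Writing appends; when the store is full, collection must run first. */
static bool toy_write(toy_store_t *s, uint32_t words)
{
    if (s->used_words + words > STORE_WORDS) {
        toy_gc(s);
        if (s->used_words + words > STORE_WORDS) {
            return false;
        }
    }
    s->used_words += words;
    s->live_words = words;
    return true;
}

/* Simulates a number of start-ups and returns how often collection ran.
 * 16 words is a rough stand-in for a 60-byte record plus overhead. */
static uint32_t toy_simulate(uint32_t boots, bool rewrite_every_boot)
{
    toy_store_t s = {0};
    toy_write(&s, 16u);
    for (uint32_t i = 0; i < boots; i++) {
        if (rewrite_every_boot) {
            toy_delete(&s);
            toy_write(&s, 16u);
        }
    }
    return s.gc_runs;
}
```

In this model, rewriting on every boot forces collection every few boots, while rewriting only when a parameter actually changed never triggers it during start-up.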

    Regards,

    Keni
