
OTA-DFU sometimes erases the application data unexpectedly

Hi,

We are facing an issue where application data stored with FDS is occasionally erased after repeated OTA-DFU updates.

The issue does not occur every time, but it happens as often as 5% of the time.

We are developing a customized application based on the ble_peripheral/ble_app_uart example with the following environment:

  • Target Chip: nRF52832
  • SDK: v16.0.0
  • Soft Device: s132
  • Compiler: IAR

For DFU, I used the secure_bootloader example and combined the two into a single .zip file with nrfutil, which I upload via the nRF Tools app (either iOS or Android).

I have made sure that the following parameters match:

// In nrf_dfu_types.h
#define DFU_APP_DATA_RESERVED (CODE_PAGE_SIZE * 3)

// In sdk_config.h
#define FDS_VIRTUAL_PAGES 3
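
One way to keep these two settings from drifting apart is a compile-time check. The following is a minimal sketch, not SDK code: the macros are local stand-ins carrying the values quoted above, and it assumes a 4096-byte flash page (nRF52832) and an FDS virtual page size of 1024 words. `FDS_TOTAL_BYTES` and `app_data_reservation_ok()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the SDK definitions, using the values quoted in the post.
 * In a real project these come from nrf_dfu_types.h and sdk_config.h. */
#define CODE_PAGE_SIZE          0x1000u   /* assumed: 4 KiB flash page */
#define DFU_APP_DATA_RESERVED   (CODE_PAGE_SIZE * 3)
#define FDS_VIRTUAL_PAGES       3
#define FDS_VIRTUAL_PAGE_SIZE   1024u     /* assumed: counted in 4-byte words */

/* Hypothetical helper macro: bytes FDS occupies in flash. */
#define FDS_TOTAL_BYTES (FDS_VIRTUAL_PAGES * FDS_VIRTUAL_PAGE_SIZE * 4u)

/* Fail the build if the bootloader reserves less flash than FDS uses. */
static_assert(DFU_APP_DATA_RESERVED >= FDS_TOTAL_BYTES,
              "bootloader reserves less flash than FDS uses");

/* Same comparison at runtime, e.g. for a log line at boot. */
int app_data_reservation_ok(uint32_t reserved_bytes, uint32_t fds_bytes)
{
    return reserved_bytes >= fds_bytes;
}
```

With the values from the post both sides come to 12288 bytes, so the check passes. If the bootloader and application are built from separate projects, the check only helps when both values are visible in the same build.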

This problem occurs regardless of the application size to be uploaded, i.e. it doesn't matter if it's single-bank or dual-bank update.

What could cause this issue to occur only intermittently?

If anyone can suggest a way to resolve the issue, or to reproduce it reliably, that would be very helpful!

Thank you.

  • Hi,

    Do you have a flash dump from a device where this has happened? (Use the command "nrfjprog --readcode --readuicr dump.hex" to dump flash contents (including UICR registers) to a hex file named "dump.hex".) It should hopefully give some clues about what has gone wrong in flash, especially if entries had been written to all of the FDS flash pages before the update. If you also have a flash dump from before the failing DFU, that would be even better.

    If you use the debug bootloader, which has logging, that log from the update might also give some clues. Please note however that the debug bootloader is larger in size than the non-debug bootloader, so the location of application data will differ.

    What version of the nRF5 SDK are you using?

    Regards,
    Terje

  • Hi Terje,

    Thank you so much for your advice.

    About the SDK, we are using SDK ver16.0.0.

    For the hex dump, I used nRF Connect's programmer to compare the flash contents before and after the issue occurs.

    When I inspected the flash, I could see that the data was completely lost. I also noticed that the issue tends to happen more often when I read the flash memory with nRF Connect's programmer right after the DFU completes. So it may be related to how FDS is used during initialization of the application.

    Please see below for what I saw from the flash:

    The application is freshly written (Before the issue occurs)
    :020000040007F3
    
    :10500000DEC0ADDEFE011EF122220A0011110000F9
    # @00075000 (16 bytes)
    # {TAG_MAGIC, TAG_DATA, 0x000A2222, 0x00001111}
    
    :105010000200000001000000080000001100000074
    # @00075010 (16 bytes)
    
    :10502000FFFF0000FFFF0F00FFFF0F000200000066
    # @00075020 (16 bytes)
    
    :0C5030000100000054050000000000001A
    # @00075030 (12 bytes)
    
    :08600000DEC0ADDEFF011EF160
    # @00076000 (8 bytes)
    # {TAG_MAGIC, TAG_SWAP}
    
    :08700000DEC0ADDEFE011EF151
    # @00077000 (8 bytes)
    # {TAG_MAGIC, TAG_DATA}
    
    The application data (0x12345, 0x67890) is stored using FDS 
    :020000040007F3
    
    :085000 00 DEC0ADDE FF011EF1 70
    # @00075000 (8 bytes)
    # {TAG_MAGIC, TAG_SWAP}
    
    :10600000DEC0ADDEFE011EF122220A0011110000 E9
    # @00076000 (16 bytes)
    # {TAG_MAGIC, TAG_DATA, 0x000A2222, 0x00001111}
    
    :106010000500000001000000080000001100000061
    # @00076010 (16 bytes)
    # {0x00000005, 0x00000001, 0x00000008, 0x00000011}
    
    :10602000FFFF0000452301009078060002000000F9
    # @00076020 (16 bytes)
    # {0x0000FFFF, 0x00012345, 0x00067890, 0x00000002}
    #              ~~~~~~~~~~  ~~~~~~~~~~
    
    :0C6030000100000054050000000000000A
    # @00076030 (12 bytes)
    # {0x00000001, 0x00000554, 0x00000000}
    
    :08700000DEC0ADDEFE011EF151
    # @00077000 (8 bytes)
    # {TAG_MAGIC, TAG_DATA}
    
    
    The issue occurred (Pattern 1)
    :020000040007F3
    
    :08500000DEC0ADDEFE011EF171
    # @00075000 (8 bytes)
    # {TAG_MAGIC, TAG_DATA}
    
    :04600000DEC0ADDE73
    # @00076000 (4 bytes)
    # {TAG_MAGIC}
    
    :08700000DEC0ADDEFE011EF151
    # @00077000 (8 bytes)
    # {TAG_MAGIC, TAG_DATA}
    
    
    The issue occurred (Pattern 2)
    :020000040007F3
    
    :10500000DEC0ADDEFE011EF122220A00FFFFFFFF1F
    # @00075000 (16 bytes)
    
    :10501000080000000100000008000000110000006E
    # @00075010 (16 bytes)
    
    :10502000FFFF000045230100907806000200000009
    # @00075020 (16 bytes)
    
    :0C5030000100000054050000000000001A
    # @00075030 (12 bytes)
    
    :08600000DEC0ADDEFF011EF160
    # @00076000 (8 bytes)
    # {TAG_MAGIC, TAG_SWAP}
    
    :08700000DEC0ADDEFE011EF151
    # @00077000 (8 bytes)
    # {TAG_MAGIC, TAG_DATA}
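
When comparing dumps like these by hand, it can help to check each line's checksum first, so that a copy/paste slip is not mistaken for flash corruption. The following is a minimal sketch of an Intel HEX line parser. `ihex_record_t` and `ihex_parse()` are hypothetical names, not part of any Nordic tool, and the parser skips spaces so that lines grouped for readability still parse.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  len;        /* number of data bytes */
    uint16_t offset;     /* 16-bit load offset within the current segment */
    uint8_t  type;       /* 0x00 = data, 0x04 = extended linear address */
    uint8_t  data[255];
} ihex_record_t;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse one ":LLAAAATT<data>CC" line, ignoring whitespace.
 * Returns 0 on success, -1 on malformed input or checksum mismatch. */
int ihex_parse(const char *line, ihex_record_t *rec)
{
    uint8_t bytes[5 + 255];
    size_t  n  = 0;
    int     hi = -1;     /* pending high nibble, or -1 if none */

    assert(line != NULL && rec != NULL);
    if (*line++ != ':') return -1;

    for (; *line != '\0'; line++) {
        if (isspace((unsigned char)*line)) continue;
        int v = hex_nibble(*line);
        if (v < 0 || n >= sizeof bytes) return -1;
        if (hi < 0) {
            hi = v;
        } else {
            bytes[n++] = (uint8_t)((hi << 4) | v);
            hi = -1;
        }
    }
    /* Need an even digit count and exactly len + 5 bytes in total. */
    if (hi >= 0 || n < 5 || n != (size_t)bytes[0] + 5) return -1;

    /* All bytes including the checksum must sum to 0 modulo 256. */
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) sum = (uint8_t)(sum + bytes[i]);
    if (sum != 0) return -1;

    rec->len    = bytes[0];
    rec->offset = (uint16_t)((bytes[1] << 8) | bytes[2]);
    rec->type   = bytes[3];
    memcpy(rec->data, &bytes[4], rec->len);
    return 0;
}
```

The absolute address of a data record is the most recent extended linear address (0x0007 here) shifted left by 16 bits, plus the record's offset.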

    Any suggestion/advice would be helpful...!

    Regards,

    Keni

  • Update from my investigation on this issue.

    I checked the value returned from fds_record_find() during initialization and found that it was returning FDS_ERR_NOT_FOUND.

    I also found a bug in my initialization code: FDS delete and write operations were performed every time the application was initialized.

    After fixing this bug, the issue has not occurred again in my testing so far.

    Do you think this was the cause? If so, what could be causing the FDS_ERR_NOT_FOUND error, and why did it happen only occasionally?

    Since I'm not yet 100% sure if the problem is really resolved, I would like to know what was going on in detail to nail down the real cause.

  • Hi,

    Without knowing the details of the bug that you fixed, I cannot tell if it could be the cause or not. It is not unreasonable that fds_record_find() returns FDS_ERR_NOT_FOUND if the record that you search for gets deleted and rewritten close in time to that. Permanent loss of records _might_ be explained by erasing/rewriting records, for instance if a reset occurs after a record has been deleted but before it has been written again.

    There have been some other reports of strange behavior after DFU when the initial programming was done with the nRF Connect Programmer app. It does seem to be related to resets. From what I understand, when programming one or multiple hex files using the nRF Connect Programmer app, there may be one or more resets of the device as part of that process, which means the device may run briefly in the middle of being programmed. If that turns out to be the case, then that could be related to what you have been experiencing, in combination with erasing/writing records on startup.
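
The reset window described above can be illustrated with a small simulation. This is a sketch only: `store_t` and its functions are a hypothetical in-memory stand-in for flash, not the FDS API, and `ops_before_reset` models how many flash operations complete before a reset hits. It compares "delete, then write" against "write, then invalidate the old copy", the latter being the order that fds_record_update() is documented to use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STORE_SLOTS 8

/* Hypothetical append-only record log; a stand-in for flash, not FDS. */
typedef struct {
    bool     valid[STORE_SLOTS];
    uint32_t value[STORE_SLOTS];
    int      used;
} store_t;

void store_append(store_t *s, uint32_t value)
{
    assert(s->used < STORE_SLOTS);
    s->value[s->used] = value;
    s->valid[s->used] = true;
    s->used++;
}

/* Invalidate the oldest record that is still valid, if any. */
void store_invalidate_oldest(store_t *s)
{
    for (int i = 0; i < s->used; i++) {
        if (s->valid[i]) {
            s->valid[i] = false;
            return;
        }
    }
}

int store_valid_count(const store_t *s)
{
    int n = 0;
    for (int i = 0; i < s->used; i++) n += s->valid[i] ? 1 : 0;
    return n;
}

/* Delete first, then write: a reset between the two loses the record. */
void rewrite_delete_first(store_t *s, uint32_t value, int ops_before_reset)
{
    if (ops_before_reset < 1) return;
    store_invalidate_oldest(s);
    if (ops_before_reset < 2) return;   /* reset here: nothing valid left */
    store_append(s, value);
}

/* Write first, then invalidate: a reset between the two leaves both copies. */
void rewrite_write_first(store_t *s, uint32_t value, int ops_before_reset)
{
    if (ops_before_reset < 1) return;
    store_append(s, value);
    if (ops_before_reset < 2) return;   /* reset here: old copy still valid */
    store_invalidate_oldest(s);
}
```

With one stored record and a reset after the first operation, the delete-first order leaves zero valid records, while the write-first order leaves two, so a later lookup still finds data.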

    We are currently looking into the Programmer app issue. For now I recommend using nrfjprog, part of nRF Command Line Tools, for programming the device, and especially so if using DFU.

    Regards,
    Terje

  • Hi,

    I had a more thorough look through the lines from the hex dump, and I see for instance that in one of the cases (pattern 1) there are two FDS data pages, but no FDS swap page. We have experienced some issues with FDS related to resets during garbage collection, although known issues should mostly have been fixed for SDK v16. In any case, I think this points in the direction of reset related issues.
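
The page-level check described above can be sketched in a few lines. The tag values below are read directly from the dumps in this thread (stored little-endian); `page_kind_t`, `page_classify()` and `count_pages()` are hypothetical names for illustration and are not FDS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag words as they appear in the dumps, converted from little-endian. */
#define PAGE_TAG_MAGIC 0xDEADC0DEu   /* bytes DE C0 AD DE */
#define PAGE_TAG_SWAP  0xF11E01FFu   /* bytes FF 01 1E F1 */
#define PAGE_TAG_DATA  0xF11E01FEu   /* bytes FE 01 1E F1 */

typedef enum { PAGE_UNKNOWN, PAGE_SWAP, PAGE_DATA } page_kind_t;

/* Classify a page from its first two 32-bit words. */
page_kind_t page_classify(const uint32_t tag[2])
{
    if (tag[0] != PAGE_TAG_MAGIC) return PAGE_UNKNOWN;
    if (tag[1] == PAGE_TAG_SWAP)  return PAGE_SWAP;
    if (tag[1] == PAGE_TAG_DATA)  return PAGE_DATA;
    return PAGE_UNKNOWN;          /* magic present, second word not written */
}

/* Count how many of the n pages are of the given kind. */
size_t count_pages(const uint32_t tags[][2], size_t n, page_kind_t kind)
{
    size_t count = 0;
    assert(tags != NULL);
    for (size_t i = 0; i < n; i++) {
        if (page_classify(tags[i]) == kind) count++;
    }
    return count;
}
```

Fed with the tag words of the three pages from "pattern 1" (data, magic only, data), this reports two data pages and no swap page, which matches the observation above. The page at 0x76000 holds only the magic word, so it is classified as unknown.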

    Running garbage collection on startup (e.g. in initialization) is not a good idea, as resets often occur in close succession (e.g. battery-powered devices with a low battery suffering frequent brownouts, automatic programming / testing situations, etc.). As garbage collection typically means additional flash write/erase cycles, this also contributes to flash wear. While garbage collection is designed not to break on reset, we did have some edge cases prior to SDK 16 where a reset at a particular clock cycle during garbage collection would leave the device in a state similar to your "pattern 1" situation.

    For "pattern 2", the first piece of data is {TAG_MAGIC, TAG_DATA}, no? Followed by the beginning of a record?

    :10500000 DEC0ADDE FE011EF1 22220A00FFFFFFFF1F
    # @00075000 (16 bytes)

    Regards,
    Terje
