FDS - data corruption

Hi,

I found a bug in the fds (Flash Data Storage) module that leads to corruption of the stored data.

Scenario:

  • Given the device has started the GC (Garbage Collection) procedure

  • When the device resets at a certain moment during the page swap procedure

  • Then newly assigned record IDs will overlap existing ones.

When only the first two words of a record have been copied to the swap page, the header of this record still passes the header_check function, even though its record ID word has not been written yet and still reads as erased flash (0xFFFFFFFF).

During the page_scan procedure, this corrupted header is used to update m_latest_rec_id. As a result, new record IDs are assigned starting from 0.
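
To illustrate the wrap-around, here is a minimal sketch of the mechanism as I understand it. It is not the SDK code: latest_rec_id only models m_latest_rec_id, and the helper names scan_update_latest and next_record_id are hypothetical. It assumes the scan keeps the highest ID it has seen and that a new ID is produced by incrementing that value.

```c
#include <stdint.h>

/* Value read back from a flash word that was erased but never written. */
#define ERASED_WORD 0xFFFFFFFFu

/* Hypothetical stand-in for the m_latest_rec_id variable named in the post. */
static uint32_t latest_rec_id = 0;

/* Assumed scan behaviour: remember the highest record ID of any header
 * that passed validation. */
static void scan_update_latest(uint32_t record_id)
{
    if (record_id > latest_rec_id)
    {
        latest_rec_id = record_id;
    }
}

/* Assumed allocation behaviour: the next ID is the tracker plus one.
 * Unsigned arithmetic wraps, so 0xFFFFFFFF + 1 becomes 0. */
static uint32_t next_record_id(void)
{
    latest_rec_id += 1u;
    return latest_rec_id;
}
```

Under these assumptions, scanning one valid record followed by the half-copied header leaves the tracker at 0xFFFFFFFF, so the next two allocations return 0 and 1, which can collide with IDs already present in flash.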

Solution:

I have updated the header_check function to reject headers whose record_id is 0xFFFFFFFF, and I am now testing this workaround.
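
The idea of the workaround can be sketched roughly as follows. This is a simplified, self-contained model and not the actual fds.c source: the struct layout, the constants and the name sketch_header_check are assumptions made for illustration, and the bounds check uses a word count instead of the pointers used in the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEADER_WORDS      3u          /* assumed header size in 32-bit words */
#define FILE_ID_INVALID   0xFFFFu     /* assumed: file ID field left erased */
#define RECORD_KEY_DIRTY  0x0000u     /* assumed: key of an invalidated record */
#define RECORD_ID_ERASED  0xFFFFFFFFu /* record ID word never written */

/* Assumed header layout: two words of key/length/file/CRC, then the ID. */
typedef struct
{
    uint16_t record_key;
    uint16_t length_words;
    uint16_t file_id;
    uint16_t crc16;
    uint32_t record_id;
} sketch_header_t;

/* Returns true only if the header looks like a completely written record.
 * words_left_in_page counts the words from the header start to the page end. */
static bool sketch_header_check(const sketch_header_t *hdr,
                                uint32_t words_left_in_page)
{
    /* The header itself must fit in the page. */
    if (words_left_in_page < HEADER_WORDS)
    {
        return false;
    }
    /* The payload must fit in the page as well. */
    if (hdr->length_words > words_left_in_page - HEADER_WORDS)
    {
        return false;
    }
    /* Reject erased or invalidated records. */
    if (hdr->file_id == FILE_ID_INVALID || hdr->record_key == RECORD_KEY_DIRTY)
    {
        return false;
    }
    /* Added check: a record ID that still reads as erased flash means the
     * header was only partially copied before the reset. */
    if (hdr->record_id == RECORD_ID_ERASED)
    {
        return false;
    }
    return true;
}
```

With the extra check, a header whose first two words are valid but whose record ID is still 0xFFFFFFFF is skipped, so it can no longer influence m_latest_rec_id.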

  • Hi,

    Which SDK version are you using, and what IC are you experiencing this bug on?

    Have you reproduced this on any of our DKs, and can you share code or a procedure that can be used to reproduce this bug?

    Best regards,
    Jørgen

  • We are using:

    • SDK 14.2, but based on code analysis it should also be reproducible with SDK 15.2.
    • nRF52832 on a PCA10040, but this is purely a software bug.

    As I described above:

    • The GC procedure must be triggered (for example by a lot of write requests to fds)
    • This procedure must be interrupted by a reset at a certain moment (power off/system reset)
    • After power-on, the next write requests will overwrite existing records (the first will get record ID 0, so the second one will corrupt existing records).

    It is hard to reproduce. In our setup, it takes about 12 hours of running certain tests. I believe that code analysis of the page_scan and header_check functions will help.

    I can also share with you a flash dump from the corrupted fds storage.
