I found a bug in the FDS (Flash Data Storage) module that leads to data corruption.
Given the device has started the GC (Garbage Collection) procedure,
When the device resets at a certain moment during the page swap procedure,
Then newly assigned record ids will overlap existing ones.
If only the first two words of a record have been copied to the swap page when the reset happens, the third word (record_id) is still erased and reads as 0xFFFFFFFF, yet the header of this record passes the header_check function.
During the page_scan procedure, this corrupted header is used to update m_latest_rec_id. As a result, new record ids are assigned starting from 0.
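The sequence above can be illustrated with a small host-side sketch. It is not the SDK code: hdr_t, interrupted_copy, scan_update_latest_id and next_record_id are hypothetical names. It assumes a three-word record header, that erased flash reads as 0xFF, that the scan keeps the highest record id it sees, and that a new id is the latest id plus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-word record header, modelled on the FDS layout. */
typedef struct
{
    uint16_t record_key;
    uint16_t length_words;
    uint16_t file_id;
    uint16_t crc16;
    uint32_t record_id;
} hdr_t;

/* Simulate a copy to the swap page that is cut off by a reset after
 * words_done 32-bit words. The destination starts out erased (all 0xFF). */
static void interrupted_copy(hdr_t * p_dst, hdr_t const * p_src, size_t words_done)
{
    assert(p_dst != NULL && p_src != NULL);
    assert(words_done * sizeof(uint32_t) <= sizeof(hdr_t));
    memset(p_dst, 0xFF, sizeof(*p_dst));
    memcpy(p_dst, p_src, words_done * sizeof(uint32_t));
}

/* Page-scan step (assumed): remember the highest record id seen so far. */
static uint32_t scan_update_latest_id(uint32_t latest, hdr_t const * p_hdr)
{
    assert(p_hdr != NULL);
    return (p_hdr->record_id > latest) ? p_hdr->record_id : latest;
}

/* Id allocation (assumed): latest id plus one, which wraps on overflow. */
static uint32_t next_record_id(uint32_t latest)
{
    return latest + 1u;
}
```

With two of the three header words copied, the key and file id look valid while record_id is still 0xFFFFFFFF. The scan then takes that value as the latest id, and the next allocation wraps around to 0.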
I have updated the header_check function to check record_id against the value 0xFFFFFFFF, and I am now testing this workaround.
Which SDK version are you using, and what IC are you experiencing this bug on?
Have you reproduced this on any of our DKs, and can you share code or a procedure that can be used to reproduce this bug?
We are using:
As I described above:
It is hard to reproduce. In our setup, it takes about 12 hours of certain tests. I believe that code analysis of the page_scan and header_check functions will help.
I can also share with you a flash dump from the corrupted fds storage.
I am uploading two FDS section dumps. They are binary files.
The first one is swap_interrupted (https://drive.google.com/file/d/1YbhHWHZhjc4LcS1HGaIfJf2rJdcdAXHy/view?usp=sharing). It contains data in the swap page that reproduces this issue. If we initialize FDS with this data, m_latest_rec_id will be set to 0xFFFFFFFF and the next created record will have an id equal to 0.
The second one is corrupted_data (https://drive.google.com/file/d/1cMJeldRmiGFEwtxTkQgx_KWeFwsv1f-Q/view?usp=sharing). This is what FDS looks like after some operations with m_latest_rec_id initialized to 0xFFFFFFFF.
Were you able to fix this issue? We are seeing something that sounds very similar when doing a lot of FDS writes.
Yes, I have fixed 2 issues.
/**@brief Not initialized record id. */
#define FDS_RECORD_ID_NOT_INIT (0xFFFFFFFF)

fds_header_status_t fds_header_check(fds_header_t const * p_hdr, uint32_t const * p_page_end)
    if (((uint32_t*)header_jump(p_hdr) > p_page_end))
        // The length field would jump across the page boundary.
        // FDS won't allow writing such a header, therefore it has been corrupted.
    if (   (p_hdr->file_id    == FDS_FILE_ID_INVALID)
        || (p_hdr->record_key == FDS_RECORD_KEY_DIRTY)
        || (p_hdr->record_id  == FDS_RECORD_ID_NOT_INIT))
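For readers who want to try the idea outside the SDK, here is a minimal, self-contained sketch of a header check with the extra record_id guard. It is an illustration under assumptions, not the patched SDK function. The names hdr_t, hdr_status_t and header_check_sketch are hypothetical, the constant values are assumed, and it works with word offsets instead of pointers. The post does not show which status the patched branch returns, so the sketch simply treats an uninitialized record_id like a dirty header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE_WORDS     3u           /* assumed header size in 32-bit words     */
#define FILE_ID_INVALID    0xFFFFu      /* assumed: file id of erased flash        */
#define RECORD_KEY_DIRTY   0x0000u      /* assumed: key of an invalidated record   */
#define RECORD_ID_NOT_INIT 0xFFFFFFFFu  /* record id that was never written        */

/* Hypothetical three-word record header, modelled on the FDS layout. */
typedef struct
{
    uint16_t record_key;
    uint16_t length_words;
    uint16_t file_id;
    uint16_t crc16;
    uint32_t record_id;
} hdr_t;

typedef enum { HDR_VALID, HDR_DIRTY, HDR_CORRUPT } hdr_status_t;

/* Classify a header located offset_words into a page of page_words words. */
static hdr_status_t header_check_sketch(hdr_t const * p_hdr,
                                        size_t        offset_words,
                                        size_t        page_words)
{
    assert(p_hdr != NULL);

    /* The record (header plus data) must end inside the page. */
    if (offset_words + HDR_SIZE_WORDS + p_hdr->length_words > page_words)
    {
        return HDR_CORRUPT;
    }

    /* Extra guard: a record id that still reads as erased flash means the
     * header was only partially written, so it must not be trusted. */
    if (   (p_hdr->file_id    == FILE_ID_INVALID)
        || (p_hdr->record_key == RECORD_KEY_DIRTY)
        || (p_hdr->record_id  == RECORD_ID_NOT_INIT))
    {
        return HDR_DIRTY;
    }

    return HDR_VALID;
}
```

A header whose first two words were copied but whose record_id is still 0xFFFFFFFF is rejected here, so a scan that only uses valid headers would not pick up 0xFFFFFFFF as the latest record id.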
I am still struggling with one more: the swap page being lost after a power failure.