
Problem with Flash Data Storage (FDS)

Hi DevZone!

I have a problem with the FDS library. I use it to store configuration data on my nRF52840: the configuration is read every time the chip resets, and written only occasionally, when I want to change it. The problem is that once in a while, when a reset occurs (because of an electrical transient or a watchdog), the chip (nRF52840) keeps resetting. I debugged it, and it appears to reset when entering fds_record_find(), so I suspect data corruption. I am using the following SDKs:

nRF5_SDK_15.0.0_prf

nRF5_SDK_for_Thread_and_Zigbee_2.0.0_29775ac_min

I would appreciate any clues on how to resolve this issue.

Thank you, 

Joel

  • Hi,

    You write that you get a reset when entering fds_record_find(). Can you confirm it is on entering, and not when checking the return value?

    For instance, you get FDS_ERR_NOT_INITIALIZED if FDS is not yet initialized (e.g. if something went wrong during initialization because of flash corruption). The case where no matching record exists should also be handled explicitly (return value FDS_ERR_NOT_FOUND).

    Running a debug build should lead you to an infinite loop instead of a reset, making it possible to see where (and with what error) it fails. (What you need to do to get this behavior may vary slightly in the Thread and Zigbee SDK.) Do you have the exact code location and/or error code for the failure? It could, for instance, be a failure in the SoftDevice or in another interrupt context.

    In order to investigate the FDS page contents, you can take a hex dump of the device. Connect with a debugger and run
    nrfjprog --readcode --readuicr dump.hex
    on the command line to dump the flash contents to a file named dump.hex. If you attach that file here, I can have a look to see whether it is corrupted (and if so, in what way). If you want to send it confidentially, please create a private ticket, attach the file there, and refer to this thread.

    Regards,
    Terje

  • Hi Terje, 

    Yesterday I investigated the issue further and found that calling fds_gc() actually resolves the problem, so I think it is a flash corruption problem. However, I am wondering (and asking you) whether this will resolve the issue permanently, or whether the problem will come back. I would like to fix the data corruption itself so that it doesn't occur again.

    Then I debugged it again: the problem is not caused by fds_init(), since it returns error code 0, which I assume is NRF_SUCCESS. However, fds_record_find() returns code 10, which I found to be NRF_ERROR_INVALID_FLAGS, and then an APP_ERROR_CHECK resets the program. Actually, my code did not call APP_ERROR_CHECK right after fds_record_find(). Because the find was not successful, it went on to write the configuration, and only then did it reach an APP_ERROR_CHECK, which performed the reset.

    Finally, I have taken dump.hex as well as a second dump, dump_after_gc.hex, so you can compare the two. I would prefer to share them privately, so I will create a private post and send them to you there, if that's okay.

    Thank you very much for your time; it is appreciated and very helpful :)

    Joel

  • Hi,

    In nRF5 SDK 15, the FDS error codes and the NRF error codes overlap, so error 10 from an FDS API function is FDS_ERR_NOT_FOUND (not NRF_ERROR_INVALID_FLAGS).

    This is consistent with dump.hex, which does not contain a valid record. (There is a swap page and two pages full of deleted records.)

    In dump_after_gc.hex there is one swap page, one empty data page, and one data page with one valid record (and no deleted records). With these flash contents, I would expect fds_record_find() to return NRF_SUCCESS (value 0, as you suggest).

    I suspect that the record is present in flash after GC because it has been written by other code, not because GC somehow revived a record. For instance, running GC may have changed the timing and order of other FDS-related API calls, or the code changes may have kept the original error situation from happening in the first place.

    To handle the issue, I suggest that you check for a return value of FDS_ERR_NOT_FOUND from fds_record_find() and, in that case, handle the situation appropriately (either by creating a new record, or by otherwise handling the fact that there is no record to be found). Only when the return value is something other than FDS_ERR_NOT_FOUND should you call APP_ERROR_CHECK.
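    A minimal sketch of that decision, written as plain C so it can be compiled and checked off-target. It assumes the nRF5 SDK 15.0.0 numeric values (NRF_SUCCESS = 0 and FDS_ERR_NOT_FOUND = 10, as described above); config_action_t, config_action_for() and the SDK15_* macros are hypothetical names for illustration, not part of the SDK.

```c
#include <stdint.h>

/* Assumed values from nRF5 SDK 15.0.0. In that SDK version the FDS codes are
 * small integers that overlap with the NRF_ERROR_* codes, which is why 10 can
 * be misread as NRF_ERROR_INVALID_FLAGS. In firmware, use the NRF_SUCCESS and
 * FDS_ERR_NOT_FOUND symbols from the SDK headers instead. */
#define SDK15_NRF_SUCCESS        0u
#define SDK15_FDS_ERR_NOT_FOUND 10u

/* Hypothetical: what the startup code should do after fds_record_find(). */
typedef enum {
    CONFIG_READ_RECORD,   /* record found: open it and read the configuration */
    CONFIG_WRITE_DEFAULT, /* no record yet: create one from default values    */
    CONFIG_FATAL_ERROR    /* anything else: pass the code to APP_ERROR_CHECK   */
} config_action_t;

/* Map the return value of fds_record_find() to an action. "Not found" is an
 * expected outcome (e.g. on first boot); only other codes are treated as fatal. */
config_action_t config_action_for(uint32_t find_rc)
{
    switch (find_rc) {
    case SDK15_NRF_SUCCESS:
        return CONFIG_READ_RECORD;
    case SDK15_FDS_ERR_NOT_FOUND:
        return CONFIG_WRITE_DEFAULT;
    default:
        return CONFIG_FATAL_ERROR;
    }
}
```

    In the firmware, the value returned by fds_record_find() (called with a zero-initialized fds_find_token_t) would be passed to config_action_for(); the "write default" branch would create the record with fds_record_write(), and APP_ERROR_CHECK() would be called only in the fatal branch.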

    There are some oddities regarding the records, although I am not sure if they have any relevance to the issue. Let's discuss those in the private thread.

    Regards,
    Terje
