NO_SWAP returned from pages_init() causes fatal error

Hello, 

My device is failing to initialize because of an error in fds_init(). I am using SDK 14.2 with the nRF52840.

It seems like the same problem as here, except no answer was ultimately provided. Basically, I have a device that returns FDS_PAGE_DATA for all pages inside pages_init(). This turns into NO_SWAP in fds_init(), which turns into NRF_STORAGE_FULL in pds_init(), and ultimately causes the initialization to fail. 

This problem is similar, but I am not trying to do DFU at the moment, so I don't think the solution is the same. I am afraid of erasing the data because I don't know if I need it. My understanding is that the FDS holds the application and its data... so if I erase it, won't I be erasing my application and rendering my device useless? Please correct me if I misunderstand. 

I have been working around the problem by incrementing the number of FDS pages, but I cannot keep increasing it forever. The problem only happens with a couple of my devices. Most devices are fine with just 3 pages; the broken one needs 6. 

1. How can I find out what is causing my data pages to fill up, and how can I prevent this? 

2. What is the limit on the number of virtual pages you can have, and what is the point of allocating a smaller number than the maximum possible number of pages? 

Thanks!

  • Hello,

    The issue is not that your data pages are filling up. The issue is a bug in FDS: if you lose power at the wrong point in time, the FDS pages can sometimes be corrupted, leaving you with only data pages and no swap page.
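
    To check whether a device is in this state, you can dump the first two words of each FDS page and classify them. The following is a minimal host-side sketch, not the SDK's code. The tag values are what I recall from fds_internal_defs.h and should be verified against your SDK version; classify_page() and region_lacks_swap() are hypothetical helper names, and the logic is a simplification of what pages_init() does.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed FDS page tag values; verify against fds_internal_defs.h in your SDK. */
#define FDS_PAGE_TAG_MAGIC 0xDEADC0DEu  /* word 0 of every tagged FDS page */
#define FDS_PAGE_TAG_SWAP  0xF11E01FFu  /* word 1 of the swap page */
#define FDS_PAGE_TAG_DATA  0xF11E01FEu  /* word 1 of a data page */
#define ERASED_WORD        0xFFFFFFFFu  /* erased flash reads as all ones */

typedef enum { PAGE_ERASED, PAGE_DATA, PAGE_SWAP, PAGE_UNKNOWN } page_kind_t;

/* Classify one page from its two tag words (hypothetical helper).
 * Simplification: only the tag words are inspected, not the whole page. */
page_kind_t classify_page(const uint32_t *tag)
{
    if (tag[0] == ERASED_WORD && tag[1] == ERASED_WORD) return PAGE_ERASED;
    if (tag[0] != FDS_PAGE_TAG_MAGIC)                   return PAGE_UNKNOWN;
    if (tag[1] == FDS_PAGE_TAG_SWAP)                    return PAGE_SWAP;
    if (tag[1] == FDS_PAGE_TAG_DATA)                    return PAGE_DATA;
    return PAGE_UNKNOWN;
}

/* tags holds 2 words per page. Returns 1 when no page is tagged as swap and
 * no erased page is left that could be promoted to swap, which is the
 * situation described above. Returns 0 otherwise. */
int region_lacks_swap(const uint32_t *tags, size_t n_pages)
{
    for (size_t i = 0; i < n_pages; i++) {
        page_kind_t kind = classify_page(&tags[2 * i]);
        if (kind == PAGE_SWAP || kind == PAGE_ERASED) {
            return 0;
        }
    }
    return 1;
}
```

    If every page in the FDS region carries the data tag, the region has no swap page, which matches what you see in pages_init().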

    The solution consists of two things:

    1: Do not (!) run fds_gc() on startup. It should only be called when you get the event NRF_STORAGE_FULL. We see that some customers put fds_gc() in their init procedure. This wears out the flash (it uses up the flash erase/write cycles) and increases the risk of hitting this bug.

    2: Port either your whole application, or at least the FDS module, to the latest SDK (currently SDK17.0.2). There are at least a few known issues in older versions that can lead to this bug surfacing.
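
    Point 1 can be sketched as below. This is a host-compilable illustration only: the types and fds_gc() here are reduced stand-ins so the snippet builds without the SDK. In firmware they come from peer_manager.h and fds.h, pm_evt_t has more fields, and you would also check the return value of fds_gc().

```c
#include <stdint.h>

/* Stand-ins so the sketch compiles on a host. NOT the real SDK definitions. */
typedef uint32_t ret_code_t;
#define FDS_SUCCESS 0u
typedef enum { PM_EVT_BONDED_PEER_CONNECTED, PM_EVT_STORAGE_FULL } pm_evt_id_t;
typedef struct { pm_evt_id_t evt_id; } pm_evt_t;

unsigned gc_calls = 0;  /* counts garbage collections, for illustration only */

/* Stub standing in for the SDK's fds_gc(). */
ret_code_t fds_gc(void)
{
    gc_calls++;
    return FDS_SUCCESS;
}

/* Peer manager event handler: garbage collection is triggered only by the
 * storage-full event, never unconditionally during initialization. */
void pm_evt_handler(pm_evt_t const * p_evt)
{
    switch (p_evt->evt_id)
    {
        case PM_EVT_STORAGE_FULL:
            (void) fds_gc();  /* in firmware, check and handle the result */
            break;

        default:
            /* No garbage collection for any other event. */
            break;
    }
}
```

    The design choice is that the init path never calls fds_gc(), so a normal boot does not erase any flash pages.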

     

    nordev said:
    My understanding is that the FDS holds the application and its data... so if I erase it, won't I be erasing my application and rendering my device useless? Please correct me if I misunderstand. 

     This is not correct. FDS is the flash area that the application uses to store its custom data. The flash in the nRF starts at 0x00000000 with the SoftDevice (if used). After the SoftDevice comes your application. The FDS pages are near the top of the flash. If you have a bootloader, it is located at the very top of the flash (from 0x00100000 and down), and the FDS is placed directly below it.
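
    As a rough illustration of that layout, the sketch below computes where the FDS region would sit. It assumes 4096-byte flash pages and one flash page per virtual page (FDS_VIRTUAL_PAGE_SIZE of 1024 words); fds_region() and fds_data_pages() are hypothetical helpers, not SDK functions. Pass the bootloader start address as end_addr if you have a bootloader, otherwise the end of flash.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_PAGE_BYTES 4096u  /* assumed nRF52 flash page size */

typedef struct {
    uint32_t start;  /* first byte of the FDS region */
    uint32_t end;    /* one past the last byte of the FDS region */
} fds_region_t;

/* FDS occupies the virtual_pages flash pages directly below end_addr. */
fds_region_t fds_region(uint32_t end_addr, uint32_t virtual_pages)
{
    assert(end_addr % FLASH_PAGE_BYTES == 0);
    assert(virtual_pages <= end_addr / FLASH_PAGE_BYTES);

    fds_region_t region = { end_addr - virtual_pages * FLASH_PAGE_BYTES, end_addr };
    return region;
}

/* One page is reserved as swap for garbage collection; the rest hold data. */
uint32_t fds_data_pages(uint32_t virtual_pages)
{
    assert(virtual_pages >= 2u);
    return virtual_pages - 1u;
}
```

    With no bootloader and FDS_VIRTUAL_PAGES set to 3, fds_region(0x00100000, 3) gives 0x000FD000 to 0x00100000, of which two pages hold data. With 6 pages the region starts at 0x000FA000. Erasing only that range leaves the SoftDevice and the application untouched.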

    FDS can be used by different parts of your application. If you support BLE bonding, the peer manager will use the FDS pages to store the bonding keys. Your application may also store some custom data, such as configuration settings. If you are unsure whether or not you have implemented this, you probably have not. Search for fds_record_write() in your application to find out what is using it. 

    So as long as your application does not require any data in the FDS, it should be safe to erase. When the nRF was programmed the first time, it was probably with an empty FDS flash area, unless something was manually programmed into this address range.

    Best regards,

    Edvin

  • Thank you Edvin. 

    1. I only call fds_gc in the case of PM_EVT_STORAGE_FULL and never otherwise. 

    2. Do you think the bug fixes included in the files linked here (by Peter Myrre) will stop the NO_SWAP bug, without needing to upgrade to 17.0 now? https://devzone.nordicsemi.com/f/nordic-q-a/29926/bug-in-peer-manager-in-sdk-14-2

    Before I integrated this bugfix, I would sometimes get a cascade of back to back "garbage collected" events, such that it practically bricked the device. It seemed random, and I could never figure out a specific cause, but I haven't gotten such a cascade since I added the bug fix. 

    The device for which the FDS_VIRTUAL_PAGES needed to be incremented to 6 had its issues before I added this bugfix. It is still left with extra data pages, but I am wondering if maybe I don't have to worry about having to increment FDS_VIRTUAL_PAGES anymore, i.e., do you think that one bug fix is enough? Or does 17.0 have other bugfixes that are not included in Peter Myrre's files? 

    Edvin said:
    If you support BLE bonding, the peer manager will use the FDS pages to store the bonding keys. Your application may also store some custom data, such as configuration settings. If you are unsure whether or not you have implemented this, you probably have not. Search for fds_record_write() in your application to find out what is using it. 

    So as long as your application is not requiring any data in the FDS it should be safe to erase.

    I did not find fds_record_write() anywhere in my application except the SDK:

    ...so the only effect of clearing FDS is that my bond information will be deleted? If my central (phone) only requests to pair once (upon first connection), but not upon each subsequent connection, does that mean I am bonding and not just pairing? (Based on the definition I read here.)

    And if I clear FDS, does that mean the next time I try to connect to my device with the phone, it will ask me to pair again, because the encryption keys will have been deleted? If this is the case, then it does not seem like a big problem to clear FDS. 

  • nordev said:
    do you think that one bug fix is enough? Or does 17.0 have other bugfixes that are not included in Peter Myrre's files? 

     There are later bugfixes than the one that Petter mentions in that post, yes.

    The one that I am thinking of is applied to SDK16.0.0 (but if you port to the SDK16 version, you may as well port to the SDK17.0.2 version. They are practically the same).

    Mentioned in the release notes for SDK 16.0.0:

    - FDS: fixed two bugs where a power loss at very specific times during garbage
      collection could corrupt the file system, making FDS unable to initialize and return
      FDS_ERR_NO_PAGES on initialization.
    - FDS: fixed a bug that prevented using the last word of a flash page to save a record.

     

    nordev said:
    I did not find fds_record_write() anywhere in my application except the SDK:

     So then it looks like it is your peer manager that uses the FDS to store bonding data for BLE.

     

    nordev said:
    And if I clear FDS, does that mean the next time I try to connect to my device with the phone, it will ask me to pair again, because the encryption keys will have been deleted? If this is the case, then it does not seem like a big problem to clear FDS. 

     That is correct. You may need to delete the bond information on the phone as well. Go into your Bluetooth settings, find the nRF device, and click something like "forget device" (the wording depends on the OS).

    Best regards,

    Edvin
