BLE Client unable to delete peer pairing data

Hi. This is a problem that occurs on a customer's factory test client unit. The issue occurs roughly once every few months.

Customer Development Environment Info:

OS: Windows

SDK: nRF5 SDK 15.3.0

During the production process, the client connects to the DUT and performs pairing. Tests are then executed to verify DUT functionality. After disconnecting from the DUT, the client calls pm_peers_delete() so it can connect to a new unit.

When the problem occurs, the client becomes unable to connect to a new DUT. Rebooting the BLE client allows it to connect to a new DUT only once, then it gets stuck again.

The problem is temporarily fixed by re-programming the client module. It then works for several months before the problem occurs again.

I suspect this is a flash endurance issue, where re-flashing temporarily fixed the problem because a new, working flash page was then used for data storage. Can anyone confirm that this is a possible cause? Or are there other possibilities? Thanks.

  • Hello,

    It sounds like the flash is filling up. The peer manager (the module that handles bonding information, using e.g. pm_peers_delete()) stores its data using FDS (Flash Data Storage), which is a mini file system. In particular, check out the descriptions of deleting records and garbage collection found here:

    https://docs.nordicsemi.com/bundle/sdk_nrf5_v17.1.0/page/lib_fds_functionality.html

    So what probably happens is that the FDS pages keep filling up with new bonds, and hence new FDS records. The application does delete old records, but deleting a record in FDS doesn't remove it from the flash. It only marks the record as "not valid". Only when you run a garbage collection (using fds_gc()) does FDS go through the records and actually erase the ones that are no longer valid. That is a bit simplified, but it is more or less how it is done.
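    To make that behaviour concrete, here is a toy model in plain C. It is not FDS and does not reflect its real on-flash format; the slot count and all toy_* names are made up for illustration. It only shows why an append-only store can report "full" even though every record has been deleted, and why garbage collection frees it up again.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 8  /* hypothetical number of record slots */

typedef enum { SLOT_FREE = 0, SLOT_VALID, SLOT_DIRTY } toy_slot_t;

typedef struct {
    toy_slot_t slots[TOY_CAPACITY];
    size_t     next_free;  /* records are only ever appended */
} toy_store_t;

/* Append a record. Fails once the write position reaches the end,
 * even if earlier slots only hold deleted (dirty) records. */
static bool toy_write(toy_store_t *s, size_t *idx_out)
{
    if (s->next_free >= TOY_CAPACITY) {
        return false;
    }
    s->slots[s->next_free] = SLOT_VALID;
    if (idx_out != NULL) {
        *idx_out = s->next_free;
    }
    s->next_free++;
    return true;
}

/* "Delete" only invalidates the slot; no space is reclaimed. */
static void toy_delete(toy_store_t *s, size_t idx)
{
    if (idx < s->next_free && s->slots[idx] == SLOT_VALID) {
        s->slots[idx] = SLOT_DIRTY;
    }
}

/* Garbage collection: compact the valid records to the front and
 * free everything else. Returns the number of slots reclaimed. */
static size_t toy_gc(toy_store_t *s)
{
    size_t kept = 0;
    for (size_t i = 0; i < s->next_free; i++) {
        if (s->slots[i] == SLOT_VALID) {
            s->slots[kept++] = SLOT_VALID;
        }
    }
    for (size_t i = kept; i < TOY_CAPACITY; i++) {
        s->slots[i] = SLOT_FREE;
    }
    size_t reclaimed = s->next_free - kept;
    s->next_free = kept;
    return reclaimed;
}
```

    Writing and deleting TOY_CAPACITY records in a row leaves this store with zero valid records, yet the next toy_write() fails until toy_gc() has run. That matches the symptom described in the question.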

    Now you may be thinking: "I'll just do a garbage collection every time I delete a record", or "I'll do a garbage collection on every bootup", but please don't. If you do it too often, you will wear out the flash. If you do it on every bootup, you risk ending up with a corrupted flash. We have seen this happen on occasion. We don't know the exact cause, but it is related to the device being stuck in a reboot cycle where it starts a garbage collection on each boot without finishing it. FDS is quite robust, so it can easily handle a power off/brownout at any time, but occasionally, when it is interrupted several times in a row due to an almost dead battery, it may end up in a state where it can't recover without re-flashing.

    So the proper way to handle this is to run garbage collection when the FDS pages are full, like you are seeing now. It is also possible to run it when they are almost full. There is an API, fds_stat(), that can be used to see the number of invalid records, called "dirty records". So, for example, if the application calls pm_peers_delete() regularly, it can call fds_stat() afterwards and run fds_gc() (garbage collection) if the number of dirty records is very high, or alternatively if largest_contig has become small. This way, you will not wear out the flash, and the application will always have enough space in FDS to save new peer data.
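    As a rough sketch of that check, something like the following could work. The two thresholds, the function names gc_is_needed() and gc_run_if_needed(), and the USE_NRF5_FDS switch are all made up for this example; only fds_stat(), fds_gc(), fds_stat_t and its dirty_records / largest_contig fields come from the SDK. The decision logic is kept separate so it can be tried on a PC without the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds: tune them to the number of FDS pages and
 * the size of the peer data in your application. */
#define GC_DIRTY_RECORDS_THRESHOLD  32u   /* records */
#define GC_MIN_CONTIG_WORDS         256u  /* 32-bit words */

/* Returns true when garbage collection looks worthwhile: either many
 * invalidated records have piled up, or the largest free contiguous
 * area has become small. */
static bool gc_is_needed(uint32_t dirty_records, uint32_t largest_contig)
{
    return (dirty_records >= GC_DIRTY_RECORDS_THRESHOLD) ||
           (largest_contig < GC_MIN_CONTIG_WORDS);
}

#ifdef USE_NRF5_FDS  /* hypothetical switch: define it when building against the nRF5 SDK */
#include "fds.h"

/* Queries FDS and queues a garbage collection only if needed.
 * fds_gc() only queues the operation; completion is reported later
 * through the FDS event handler. */
ret_code_t gc_run_if_needed(void)
{
    fds_stat_t stat = {0};
    ret_code_t rc   = fds_stat(&stat);

    if (rc != NRF_SUCCESS)
    {
        return rc;
    }
    if (gc_is_needed(stat.dirty_records, stat.largest_contig))
    {
        rc = fds_gc();
    }
    return rc;
}
#endif /* USE_NRF5_FDS */
```

    Since pm_peers_delete() completes asynchronously, a reasonable place to call gc_run_if_needed() is probably the peer manager event handler, once the delete has been reported as finished, rather than directly after the pm_peers_delete() call.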

    Best regards,

    Edvin
