Peer Manager deletes the only peer while we still need it: a race on failed flash writes?

We are using an nRF52840 as a peripheral, Nordic SDK 15.3.0_59ac345.

We are seeing issues in the field where users are unable to reconnect to our device from our app on iOS phones. App logs indicate that link encryption fails with Status: PIN or Key Missing (0x06). What follows is my interpretation of what's going on and a potential solution. I'm looking for feedback from y'all on the problem statement and on any better solutions that may exist.

The garbage collection function pm_handler_flash_clean (which we call from the PM event handler we register) keeps a local flag, flash_write_after_gc, that tracks whether a flash write has succeeded since the last garbage collection. If a flash write fails after a garbage collection with no successful write in between, the function assumes that more drastic measures are needed: it marks the oldest peer for deletion and reruns garbage collection, which deletes that peer's bond information. What I see happening is a race between several flash writes that fail because the flash is full (a condition detected synchronously, as opposed to during an actual flash write):
  • flash write A starts and fails because the flash is full
  • GC A starts and sets flash_write_after_gc = false (a successful write after GC will set it back to true)
  • flash write B starts and fails because the flash is full
  • GC B sees that flash_write_after_gc == false, marks the oldest peer for deletion, and then starts
  • GC A completes
  • GC B completes; the peer's bond information has been deleted
In our case the oldest peer is the only peer and we fail to encrypt the link on the central with Status: PIN or Key Missing (0x06).
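The sequence above can be replayed with a small model of the flag logic. This is a hypothetical simplification, not the SDK source of pm_handler_flash_clean(): model_t and the on_* functions are made-up names, and the model only encodes the behaviour described here (one flag, cleared when a GC starts, set again by a successful write, and checked on every PM_EVT_STORAGE_FULL).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the behaviour described above; NOT the SDK code. */
typedef struct {
    bool flash_write_after_gc; /* true once a write succeeded after the last GC */
    int  gc_in_flight;         /* GCs started but not yet completed */
    int  peers_deleted;        /* peers removed by the escalation path */
} model_t;

static void model_init(model_t *m)
{
    m->flash_write_after_gc = true;
    m->gc_in_flight = 0;
    m->peers_deleted = 0;
}

/* A write failed synchronously because flash is full (PM_EVT_STORAGE_FULL). */
static void on_storage_full(model_t *m)
{
    if (!m->flash_write_after_gc) {
        /* No successful write since the last GC *started*: escalate,
         * even though that GC may not have completed yet. */
        m->peers_deleted++;
    }
    m->flash_write_after_gc = false;
    m->gc_in_flight++;
}

static void on_gc_complete(model_t *m)
{
    assert(m->gc_in_flight > 0);
    m->gc_in_flight--;
}

static void on_write_succeeded(model_t *m)
{
    m->flash_write_after_gc = true;
}

/* Writes A and B both fail before GC A completes; returns peers deleted. */
static int replay_race(void)
{
    model_t m;
    model_init(&m);
    on_storage_full(&m); /* write A fails, GC A starts */
    on_storage_full(&m); /* write B fails while GC A is pending: escalates */
    on_gc_complete(&m);  /* GC A completes */
    on_gc_complete(&m);  /* GC B completes, peer already marked for deletion */
    return m.peers_deleted;
}

/* Same failure, but the next write is attempted only after GC A completes. */
static int replay_serialized(void)
{
    model_t m;
    model_init(&m);
    on_storage_full(&m);    /* write A fails, GC A starts */
    on_gc_complete(&m);     /* GC A frees the dirty records */
    on_write_succeeded(&m); /* retried write succeeds, flag set back to true */
    on_storage_full(&m);    /* a later full condition: GC only, no deletion */
    on_gc_complete(&m);
    return m.peers_deleted;
}
```

replay_race() deletes one peer, while replay_serialized() deletes none, which matches the suspicion that GC alone would have been enough.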
Here are some peer manager logs that demonstrate this:

00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <info> peer_manager_handler: Attempting to clean flash.
00> <debug> peer_manager_handler: Running flash garbage collection.
00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <warning> peer_manager_gcm: Flash full. Could not store data for conn_handle: 0
00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <debug> peer_manager_handler: Event PM_EVT_FLASH_GARBAGE_COLLECTED
00> <debug> peer_manager_handler: Flash garbage collection complete.
00> <debug> peer_manager_handler: Event PM_EVT_FLASH_GARBAGE_COLLECTED
00> <debug> peer_manager_handler: Flash garbage collection complete.
00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <info> peer_manager_handler: Attempting to clean flash.
00> <info> peer_manager_handler: Deleting lowest ranked peer (peer_id: 0)
00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <debug> peer_manager_handler: Event PM_EVT_PEER_DATA_UPDATE_SUCCEEDED
00> <debug> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
00> <debug> peer_manager_handler: Event PM_EVT_PEER_DATA_UPDATE_SUCCEEDED
00> <debug> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <info> peer_manager_handler: Attempting to clean flash.
00> <debug> peer_manager_handler: Running flash garbage collection.
00> <debug> peer_manager_handler: Event PM_EVT_STORAGE_FULL
00> <warning> peer_manager_handler: Flash storage is full
00> <debug> peer_manager_handler: Event PM_EVT_PEER_DELETE_SUCCEEDED
00> <error> peer_manager_handler: Peer deleted successfully: peer_id: 0
00> <debug> peer_manager_handler: Event PM_EVT_PEER_DELETE_SUCCEEDED
00> <error> peer_manager_handler: Peer deleted successfully: peer_id: 0
If I understand the code and sequence of events correctly, garbage collection without peer deletion would actually be sufficient. My current plan is to disable peer ranking in our SDK configuration so that the peer manager never auto-deletes any peers. We have internal handling that marks peers for deletion when they are no longer needed.
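Disabling peer ranking removes the deletion, but a burst of PM_EVT_STORAGE_FULL events can still arrive while a GC is running. The following is a minimal sketch of a GC-only policy that collapses such a burst into a single GC. It assumes the application handles the event itself; gc_guard_t and the guard_* functions are hypothetical names, and the gc_started counter stands in for the real garbage collection call so the logic can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical app-side STORAGE_FULL policy: only ever garbage collect
 * (never delete peers) and keep at most one GC in flight. */
typedef struct {
    bool gc_pending; /* GC started, completion not yet reported */
    int  gc_started; /* GC operations actually started */
    int  coalesced;  /* STORAGE_FULL events absorbed by the pending GC */
} gc_guard_t;

static void guard_on_storage_full(gc_guard_t *g)
{
    if (g->gc_pending) {
        /* The GC already in flight reclaims the same dirty records, so a
         * second failure is not evidence that GC is insufficient. */
        g->coalesced++;
        return;
    }
    g->gc_pending = true;
    g->gc_started++; /* real firmware would start garbage collection here */
}

static void guard_on_gc_complete(gc_guard_t *g)
{
    assert(g->gc_pending);
    g->gc_pending = false;
}

/* Usage: n STORAGE_FULL events arrive before the pending GC completes. */
static int gcs_started_for_burst(int n)
{
    gc_guard_t g = {false, 0, 0};
    for (int i = 0; i < n; i++) {
        guard_on_storage_full(&g);
    }
    if (g.gc_pending) {
        guard_on_gc_complete(&g);
    }
    return g.gc_started;
}
```

With four failures in one burst, gcs_started_for_burst(4) starts a single GC and counts the other three as coalesced.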
Any suggestions or corrections based on the summary above?
  • Hi,

    If I understand the code and sequence of events correctly, garbage collection without peer deletion would actually be sufficient.

    That is true if records have been deleted or updated since the last garbage collection. Garbage collection then frees the old, dirty records, so that space becomes available. If not, and you actually have too much data stored (typically including other FDS data not related to the peer manager), garbage collection alone will not be enough.

    There are a few things to note here, some of which relate to scenarios the peer manager does not handle very well. First, you must not delete the bond for a peer that is currently in use; that leads to undefined behavior in the peer manager. However, the peer manager does not prevent deleting the bond for a peer that is currently connected. Second, as you note, the peer manager's PM_EVT_STORAGE_FULL handling will trigger deletion of the oldest peer even if it is the only peer.

    My current plan is to disable peer ranking in our SDK configuration so that the peer manager will never auto delete any peers. We have internal handling that will mark peers for deletion when they are no longer needed.
    Any suggestions or corrections based on the summary above?

    That makes sense. Either that, or improve the peer manager's handling of this scenario. Also, another approach that is sometimes used is to check the amount of free space in FDS regularly and run garbage collection when it is needed (that is, when there is memory to be freed and the amount of free space is limited). Potentially, also delete old data that is no longer needed before running the GC.

    Einar
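The distinction in the reply (dirty records that GC can reclaim versus live data that simply does not fit) and the periodic-check suggestion can both be expressed as simple space accounting. This sketch uses a made-up flash_usage_t in 32-bit words and ignores FDS details such as the swap page and record headers. On target the numbers would presumably come from fds_stat() and the collection would be started with fds_gc(); the struct, function names and low-water threshold are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a flash storage area, in 32-bit words. */
typedef struct {
    uint32_t capacity; /* usable words */
    uint32_t live;     /* words in valid records (peer data and other FDS users) */
    uint32_t dirty;    /* words in deleted/updated records, reclaimable by GC */
} flash_usage_t;

typedef enum { FITS_NOW, FITS_AFTER_GC, DOES_NOT_FIT } fit_t;

/* Can a write of `needed` words succeed now, only after GC, or not at all
 * without deleting live data? */
static fit_t classify_write(flash_usage_t u, uint32_t needed)
{
    assert(u.live + u.dirty <= u.capacity);
    uint32_t free_now = u.capacity - u.live - u.dirty;
    uint32_t free_after_gc = u.capacity - u.live;

    if (needed <= free_now) {
        return FITS_NOW;
    }
    if (needed <= free_after_gc) {
        return FITS_AFTER_GC; /* GC alone is enough; no peer has to go */
    }
    return DOES_NOT_FIT;      /* only deleting live data frees enough space */
}

/* Periodic check: collect garbage only when there is something to reclaim
 * and free space has dropped below an app-chosen low-water mark. */
static bool maintenance_gc_needed(flash_usage_t u, uint32_t low_water_words)
{
    assert(u.live + u.dirty <= u.capacity);
    uint32_t free_now = u.capacity - u.live - u.dirty;
    return u.dirty > 0 && free_now < low_water_words;
}
```

With 1000 words, 600 live and 380 dirty, a 50-word write fits only after GC (20 words free now, 400 afterwards), and a 256-word low-water mark triggers maintenance GC. With 980 live and nothing dirty, no amount of GC helps.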

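The two caveats in the reply (never delete a peer that is in use, and the peer manager will delete even the only peer) suggest a guard for the application's own deletion logic. This is a hypothetical sketch: peer_info_t and peer_delete_allowed() are our names, and on target the connected flag could, we believe, be derived from pm_conn_handle_get() returning a valid connection handle. Refusing to delete the only bond is a policy choice for this use case, not a peer manager rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peer bookkeeping kept by the application. */
typedef struct {
    uint16_t peer_id;   /* peer manager peer id */
    bool     connected; /* peer currently has an active connection */
} peer_info_t;

/* Returns true only if deleting peer_id is considered safe: the peer must
 * be known, not connected, and not the only bonded peer. */
static bool peer_delete_allowed(const peer_info_t *peers, size_t count,
                                uint16_t peer_id)
{
    assert(peers != NULL || count == 0);
    if (count <= 1) {
        return false; /* never auto-delete the only bond */
    }
    for (size_t i = 0; i < count; i++) {
        if (peers[i].peer_id == peer_id) {
            return !peers[i].connected; /* never delete a peer in use */
        }
    }
    return false; /* unknown peer id: nothing to delete */
}
```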