Possible race conditions in pm_peers_delete?

Question

Hi,
I've been using the PeerManager/FDS/FStorage in my project for a few months. We have this software running on real hardware and in simulated environments. While investigating why the simulated environment behaves differently, I found some possible race conditions related to asynchronous calls to the FDS.
Environment:
The system has two or more bonded peers.
Here is what I looked at:
 
1. We call pm_peers_delete at startup to erase all those bonded peers.
2. As a subfunction, peer_data_clear later gets called synchronously.
3. This function starts the asynchronous erase operation by calling fds_file_delete.
4. Back to step 3 for any peer to delete.
 
 Possible race condition 1: 
When fds_file_delete is called, m_pds.clearing is set to true only after the call returns successfully. If the asynchronous erase completes before the fds_file_delete call returns, the operation-completed callback runs before m_pds.clearing is set to true. As a consequence, m_pds.clearing is never reset to false and erasing bonds is blocked forever.
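The ordering problem can be reproduced in isolation with a minimal sketch. Everything in it is a hypothetical stand-in rather than the real FDS or Peer Manager code: m_clearing models m_pds.clearing, fake_file_delete models an fds_file_delete whose erase finishes before the call returns, and on_delete_done models the completion callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real FDS or Peer Manager symbols. */
typedef void (*done_cb_t)(void);

static bool      m_clearing;   /* models m_pds.clearing */
static done_cb_t m_done_cb;    /* models the registered completion callback */

/* Models the operation-completed callback: it resets the guard flag. */
static void on_delete_done(void)
{
    m_clearing = false;
}

/* Fake delete whose erase completes before the call returns,
 * as can happen in a simulated environment. */
static int fake_file_delete(void)
{
    assert(m_done_cb != NULL);
    m_done_cb();    /* completion fires "early" */
    return 0;       /* 0 models success */
}

/* Flag set only after a successful return: the early callback is lost. */
static bool clear_flag_after_call(void)
{
    m_clearing = false;
    m_done_cb  = on_delete_done;

    if (fake_file_delete() == 0)
    {
        m_clearing = true;
    }
    return m_clearing;   /* true here means the flag is stuck */
}

/* Flag set before the call and rolled back on failure. */
static bool clear_flag_before_call(void)
{
    m_clearing = false;
    m_done_cb  = on_delete_done;

    m_clearing = true;
    if (fake_file_delete() != 0)
    {
        m_clearing = false;
    }
    return m_clearing;   /* false here means the callback reset it */
}
```

In this model, clear_flag_after_call leaves the flag set with no further callback pending, while clear_flag_before_call ends with the flag cleared.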
 Possible race condition 2: 
The function peer_data_clear is called in a loop, once for each peer to delete. If the asynchronous delete takes longer than the time between two consecutive synchronous calls, the second call of peer_data_clear will not work properly, because m_pds.clearing is still set to true when it tries to erase the second bond.
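Whether this second case is harmful depends on what happens when the delayed completion finally arrives. The sketch below models one arrangement in which skipped calls do no damage: the completion handler clears the flag and runs the clear routine again. That retry is an assumption made for illustration, not something established about the SDK here, and all names (clear_next, on_delete_done, m_pending and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PEER_COUNT 3

static bool m_pending[PEER_COUNT];  /* peers marked for deletion */
static bool m_clearing;             /* models m_pds.clearing */
static int  m_in_flight = -1;       /* peer whose erase is still running */
static int  m_skipped_calls;        /* calls that found the flag set */

/* Models peer_data_clear: starts at most one erase at a time. */
static void clear_next(void)
{
    if (m_clearing)
    {
        m_skipped_calls++;          /* an erase is in flight, do nothing */
        return;
    }
    for (int i = 0; i < PEER_COUNT; i++)
    {
        if (m_pending[i])
        {
            m_clearing  = true;
            m_in_flight = i;        /* "start" the asynchronous erase */
            return;
        }
    }
}

/* Models a completion event that arrives long after the loop finished. */
static void on_delete_done(void)
{
    if (m_in_flight < 0)
    {
        return;                     /* nothing was in flight */
    }
    assert(m_clearing);
    m_pending[m_in_flight] = false;
    m_in_flight = -1;
    m_clearing  = false;
    clear_next();                   /* assumption: retry from the handler */
}

/* Models the loop that calls the clear routine once per peer. */
static void delete_all_peers(void)
{
    m_skipped_calls = 0;
    for (int i = 0; i < PEER_COUNT; i++) { m_pending[i] = true; }
    for (int i = 0; i < PEER_COUNT; i++) { clear_next(); }
}

static int pending_count(void)
{
    int n = 0;
    for (int i = 0; i < PEER_COUNT; i++) { n += m_pending[i] ? 1 : 0; }
    return n;
}
```

In this model only the first loop iteration starts an erase and the other two calls are skipped, yet every peer is eventually removed because each completion starts the next erase. Without the retry in the handler, the skipped peers would stay pending.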
Race condition 1 is what happened in my simulated environment, and I could easily modify the code to work under these conditions as well. I have never seen race condition 2 happen, but I'm not sure it never will.
Any thoughts about these race conditions?
Regards, Adrian

Adrian Eggenberger · Accepted Answer

For race condition 1, the code in peer_data_clear can be changed as follows:
// Function for clearing all peer data of one peer.
// These operations will be sent to FDS one at a time.
static void peer_data_clear()
{
    ret_code_t        retval;
    uint16_t          file_id;
    fds_record_desc_t desc;
    fds_find_token_t  token   = {0};
    pm_peer_id_t      peer_id = peer_id_get_next_deleted(PM_PEER_ID_INVALID);

    while (   (peer_id != PM_PEER_ID_INVALID)
           && (fds_record_find_in_file(peer_id_to_file_id(peer_id), &desc, &token)
               == FDS_ERR_NOT_FOUND))
    {
        peer_id_free(peer_id);
        peer_id = peer_id_get_next_deleted(peer_id);
    }

    if (!m_pds.clearing && (peer_id != PM_PEER_ID_INVALID))
    {
        file_id = peer_id_to_file_id(peer_id);

        /*
        The patch at this place ensures that the clearing variable works correctly
        if the asynchronous file delete ends before the return from fds_file_delete.
        URL: devzone.nordicsemi.com/.../
        */
        m_pds.clearing = true;

        retval = fds_file_delete(file_id);

        if (retval != FDS_SUCCESS)
        {
            m_pds.clearing = false;
        }

        if (retval == FDS_SUCCESS)
        {
            // do nothing
        }
        else if (retval == FDS_ERR_NO_SPACE_IN_QUEUES)
        {
            m_pds.clear_queued = true;
        }
        else
        {
            pds_evt_t pds_evt;

            pds_evt.evt_id      = PDS_EVT_ERROR_UNEXPECTED;
            pds_evt.peer_id     = peer_id;
            pds_evt.data_id     = PM_PEER_DATA_ID_INVALID;
            pds_evt.store_token = PM_STORE_TOKEN_INVALID;
            pds_evt.result      = retval;

            pds_evt_send(&pds_evt);
        }
    }
}
