Reasons why bonds are deleted?

We have a custom board in production based on the nRF52832, using the S132 SoftDevice and the nRF5 SDK.

Below is some of our Peer Manager code. We use a static key to pair and connect to the devices, and usually only one device connects to the board.

Recently, some of our clients have had connection problems. The Peer Manager reports a security failure event (PM_EVT_CONN_SEC_FAILED) with error 0x1006, and the only fix is always to unpair and pair again from the mobile phone. We cannot understand why this happens: what can cause the bonds to be deleted? Or is there a way to track the reason from the PM_EVT_PEERS_DELETE_SUCCEEDED event?
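For reference, and as we read the nRF5 SDK's peer_manager_types.h: Peer Manager-defined security errors start at PM_CONN_SEC_ERROR_BASE (0x1000), and 0x1006 is PM_CONN_SEC_ERROR_PIN_OR_KEY_MISSING, which the SDK describes as encryption failing because the peripheral has lost the LTK for this bond. Values below 0x1000 are BLE_GAP_SEC_STATUS_* codes passed through from the SoftDevice. The sketch below classifies the error value from PM_EVT_CONN_SEC_FAILED; the SEC_ERR_* constants, sec_err_kind_t and classify_conn_sec_error() are hypothetical names that mirror the SDK values so the snippet compiles without the SDK.

```c
#include <stdint.h>

/* Hypothetical local copies of nRF5 SDK values (peer_manager_types.h):
 * PM_CONN_SEC_ERROR_BASE and PM_CONN_SEC_ERROR_PIN_OR_KEY_MISSING. */
#define SEC_ERR_BASE                0x1000u
#define SEC_ERR_PIN_OR_KEY_MISSING  (SEC_ERR_BASE + 0x06u)

typedef enum {
    SEC_ERR_KIND_SOFTDEVICE,  /* < 0x1000: a BLE_GAP_SEC_STATUS_* value */
    SEC_ERR_KIND_KEY_MISSING, /* 0x1006: the LTK for this bond is missing */
    SEC_ERR_KIND_OTHER_PM     /* any other Peer Manager-defined error */
} sec_err_kind_t;

/* Classifies p_evt->params.conn_sec_failed.error from PM_EVT_CONN_SEC_FAILED. */
static sec_err_kind_t classify_conn_sec_error(uint16_t error)
{
    if (error < SEC_ERR_BASE) {
        return SEC_ERR_KIND_SOFTDEVICE;
    }
    if (error == SEC_ERR_PIN_OR_KEY_MISSING) {
        return SEC_ERR_KIND_KEY_MISSING;
    }
    return SEC_ERR_KIND_OTHER_PM;
}
```

Seeing SEC_ERR_KIND_KEY_MISSING on the board while the phone still lists the device as bonded is consistent with the bond having been removed on the board side.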

The peer manager initialization:

/**@brief Function for the Peer Manager initialization.
 */
static void peer_manager_init(void)
{
    ble_gap_sec_params_t sec_param;
    ret_code_t           err_code;

    err_code = pm_init();
    hardfault = app_error_check_logger(err_code, true, log_str[PEER_MANAGER_INIT], 0, NULL);

    memset(&sec_param, 0, sizeof(ble_gap_sec_params_t));

    // Security parameters to be used for all security procedures. These are common parameters for bonding.
    sec_param.bond           = 1;
    sec_param.mitm           = 0;
    sec_param.lesc           = 0;
    sec_param.keypress       = 0;
    sec_param.io_caps        = BLE_GAP_IO_CAPS_DISPLAY_ONLY;
    sec_param.oob            = 0;
    sec_param.min_key_size   = 7;
    sec_param.max_key_size   = 16;
    sec_param.kdist_own.enc  = 1;
    sec_param.kdist_own.id   = 1;
    sec_param.kdist_peer.enc = 1;
    sec_param.kdist_peer.id  = 1;

    err_code = pm_sec_params_set(&sec_param);   // sets security parameters for pairing and bonding
    hardfault = app_error_check_logger(err_code, true, log_str[PEER_MANAGER_INIT], 1, NULL);

    err_code = pm_register(pm_evt_handler);   // register an event handler for the module
    hardfault = app_error_check_logger(err_code, true, log_str[PEER_MANAGER_INIT], 2, NULL);
}

The peer manager event handler:

static void pm_evt_handler(pm_evt_t const * p_evt)
{
    pm_handler_on_pm_evt(p_evt);    // Logging peer events. Starts encryption if connected to a bonded device.
    pm_handler_disconnect_on_sec_failure(p_evt);    // Disconnects if the connection was not secured.
    pm_handler_flash_clean(p_evt);    // Runs GC when flash is full and, if that is not enough, deletes the lowest-ranked peer.

    switch (p_evt->evt_id)
    {
        case PM_EVT_CONN_SEC_SUCCEEDED:   //a link has been encrypted, result of a call of pm_conn_secure or of an action by the peer.
            m_peer_id = p_evt->peer_id;
            pm_local_database_has_changed();
            break;

        case PM_EVT_PEERS_DELETE_SUCCEEDED:   // all peers were cleared from flash storage (result of pm_peers_delete); a single deleted peer is reported as PM_EVT_PEER_DELETE_SUCCEEDED instead
            NRF_LOG_INFO("PM_EVT_PEERS_DELETE_SUCCEEDED");
            hardfault = app_error_check_logger(1, false, "peersdel", 1, NULL);
            advertising_start(false);
            break;

        case PM_EVT_PEER_DATA_UPDATE_SUCCEEDED: // a piece of peer data was stored, updated or cleared in flash storage.
            if (     p_evt->params.peer_data_update_succeeded.flash_changed
                 && (p_evt->params.peer_data_update_succeeded.data_id == PM_PEER_DATA_ID_BONDING))
            {
                NRF_LOG_INFO("New Bond. Peer data update succeeded.");
            }
            break;

        case PM_EVT_CONN_SEC_FAILED:    // A pairing or encryption procedure has failed. In some cases, this means that security is not possible on this link.
        {
            NRF_LOG_INFO("Pairing or Encryption procedure failed. Peer id=%d, Error=%x, Procedure=%d, Source=%d", p_evt->peer_id, p_evt->params.conn_sec_failed.error,
                                                                                        p_evt->params.conn_sec_failed.procedure, p_evt->params.conn_sec_failed.error_src);
            // Encode the procedure for the logger; add 10 if the error was raised by the peer.
            uint8_t proc = p_evt->params.conn_sec_failed.procedure;
            if (p_evt->params.conn_sec_failed.error_src == BLE_GAP_SEC_STATUS_SOURCE_REMOTE)
            {
                proc += 10;
            }
            hardfault = app_error_check_logger(p_evt->params.conn_sec_failed.error, false, "secfail", proc, NULL);
            m_scoliosense.err_code = p_evt->params.conn_sec_failed.error;
            ble_error_sec_update(m_conn_handle, &m_scoliosense);
        }
        break;

        case PM_EVT_CONN_SEC_CONFIG_REQ:
        {
            // Allow or reject pairing request from an already bonded peer.
            NRF_LOG_INFO("Repairing Process was initiated.");
            pm_conn_sec_config_t conn_sec_config = {.allow_repairing = true};
            pm_conn_sec_config_reply(p_evt->conn_handle, &conn_sec_config);
        }
        break;

        default:
            break;
    }
}

Any help is much appreciated! 

Best regards,
Dimitra

  • Hello Dimitra,

    The Peer Manager can be configured to delete the oldest bond to free up space in FDS for new bonds. This works fine as long as FDS is only used for storing bonding information. However, if FDS is also used for storing other application data, there is a possibility that the Peer Manager could be forced to delete the last and only bond. So, are you using FDS for storing other data in your application?
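To illustrate why the only bond is at risk: our reading of the SDK's peer_manager_handler.c is that pm_handler_flash_clean() reacts to PM_EVT_STORAGE_FULL by running garbage collection and, if that does not free enough space, deleting the lowest-ranked (least recently used) peer with pm_peer_delete(). That deletion is reported as PM_EVT_PEER_DELETE_SUCCEEDED, not PM_EVT_PEERS_DELETE_SUCCEEDED. The portable sketch below models only the selection step; bond_entry_t and select_bond_to_delete() are hypothetical and are not SDK code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not SDK code: one entry per stored bond. */
typedef struct {
    uint16_t peer_id; /* stand-in for pm_peer_id_t */
    uint32_t rank;    /* higher rank = used more recently */
} bond_entry_t;

/* Returns the index of the lowest-ranked (least recently used) bond,
 * or -1 if there are no bonds. With a single bond, that bond is chosen. */
static int select_bond_to_delete(const bond_entry_t *bonds, size_t count)
{
    if (count == 0) {
        return -1;
    }
    assert(bonds != NULL);
    size_t lowest = 0;
    for (size_t i = 1; i < count; i++) {
        if (bonds[i].rank < bonds[lowest].rank) {
            lowest = i;
        }
    }
    return (int)lowest;
}
```

With only one bond stored, the lowest-ranked bond is the phone's bond, which is exactly the case where application records have filled FDS and GC alone cannot help.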

    Best regards,

    Vidar

  • Yes, we use FDS to save application data in flash.

    So, is there a way to configure the Peer Manager to never delete the device's only bond? Or should I keep track of the free space and stop saving data early enough, before the flash is truly full, so that the Peer Manager is never forced to delete the last bond? How can I do this, and where can I find how much space the Peer Manager needs so that it never deletes the bonds? Or is there another way to handle this situation in general?


    Best regards,
    Dimitra

  •  fds_stat_t::words_used includes the page tag.

    OK, but does words_used include dirty (freeable) words too?

    So, if I understand correctly, I have 12 pages * (2048 words - 2 words page tag) = 24552 words available for my data and for the Peer Manager.
    Judging from words_used in fds_stat, the Peer Manager uses 53 words for one bond (including the peer rank, service changed pending flag, etc.).

    So, let's say I want to leave plenty of space for the Peer Manager: one whole page.
    The available space then becomes 11 pages * 2046 words = 22506 words.

    Should I then check
    words_used >= 22506 --> delete some records
    or
    words_used + freeable_words >= 22506 --> delete some records?

    I ask because I call GC regularly, and I can see that freeable_words drops to 0 when GC succeeds, but I am not sure whether freeable words are included in words_used. If they are not, then I must take them into account in the calculation.

    For example, if the "full" limit is 100 words, words_used = 100 and freeable_words = 10, and words_used does not include freeable words, then I have exceeded the limit and will have a problem even if I call GC.
    On the other hand, if words_used = 100 and freeable_words = 10, but the freeable words are included in words_used, then only 90 words are really in use, so I have time to call GC and then delete extra records when needed.
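The page arithmetic above fits in a small helper. This is a sketch assuming 2048-word virtual pages with a 2-word page tag, as described in this thread; fds_record_budget_words() is a hypothetical name, and the swap page must not be counted in data_pages.

```c
#include <assert.h>
#include <stdint.h>

#define FDS_PAGE_TAG_WORDS 2u  /* each virtual page starts with a 2-word page tag */

/* Words available for records (application data + Peer Manager) when
 * `reserved_pages` of the `data_pages` are kept free for the Peer Manager.
 * `data_pages` excludes the swap page; `page_words` is the virtual page size. */
static uint32_t fds_record_budget_words(uint32_t data_pages, uint32_t reserved_pages,
                                        uint32_t page_words)
{
    assert(reserved_pages <= data_pages);
    assert(page_words > FDS_PAGE_TAG_WORDS);
    return (data_pages - reserved_pages) * (page_words - FDS_PAGE_TAG_WORDS);
}
```

fds_record_budget_words(12, 0, 2048) gives the 24552 words above, and reserving one page gives 22506.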

    Sorry for the long message; I have been working on this for a long time and may be confusing myself for no reason.


    Best regards, 

    Dimitra 

  • Hi Dimitra,

    It's best not to run GC too frequently, as it increases flash wear. Freeable words are not included in words_used. Also, if possible, avoid running GC during a connection, as it can lead to scheduling conflicts; running GC on this many pages is relatively time-consuming.

    DimitraN said:
    if the limit of "full" space is 100 words. and words used = 100 and freeable = 10, and words used do not include freeable words, then i have exceeded the limit, and I will have a problem even if i call gc. 

    This is correct. So in this case you would have to delete some records to get more "freeable" words.

    Best regards,

    Vidar

  • It's best to not perform GC too frequently as it leads to increased flash wear.

    But how frequent is too frequent?

    We have a record that is updated in FDS every 30 seconds, all day. This creates 6 freeable words every 30 seconds, i.e. 12 freeable words per minute * 60 * 24 = 17280 freeable words per day.
    If we call GC, for example, twice a day to free up these words, is that too frequent?

    I ask because I will set a limit of 22400 words at which I consider the flash full: at that point I stop saving new records and start deleting old ones, so that one page is always left available for the Peer Manager and it will not delete the bonds.

    So, is calling GC twice a day too frequent? If yes, then the only thing I can do is call it ONLY when freeable_words + words_used >= the 22400-word limit.
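The rate calculation above, and how long it takes to reach the 22400-word limit at that rate, can be sketched as follows. It assumes that each update of the 6-word record writes a new copy and leaves the previous copy behind as 6 freeable words, and that the update period divides a day evenly; freeable_words_per_day() and hours_until_limit() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

/* Freeable words created per day when a record of `record_words` words
 * (header included) is updated every `period_s` seconds: each update writes
 * a new copy and leaves the old copy behind as freeable words. */
static uint32_t freeable_words_per_day(uint32_t record_words, uint32_t period_s)
{
    assert(period_s > 0 && SECONDS_PER_DAY % period_s == 0);
    return record_words * (SECONDS_PER_DAY / period_s);
}

/* Whole hours until words_used reaches limit_words at the given daily
 * growth (0 if the limit is already reached). */
static uint32_t hours_until_limit(uint32_t words_used, uint32_t limit_words,
                                  uint32_t words_per_day)
{
    assert(words_per_day > 0);
    if (words_used >= limit_words) {
        return 0;
    }
    return (limit_words - words_used) * 24u / words_per_day;
}
```

Starting from an empty store, hours_until_limit(0, 22400, 17280) gives 31, so at this write rate alone, GC (or record deletion) is needed at least about every 31 hours; other records shorten that interval.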

    Another question, after testing: I delete records once I have sent them over BLE to the phone. At that point, I expect almost all of words_used to become freeable_words, but sometimes fds_stat returns almost the same value for both variables, as if freeable_words has been updated but words_used has not yet. Is this expected? Have you observed this kind of behaviour?

    Best regards,
    Dimitra

  • Hi Dimitra,

    I'd say it's too frequent when you call GC more often than necessary, for instance if you run GC when only 50 % of the memory is used. The 'Absolute maximum ratings' section of the product specification shows that the flash is rated for 10 000 write/erase cycles.

    DimitraN said:
    Another question after testing, I delete records when I send them over BLE to mobile.. at this point, I expect that almost all words_used become freeable_words, but sometimes fds_stat returns almost equal amount to both variables, like freeable words are updated and words_used have not been updated yet. is this a case? have you observed this kind of a behaviour?

    Not that I can remember. Which SDK version are you using? 

    Best regards,

    Vidar

    After more testing, it turns out that words_used does include freeable_words.
    This means that when I delete records, the used words become freeable: freeable_words increases while words_used stays the same, since a freeable word is also a used word. The words_used count only decreases when GC runs and the freeable words are actually erased.

    Given this, I track only words_used to check whether it exceeds the limit, and I call GC when there are many freeable words.
    I delete records only when words_used exceeds the limit and there are so few freeable words that GC will not help.
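The policy just described can be sketched as a pure decision function that takes the two fds_stat_t counters (words_used and freeable_words) as plain integers. flash_action_t, choose_flash_action() and the gc_threshold parameter are hypothetical; the caller would then run fds_gc() or delete records itself.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FLASH_ACTION_NONE,           /* nothing to do yet */
    FLASH_ACTION_RUN_GC,         /* call fds_gc() */
    FLASH_ACTION_DELETE_RECORDS  /* GC alone cannot get back under the limit */
} flash_action_t;

/* words_used and freeable_words correspond to fds_stat_t fields; words_used
 * already includes freeable_words. gc_threshold is a tuning value: run GC
 * early once this many words are freeable. */
static flash_action_t choose_flash_action(uint32_t words_used, uint32_t freeable_words,
                                          uint32_t limit_words, uint32_t gc_threshold)
{
    assert(freeable_words <= words_used);
    if (words_used >= limit_words) {
        /* Over the limit: GC helps only if erasing the freeable words
         * brings words_used back under the limit. */
        uint32_t after_gc = words_used - freeable_words;
        return (after_gc < limit_words) ? FLASH_ACTION_RUN_GC
                                        : FLASH_ACTION_DELETE_RECORDS;
    }
    return (freeable_words >= gc_threshold) ? FLASH_ACTION_RUN_GC
                                            : FLASH_ACTION_NONE;
}
```

After a FLASH_ACTION_DELETE_RECORDS decision, the deleted records only become freeable words; a later GC is still needed to reclaim them, preferably outside a connection.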


    From the 'Absolute maximum ratings', you can see that the flash is rated for 10 000 write/erase cycles

    From this, can I estimate the flash writes/erases as follows?

    I have 13 virtual pages of 2048 words each.
    If I write 17280 words of record A per day,
    and 1152 words of record B per day,
    and, let's say, the Peer Manager writes/updates its records twice a day (2*50 words = 100 words per day),
    and GC is called twice a day to free around 17280 words in total (65% of the memory),

    then we have:
    2046 words * 13 pages = 26598 words
    26598 words * 10000 / (2 * 17280 + 1152 + 100) = 7427 days = ~20 years
    Is this how we could estimate the flash lifetime?
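The estimate above can be reproduced as a sketch so the inputs are easy to vary. It uses the formula exactly as written in this post, with the 2-word page tag subtracted per page; naive_flash_life_days() is a hypothetical name, and the 64-bit intermediate avoids overflow in the numerator.

```c
#include <assert.h>
#include <stdint.h>

#define FDS_PAGE_TAG_WORDS 2u  /* 2-word page tag per virtual page */

/* Formula from the post: (pages * usable words per page) * rated cycles
 * divided by words written/erased per day. Returns whole days. */
static uint32_t naive_flash_life_days(uint32_t pages, uint32_t page_words,
                                      uint32_t rated_cycles, uint32_t words_per_day)
{
    assert(page_words > FDS_PAGE_TAG_WORDS);
    assert(words_per_day > 0);
    uint64_t capacity = (uint64_t)pages * (page_words - FDS_PAGE_TAG_WORDS);
    return (uint32_t)(capacity * rated_cycles / words_per_day);
}
```

naive_flash_life_days(13, 2048, 10000, 2 * 17280 + 1152 + 100) returns 7427, matching the hand calculation.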

  • DimitraN said:
    After more testing it turns out that words_used include freeable_words. 

    Sorry about this. I reviewed the FDS implementation but overlooked this.

    DimitraN said:
    Since this is how it works, I track only words_used to check if they exceed the limit, and when freeable are a lot I call GC. 
    Only when words_used exceed the limit and freeable are so few that GC will not help, I delete some records. 

    Sounds good.

    DimitraN said:
    I have 13 virtual pages of 2048 size 
    If I write 17280 words of A record a day
    and 1152 words of B record a day
    and let's say peer manager writes/updates its files twice a day 2*50 words = 100 words a day 
    and GC is called twice to delete in total around 17280 words (65% of memory) 

    then we have: 
    2046 words * 13 pages = 26598 words 
    26598 words * 10000 / (2 * 17280 + 1152 + 100) = 7427 days = ~20 years 
    is this how we could estimate the flash life?

    Keep in mind that one page is reserved for swap and is not used for data storage. Also, the number of erase cycles per page is what matters most, and this is affected by how often garbage collection is performed. Other than that, the calculation seems to be correct.
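Following the point that erase cycles per page matter most, a rough alternative is to bound the lifetime by the number of GC runs. This sketch assumes, as a simplification, that each GC run erases every flash page at most once (as we understand FDS, the swap role rotates between pages during GC, spreading the erases) and it ignores all other wear; erase_limited_life_days() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case sketch: assume each GC run erases every flash page at most once.
 * The busiest page then reaches the rated erase count after
 * rated_cycles / gc_runs_per_day days. Writes between erases are ignored. */
static uint32_t erase_limited_life_days(uint32_t rated_cycles, uint32_t gc_runs_per_day)
{
    assert(gc_runs_per_day > 0);
    return rated_cycles / gc_runs_per_day;
}
```

With 10 000 rated cycles and two GC runs per day, this gives 5000 days (about 13.7 years) as a worst-case bound; running GC only when it is actually needed raises it.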

  • OK, thank you very much for your help and advice!

    I think everything is much clearer now, and this flash-handling logic should ensure that the Peer Manager no longer deletes the bonds.

    Kind regards,

    Dimitra 

  • Thanks for the update. After reviewing the answer you accepted, I realize I forgot to include the screenshot showing which line to remove in flash_placement.xml so that SES displays only the size of the application, excluding the SoftDevice.
