What can cause bonds to be deleted?

We have a custom board in production based on the nRF52832, using the S132 SoftDevice and the nRF5 SDK.

Below is some of our Peer Manager code. We use a static key to pair and connect to the devices. Usually only one device connects to the board.

We have recently faced connection issues reported by our customers. The Peer Manager raises the security-failed event (PM_EVT_CONN_SEC_FAILED) with error 0x1006 (PM_CONN_SEC_ERROR_PIN_OR_KEY_MISSING), and the only fix is always to unpair and pair again from the mobile phone. We cannot understand why this happens: what can cause the bonds to be deleted? Or is there a way to track the reason from the PM_EVT_PEERS_DELETE_SUCCEEDED event?

The peer manager initialization:

/**@brief Function for the Peer Manager initialization.
 */
static void peer_manager_init(void)
{
    ble_gap_sec_params_t sec_param;
    ret_code_t           err_code;

    err_code = pm_init();
    hardfault = app_error_check_logger(err_code, true, log_str[PEER_MANAGER_INIT], 0, NULL);

    memset(&sec_param, 0, sizeof(ble_gap_sec_params_t));

    // Security parameters to be used for all security procedures. These are common parameters for bonding.
    sec_param.bond           = 1;
    sec_param.mitm           = 0;
    sec_param.lesc           = 0;
    sec_param.keypress       = 0;
    sec_param.io_caps        = BLE_GAP_IO_CAPS_DISPLAY_ONLY;
    sec_param.oob            = 0;
    sec_param.min_key_size   = 7;
    sec_param.max_key_size   = 16;
    sec_param.kdist_own.enc  = 1;
    sec_param.kdist_own.id   = 1;
    sec_param.kdist_peer.enc = 1;
    sec_param.kdist_peer.id  = 1;

    err_code = pm_sec_params_set(&sec_param);   // sets security parameters for pairing and bonding
    hardfault = app_error_check_logger(err_code, true, log_str[PEER_MANAGER_INIT], 1, NULL);

    err_code = pm_register(pm_evt_handler);   // register an event handler for the module
    hardfault = app_error_check_logger(err_code, true, log_str[PEER_MANAGER_INIT], 2, NULL);
}

The peer manager event handler:

static void pm_evt_handler(pm_evt_t const * p_evt)
{
    pm_handler_on_pm_evt(p_evt);    // Logging peer events. Starts encryption if connected to a bonded device.
    pm_handler_disconnect_on_sec_failure(p_evt);    // Disconnects if the connection was not secured.
    pm_handler_flash_clean(p_evt);    // Garbage-collects flash when needed; deletes the lowest ranked peer(s) when GC is insufficient.

    switch (p_evt->evt_id)
    {
        case PM_EVT_CONN_SEC_SUCCEEDED:   //a link has been encrypted, result of a call of pm_conn_secure or of an action by the peer.
            m_peer_id = p_evt->peer_id;
            pm_local_database_has_changed();
            break;

        // All peers were deleted from flash (result of pm_peers_delete()).
        // Deleting a single peer with pm_peer_delete() raises PM_EVT_PEER_DELETE_SUCCEEDED instead.
        case PM_EVT_PEERS_DELETE_SUCCEEDED:
            NRF_LOG_INFO("PM_EVT_PEERS_DELETE_SUCCEEDED");
            hardfault = app_error_check_logger(1, false, "peersdel", 1, NULL);
            advertising_start(false);
            break;

        case PM_EVT_PEER_DATA_UPDATE_SUCCEEDED: // A piece of peer data was stored, updated or cleared in flash storage.
            if (     p_evt->params.peer_data_update_succeeded.flash_changed
                 && (p_evt->params.peer_data_update_succeeded.data_id == PM_PEER_DATA_ID_BONDING))
            {
                NRF_LOG_INFO("New Bond. Peer data update succeeded.");
            }
            break;
        case PM_EVT_CONN_SEC_FAILED:    // A pairing or encryption procedure has failed. In some cases, this means that security is not possible on this link.
        {
            NRF_LOG_INFO("Pairing or Encryption procedure failed. Peer id=%d, Error=%x, Procedure=%d, Source=%d", p_evt->peer_id, p_evt->params.conn_sec_failed.error,
                                                                                        p_evt->params.conn_sec_failed.procedure,p_evt->params.conn_sec_failed.error_src);
            uint8_t proc = p_evt->params.conn_sec_failed.procedure;
            if (p_evt->params.conn_sec_failed.error_src == BLE_GAP_SEC_STATUS_SOURCE_REMOTE)
            {
                proc += 10;    // Failure reported by the peer: offset the procedure code by 10 for the logger.
            }
            hardfault = app_error_check_logger(p_evt->params.conn_sec_failed.error, false, "secfail", proc, NULL);
            m_scoliosense.err_code = p_evt->params.conn_sec_failed.error;
            ble_error_sec_update(m_conn_handle, &m_scoliosense);
        }
        break;

        case PM_EVT_CONN_SEC_CONFIG_REQ:
        {
            // A pairing request arrived from an already bonded peer; always allow re-pairing.
            NRF_LOG_INFO("Repairing Process was initiated.");
            pm_conn_sec_config_t conn_sec_config = {.allow_repairing = true};
            pm_conn_sec_config_reply(p_evt->conn_handle, &conn_sec_config);
        }
        break;

        default:
            break;
    }
}

Any help is much appreciated! 

Best regards,
Dimitra

  • Hello Dimitra,

    The Peer Manager can be configured to delete the oldest bond to free up space in FDS for new bonds. This works fine as long as FDS is only used for storing bonding information. However, if FDS is also used for storing other application data, there is a possibility that the Peer Manager could be forced to delete the last and only bond. So, are you using FDS for storing other data in your application?

    Best regards,

    Vidar

  • Yes, we use FDS to save application data in flash.

    So, is there a way to configure the Peer Manager never to delete the device's only bond? Or should I keep track of the free memory and stop saving data early enough, before the flash is truly full, so that the Peer Manager is never forced to delete the last and only bond? How can I do this, and where can I find how much space the Peer Manager needs so that it never deletes the bonds? Or is there another way to handle this situation in general?

    Best regards,
    Dimitra

  • Hi Dimitra,

    I think the easiest solution may be to periodically monitor the flash usage with the fds_stat() function and perform GC or freeing of records before the FDS becomes full.

    Best regards,

    Vidar

  • OK, so there is no other reason why the bonds might be deleted? The only way is that the Peer Manager deletes them, and it does so when there is not enough space?

  • It depends on your implementation. Apart from calling pm_handler_flash_clean() in your pm event handler, do you have any other mechanisms for deleting bonds in your firmware? In the SDK examples, bonds are typically deleted if board button 1 is pressed at startup.

    /**@brief Auxiliary standard function for maintaining room in flash based on Peer Manager events.
     *
     * This function does the following:
     *  - Ranks peers by when they last connected.
     *  - Garbage collects the flash when needed.
     *  - Deletes the lowest ranked peer(s) when garbage collection is insufficient.
     *
     * @note See also @ref pm_handler_flash_clean_on_return.
     * @note In normal circumstances, this function should be called for every Peer Manager event.
     * @note This function is a supplement to @ref pm_handler_on_pm_evt, not its replacement.
     *
     * @param[in]  p_pm_evt  Peer Manager event to handle.
     */
    void pm_handler_flash_clean(pm_evt_t const * p_pm_evt);

  • Hello,

    We delete bonds when the button on the board is pressed, as a factory reset. But customers claim that bonds are deleted without the button having been pressed. The mechanism is: after reset, we read the NRF_POWER->RESETREAS register, and if it is 0x01 we know that a pin reset occurred, so we delete the bonds.
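A note on the reset-reason check: per the nRF52832 Product Specification, RESETREAS is cumulative until firmware clears it by writing 1 to the flagged bits, so unless it is cleared on every boot, the value read can include flags from earlier resets. Also, the flag only records that the reset pin was asserted, not that the button caused it. The sketch below reads the register once, clears it, and tests only the pin-reset bit. The helper name `reset_was_pin_reset` is hypothetical, and the local mask mirrors `POWER_RESETREAS_RESETPIN_Msk` from the SDK; this is an illustration under those assumptions, not the project's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of RESETREAS is the reset-pin flag. This local constant mirrors
 * POWER_RESETREAS_RESETPIN_Msk from the SDK's nrf52_bitfields.h (assumption
 * made so the sketch compiles without the SDK). */
#define RESETREAS_RESETPIN_MASK (1UL << 0)

/* True if the reset pin was asserted. Only meaningful if the register is
 * cleared on every boot, so that it holds flags from the latest reset only. */
static bool reset_was_pin_reset(uint32_t resetreas)
{
    return (resetreas & RESETREAS_RESETPIN_MASK) != 0u;
}

/* Hypothetical boot-time use on the target (not compiled here):
 *
 *     uint32_t reason = NRF_POWER->RESETREAS;   // read once
 *     NRF_POWER->RESETREAS = reason;            // write 1s to clear the flags just read
 *     if (reset_was_pin_reset(reason))
 *     {
 *         // factory reset path, e.g. pm_peers_delete()
 *     }
 *
 * If the SoftDevice is already enabled, sd_power_reset_reason_get() and
 * sd_power_reset_reason_clr() are the equivalent SoftDevice calls. */
```

Because the register is cleared each boot, testing the single bit with a mask no longer risks acting on a stale flag, and it still catches a pin reset that coincides with another reset reason.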

    About pm_handler_flash_clean: do you think it is possible, or safe, to modify it so that it does not delete the last peer ("Deletes the lowest ranked peer(s) when garbage collection is insufficient.") and instead I delete application records to free up space?

    Where can I find how much space the Peer Manager needs? 
    The page https://docs.nordicsemi.com/bundle/sdk_nrf5_v17.0.2/page/lib_fds_functionality.html says that the Peer Manager uses file IDs and record keys from 0xC000 to 0xFFFF. But how much space does it actually need?

    Finally, in my application the flash size (project section placement) is 0x5a000 (= 368640 bytes).
    When I build it in Segger Embedded Studio, it reports that 233.7 KB are used.
    On the other hand, the nRF52832 has 512 kB of flash; we use the SoftDevice (152 kB) and a bootloader (24 kB), so the space left for the application is 336 kB.


    In sdk_config.h, I have set 13 virtual pages (which the Peer Manager also uses), and the virtual page size is 2048 words. The documentation says one page is used by GC, so do I have 2048 * 4 * 12 = 98304 bytes available?
    I am a bit confused about how much space is really left for my data and the Peer Manager, which I need to know in order to monitor flash usage and free up space when needed. Could you help me with this too?

    Thank you very much for your help

    Dimitra

  • Hello Dimitra,

    DimitraN said:
    About pm_handler_flash_clean, do you think it is possible or safe, to change it and make it not delete the last peer ("Deletes the lowest ranked peer(s) when garbage collection is insufficient.") but rather I delete records to free up space... 

    It is possible, but it may require some rework of the PM implementation. However, if running GC is not sufficient, the bonding will fail. It seems better to avoid this issue altogether by running GC and deleting specific records before reaching this situation.

    DimitraN said:
    Where can I find how much space the Peer Manager needs? 
    Here https://docs.nordicsemi.com/bundle/sdk_nrf5_v17.0.2/page/lib_fds_functionality.html, it says that peer manager uses 0xC000 file IDs and Record keys until 0xFFFF. But how much space does it really need? 

    The size of each bond depends on how many CCCDs need to be stored. You can call fds_stat() before and after pairing to determine the size requirements in your application.

    DimitraN said:
    Finally, in my application Flash size (project section placement)  is 0x5a000 ( = 368640) . 
    When I build it in Segger Embedded Studio, it says that 233,7 KB are used.
    On the other hand, nRF52832 has 512 kB, we use Softdevice (152 kB) and bootloader (24 kB), and then the space left for the application is 336 kB.

    233.7 kB includes the SoftDevice if you are looking at the build output. You can remove the highlighted line in your flash_placement.xml if you want it to show only the size of the application.

    DimitraN said:
    In sdk_config.h, I have set 13 virtual pages (which peer manager also uses), and virtual page size is 2048 words. 1 page it says that is used by GC, so, I have available 2048 * 4 * 12 = 98304 bytes ? 
    I am a bit confused about the real space that is left for my data and the peer manager, in order to monitor the flash usage and free up space when needed.. Could you help me with this too? 

    One page is reserved for GC, which leaves you with 12 pages * 2048 words for storing data records. Each page uses two words for the page tag.  fds_stat_t::words_used includes the page tag.

    Best regards,

    Vidar

  •  fds_stat_t::words_used includes the page tag.

    OK, but does words_used also include dirty words?

    So, if I understand correctly, I have 12 pages * (2048 words - 2 words page tag) = 24552 words available for my data and for the Peer Manager.
    According to words_used from fds_stat, the Peer Manager uses 53 words for one bond (including the peer rank, the service changed pending flag, etc.).

    So let's say that I want to leave plenty of space for the Peer Manager: one whole page.
    The available space then becomes 11 pages * 2046 words = 22506 words.
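The arithmetic above can be written as a small helper. This is a sketch assuming, as described in this thread, a two-word page tag per page and one virtual page reserved as the GC swap page. The function names are hypothetical, `reserved_pages` is an application policy rather than an FDS setting, and the bond count is only a rough upper bound because FDS records cannot span pages. The result excludes page tags; since fds_stat_t::words_used counts them, a threshold compared against words_used differs by 2 words per page, which is small next to a one-page reserve.

```c
#include <stdint.h>

#define PAGE_TAG_WORDS 2u   /* page tag size per virtual page, as stated in this thread */

/* Words available for records on one virtual page. */
static uint32_t usable_words_per_page(uint32_t page_size_words)
{
    return page_size_words - PAGE_TAG_WORDS;
}

/* Word limit for application data: all virtual pages except the GC swap
 * page and the pages the application leaves for the Peer Manager. */
static uint32_t app_word_limit(uint32_t virtual_pages, uint32_t page_size_words,
                               uint32_t reserved_pages)
{
    uint32_t data_pages = virtual_pages - 1u - reserved_pages;   /* 1 = GC swap page */
    return data_pages * usable_words_per_page(page_size_words);
}

/* Rough upper bound on bonds that fit in the reserved pages, given the
 * words per bond measured with fds_stat() before and after pairing. */
static uint32_t bonds_in_reserve(uint32_t reserved_pages, uint32_t page_size_words,
                                 uint32_t words_per_bond)
{
    return (reserved_pages * usable_words_per_page(page_size_words)) / words_per_bond;
}
```

With 13 pages of 2048 words, `app_word_limit(13, 2048, 0)` gives 24552 and `app_word_limit(13, 2048, 1)` gives 22506, matching the figures above; one reserved page holds at most about 38 bonds of 53 words.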

    Should I then delete some records when words_used >= 22506,
    or when words_used + freeable_words >= 22506?

    I call GC regularly, and I can see that freeable_words drops to 0 when GC succeeds, but I am not sure whether freeable words are included in words_used or not. If they are not, then I must take them into account in the calculation.

    For example, suppose the limit for "full" is 100 words, words_used = 100 and freeable_words = 10. If words_used does not include the freeable words, then I have exceeded the limit and will have a problem even if I call GC.
    On the other hand, if words_used = 100 and freeable_words = 10 but the freeable words are included in words_used, then only 90 words are really in use, so I still have time to call GC and then delete extra records when needed.

    I also apologize for the long details, but I have been working on this for a long time and may be confusing myself without reason.


    Best regards, 

    Dimitra 

  • Hi Dimitra,

    It's best to not perform GC too frequently as it leads to increased flash wear. Freeable words are not included in words used. Also, if possible, avoid running GC in connections as it can lead to scheduling conflicts. Running GC on this many pages is relatively time consuming.

    DimitraN said:
    if the limit of "full" space is 100 words. and words used = 100 and freeable = 10, and words used do not include freeable words, then i have exceeded the limit, and I will have a problem even if i call gc. 

    This is correct. So in this case you would have to delete some records to get more "freeable" words.

    Best regards,

    Vidar

  • It's best to not perform GC too frequently as it leads to increased flash wear.

    But what counts as too frequent?

    We have a record that is updated in FDS every 30 seconds, all day. This creates 6 freeable words every 30 seconds, i.e. 12 freeable words per minute, so 12 * 60 * 24 = 17280 freeable words per day.
    If we call GC, for example, twice a day to free up these words, is that too frequent?
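The daily figure, and the minimum number of GC runs needed to keep dirty words within a given headroom, can be computed as below. This is a sketch: the helper names are hypothetical, and it assumes the amount of valid data stays roughly constant, so that `headroom_words` (the gap between the valid data and the chosen limit) is fixed and greater than zero.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

/* Freeable words produced per day by a record rewritten every period_s
 * seconds, where each update leaves words_per_update words dirty. */
static uint32_t freeable_words_per_day(uint32_t words_per_update, uint32_t period_s)
{
    return words_per_update * (SECONDS_PER_DAY / period_s);
}

/* Minimum evenly spaced GC runs per day so that dirty words never exceed
 * headroom_words (must be > 0). Rounds up: a partial interval still needs a GC. */
static uint32_t min_gc_per_day(uint32_t freeable_per_day, uint32_t headroom_words)
{
    return (freeable_per_day + headroom_words - 1u) / headroom_words;
}
```

For 6 words every 30 seconds, `freeable_words_per_day(6, 30)` gives 17280; with a hypothetical headroom of 8640 words, two GC runs a day are the minimum.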

    The limit I will use is 22400 words: at that point I consider the flash full, stop saving new records and start deleting old ones, so that one page is always left for the Peer Manager and it never has to delete the bonds.

    So, is calling GC twice a day too frequent? If so, the only thing I can do is call it ONLY when freeable_words + words_used >= the limit of 22400.

    Another question after testing: I delete records after sending them over BLE to the phone. At that point I expect almost all of words_used to become freeable_words, but sometimes fds_stat returns almost the same value for both variables, as if freeable_words had been updated and words_used had not yet. Is this possible? Have you observed this kind of behaviour?

    Best regards,
    Dimitra

  • Hi Dimitra,

    I'd say too frequent is when you call GC more frequently than necessary. For instance, if you run GC when only 50 % of the memory is used. From the 'Absolute maximum ratings', you can see that the flash is rated for 10 000 write/erase cycles.

    DimitraN said:
    Another question after testing, I delete records when I send them over BLE to mobile.. at this point, I expect that almost all words_used become freeable_words, but sometimes fds_stat returns almost equal amount to both variables, like freeable words are updated and words_used have not been updated yet. is this a case? have you observed this kind of a behaviour?

    Not that I can remember. Which SDK version are you using? 

    Best regards,

    Vidar

  • After more testing, it turns out that words_used includes freeable_words.
    When I delete records, their words become freeable: freeable_words increases while words_used stays the same, since a freeable word is still a used word. words_used only decreases when GC runs and the freeable words are actually erased.

    Since this is how it works, I track only words_used to check whether it exceeds the limit, and I call GC when there are many freeable words.
    Only when words_used exceeds the limit and there are too few freeable words for GC to help do I delete some records.
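The policy described above could look roughly like this. It is a sketch: the function and enum names are hypothetical, the inputs correspond to the `words_used` and `freeable_words` fields of `fds_stat_t`, and it assumes, as observed in this thread, that `words_used` includes `freeable_words`. Following the earlier advice to avoid GC during connections, opportunistic GC below the limit is skipped while `connected` is true; at the limit, GC is requested regardless.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    FLASH_ACTION_NONE,           /* nothing to do now */
    FLASH_ACTION_RUN_GC,         /* call fds_gc() */
    FLASH_ACTION_DELETE_RECORDS, /* delete old application records first */
} flash_action_t;

/* limit_words:  usage treated as "full" (e.g. 22400, leaving room for the Peer Manager)
 * gc_threshold: freeable words worth reclaiming opportunistically
 * connected:    true while a BLE link is up */
static flash_action_t flash_check(uint32_t words_used, uint32_t freeable_words,
                                  uint32_t limit_words, uint32_t gc_threshold,
                                  bool connected)
{
    /* Usage that would remain after a complete garbage collection. */
    uint32_t used_after_gc = (freeable_words <= words_used) ? words_used - freeable_words : 0u;

    if (words_used >= limit_words)
    {
        /* At the limit: GC is enough only if it brings usage back below it.
         * Otherwise delete records; that only marks them dirty, so a later
         * check returns FLASH_ACTION_RUN_GC to reclaim the space. */
        return (used_after_gc < limit_words) ? FLASH_ACTION_RUN_GC
                                             : FLASH_ACTION_DELETE_RECORDS;
    }
    if (freeable_words >= gc_threshold && !connected)
    {
        return FLASH_ACTION_RUN_GC;   /* opportunistic GC, only outside connections */
    }
    return FLASH_ACTION_NONE;
}
```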


    From the 'Absolute maximum ratings', you can see that the flash is rated for 10 000 write/erase cycles

    From this, I tried to calculate the write/erase cycles of the flash:

    I have 13 virtual pages of 2048 words.
    If I write 17280 words of record A per day,
    and 1152 words of record B per day,
    and, let's say, the Peer Manager writes/updates its files twice a day: 2 * 50 words = 100 words per day,
    and GC is called twice a day to delete around 17280 words in total (65% of memory),

    then we have: 
    2046 words * 13 pages = 26598 words 
    26598 words * 10000 / (2 * 17280 + 1152 + 100) = 7427 days = ~20 years 
    Is this how we could estimate the flash lifetime?
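The estimate above follows a first-order model: lifetime in days = capacity * rated cycles / words written per day. It assumes wear is spread evenly over all pages and that each page is erased once for every page's worth of words written. Since FDS garbage collection copies valid records to the swap page, those copies are also writes and belong in the daily total. A sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* First-order flash lifetime in days.
 * capacity_words: words across all virtual pages
 * erase_cycles:   rated write/erase cycles per page (10 000 per the datasheet)
 * words_per_day:  total words written per day, including GC copies */
static uint32_t flash_life_days(uint32_t capacity_words, uint32_t erase_cycles,
                                uint32_t words_per_day)
{
    /* 64-bit intermediate: capacity * cycles can overflow 32 bits. */
    return (uint32_t)(((uint64_t)capacity_words * erase_cycles) / words_per_day);
}
```

With the figures above, `flash_life_days(26598, 10000, 2 * 17280 + 1152 + 100)` gives 7427 days, about 20 years; the result is only as good as the even-wear assumption and the daily write total fed into it.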
