
Peer Manager fails to save bond

I am working on a BLE peripheral (nRF52832 & SDK 15) which uses the Peer Manager to handle encryption and bonding to a previously existing central device.  Sometimes bonding works, but often I am seeing a failure mode where the encryption and bonding process succeeds (the connection completes and data flows from central to peripheral and vice versa) but the Peer Manager doesn't save the bond.  I discover this when I disconnect and reconnect, and I get PM_EVT_CONN_SEC_FAILED with PM_CONN_SEC_ERROR_PIN_OR_KEY_MISSING.  

So far I have attempted to verify that the bond data isn't being stored by looking at what happens after I get PM_EVT_CONN_SEC_SUCCEEDED during bonding.  I went into peer_database.c and added some NRF_LOG()s in write_buf_store().  I see the write happen and no error is reported.  However I never get either PM_EVT_PEER_DATA_UPDATE_SUCCEEDED or PM_EVT_PEER_DATA_UPDATE_FAILED.  I put log msgs into pdb_pds_evt_handler() in peer_database.c, and in my own pm_evt_handler.  It appears that the bond save is failing silently.  This is very frustrating.

Sometimes the bond *is* successfully saved, and I'm able to reconnect to the central indefinitely.  I can't yet figure out why it sometimes succeeds.

I discovered this issue when starting to write code to handle a possible failure which could be caused by the central losing its bond data... the idea being that if the connection fails, both sides will then delete their bonds automatically and can then be manually rebonded by the user.  I test this scenario by manually deleting the bond in the central... on the next connection attempt, reestablishing the link fails and the peripheral deletes its bonds.  This all seems to be working as expected.  But when I manually rebond, I'm stuck... the nRF52 usually isn't saving the new bond.

This current project is to replace a remote control based on another vendor's BLE chip.  Encryption and bonding have worked for years with the central so I have no reason to distrust it.

Any advice on how to debug this further would be much appreciated.

Darren

  • Hi, 

     

    We would need to know what you set in your conn_sec_config when calling pm_conn_sec_config_reply(). 

    If you have a look at one of our examples, ble_app_proximity for instance, you can see that we set allow_repairing = false inside pm_evt_handler(): 

    case PM_EVT_CONN_SEC_CONFIG_REQ:
    {
        // Reject pairing request from an already bonded peer.
        pm_conn_sec_config_t conn_sec_config = {.allow_repairing = false};
        pm_conn_sec_config_reply(p_evt->conn_handle, &conn_sec_config);
    } break;

     

    This means that if the central loses its bond information, the device will reject re-bonding. The reason is to prevent an attacker from forcing a device to delete its original bond information and create a new bond, during which the attacker could sniff the bonding process or act as a man-in-the-middle (MITM). 

    If you set allow_repairing = true, the device is supposed to replace the old bond information with the new one. I would need you to reproduce the issue with our example so that we can test it here. 
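
For comparison with the handler above, here is a minimal sketch of the allow_repairing = true variant. The typedefs and the pm_conn_sec_config_reply() stub are host-side stand-ins (assumptions) so the snippet compiles without the SDK; on target they would come from peer_manager.h. PM_EVT_OTHER, m_last_reply and m_reply_count are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins (assumptions) for the Peer Manager types. */
typedef struct { bool allow_repairing; } pm_conn_sec_config_t;
typedef enum { PM_EVT_CONN_SEC_CONFIG_REQ, PM_EVT_OTHER } pm_evt_id_t;
typedef struct { pm_evt_id_t evt_id; uint16_t conn_handle; } pm_evt_t;

/* Hypothetical bookkeeping so the reply can be inspected on a host. */
static pm_conn_sec_config_t m_last_reply;
static int                  m_reply_count;

/* Stub (assumption): records the configuration instead of calling the SDK. */
void pm_conn_sec_config_reply(uint16_t conn_handle, pm_conn_sec_config_t *p_cfg)
{
    (void)conn_handle;
    m_last_reply = *p_cfg;
    m_reply_count++;
}

void pm_evt_handler(pm_evt_t const *p_evt)
{
    switch (p_evt->evt_id)
    {
        case PM_EVT_CONN_SEC_CONFIG_REQ:
        {
            /* Accept a new pairing request from an already bonded peer. */
            pm_conn_sec_config_t conn_sec_config = {.allow_repairing = true};
            pm_conn_sec_config_reply(p_evt->conn_handle, &conn_sec_config);
        } break;

        default:
            /* All other events are ignored in this sketch. */
            break;
    }
}
```

Whether accepting re-pairing is appropriate depends on the product's threat model, given the attack described above.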

     

  • I have spent many hours tracing the execution of fds and can now see what is failing, but not why.  When I write my config data, the write request is passed down into nrf_fstorage_sd.c.  The queue_process() function tries to write the header (FDS_OP_WRITE_HEADER_BEGIN) and calls write_execute().  The write_execute() function calls sd_flash_write() which returns NRF_SUCCESS, but the soft device doesn't generate an event, so the fstorage state machine cannot continue to process the write. There is a comment in nrf_fstorage_sd.c in the switch() case that handles the return value from write_execute():

                /* The operation was accepted by the SoftDevice.
                 * If the SoftDevice is enabled, wait for a system event. Otherwise,
                 * the SoftDevice call is synchronous and will not send an event so we simulate it. */
    

    Because the soft device is enabled, the code is waiting for a system event, and never gets one.

    I did the same trace exercise when pairing, and the store of the bond fails for the same reason, no event is generated.

    Do you have any advice about how to figure out why the soft device is not posting file system events but seems to otherwise work?

  • Hi Darren, 

    That is a very strange issue. Could you test the same central against the unmodified ble_app_proximity example? I would suggest using a fresh copy of the SDK. You may need to modify your central slightly to work with ble_app_proximity. 

    Another thing you can try is to connect and bond to your device from a phone (using the nRF Connect app) to see whether the bond information is stored. 

    Which connection parameters does your central use? We have seen a case where the flash write API failed when the connection timeout was too short; maybe it's related? 

    However, in your case the API failed silently without any timeout, which is very strange. If you can provide simple central and peripheral code that reproduces the issue, it would be a great help. 

    Regarding the qwr module: it's the Queued Write module that we add to all the examples. Not every application needs it; it is only required if queued writes are used. You can remove it. 

     

  • Hello,

    Here are the settings I'm using now

    #define MIN_CONN_INTERVAL               MSEC_TO_UNITS(7.5, UNIT_1_25_MS)          /**< Minimum acceptable connection interval. */
    #define MAX_CONN_INTERVAL               MSEC_TO_UNITS(30, UNIT_1_25_MS)         /**< Maximum acceptable connection interval. */
    #define SLAVE_LATENCY                   0                                       /**< Slave latency. */
    #define CONN_SUP_TIMEOUT                MSEC_TO_UNITS(4000, UNIT_10_MS)         /**< Connection supervisory time-out (4 seconds). */
    

    I don't think this is the root of the problem though.  What I discovered yesterday is that my app can use fds to store data, if I *don't* enable the soft device first... the filesystem events flow as expected and the write succeeds.  However, when the soft device is enabled, my app cannot write data with fds, even if there is no active connection to a central.  This is very weird indeed.

    It would be quite a lot of work to modify our central, so I'm hoping I can find the answer another way.

    Searching online, I've seen many references to this problem from an earlier version of the SDK.  For example: this and this and this

    But the current SDK doesn't appear to work this way, since there is no softdevice_sys_evt_handler_set().

  • Hi Darren,

    The connection parameters look fine. 

    Could you send a simplified version of your peripheral that reproduces the issue? Alternatively, you could modify the proximity application to add some fds commands that store some data, similar to what we have here

     

  • OK I finally figured out the problem, and realize now that I should have started with this approach.  I went back in the git history to the very first version of my app that claimed to be able to bond with the central and found that it still worked.  So I tried different commits binary search style until I found the one that broke fds.  It turns out to be makefile related.  About a month ago, to try to reduce the makefile size and complexity (and the code size), I went through and removed everything that didn't break the build.  It turns out that the application needs

      $(SDK_ROOT)/components/softdevice/common/nrf_sdh_soc.c \

    even though the app will compile and run without it and generates no errors or warnings.

    When I put that file back in the build, all of my fds problems went away.

    If this code is required for proper functioning, it seems like there should be some warning generated if you remove it.
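
A possible explanation for the missing warning, offered as an assumption rather than a confirmed diagnosis: in SDK 15 the SoftDevice handler modules appear to find each other through observers placed in linker sections (for example via NRF_SDH_SOC_OBSERVER), not through direct function calls. Leaving out the file that polls SoC events would then leave no unresolved symbol for the linker to report. The host-side model below imitates that situation with plain variables; every name in it (flash_write, soc_poll, stack_interrupt, and so on) is hypothetical and none of it is SDK code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one "flash operation done" event waiting inside the
 * SoftDevice, and a storage module waiting to be told about it. */
static bool m_evt_pending;   /* event queued inside the SoftDevice        */
static bool m_write_busy;    /* storage state machine waiting for the event */

/* Storage side: observer that completes the write when the event arrives. */
void storage_on_soc_evt(void)
{
    m_write_busy = false;
}

/* Request a write: the "SoftDevice" accepts it and queues a completion event. */
bool flash_write(void)
{
    m_write_busy  = true;
    m_evt_pending = true;
    return true;   /* accepted, comparable to NRF_SUCCESS from sd_flash_write() */
}

/* Poller: the only code that pulls events and hands them to observers. */
typedef void (*soc_poller_t)(void);

void soc_poll(void)
{
    if (m_evt_pending)
    {
        m_evt_pending = false;
        storage_on_soc_evt();
    }
}

/* Stack interrupt: runs the poller only if one was linked into the build.
 * Passing NULL models a build where the poller module was left out. */
void stack_interrupt(soc_poller_t poller)
{
    if (poller != NULL)
    {
        poller();
    }
}
```

In this model a write that was accepted stays busy forever when stack_interrupt(NULL) is all that ever runs, and completes as soon as stack_interrupt(soc_poll) is called, which matches the symptom of sd_flash_write() returning NRF_SUCCESS with no event following.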

  • Apparently there is more to this, because when I init fds in my own code, it prevents fds from working.  If I don't init, it works fine.  

    /**@brief   Wait for fds to initialize. */
    static void wait_for_fds_ready(void)
    {
        while (!m_fds_initialized)
        {
            sd_app_evt_wait();
        }
    }
    
        err_code = fds_init();
        APP_ERROR_CHECK(err_code);
    
        /* Wait for fds to initialize. */
        wait_for_fds_ready();
    

    I took this directly from fds SDK example.

    Seems like still there is something wrong if fds can't properly handle double-init and fails silently.
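
One way to turn a silent stall like this into something visible is to bound the wait and report a timeout. The following is a minimal sketch under stated assumptions, not SDK code: wait_for_flag(), idle_fn_t and fake_idle() are hypothetical names, and fake_idle() is a host-side stand-in for sd_app_evt_wait() that raises the flag on its third call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical idle hook; on target this would wrap sd_app_evt_wait(). */
typedef void (*idle_fn_t)(void);

/* Wait until *p_flag becomes true or max_wakeups idle calls have been made.
 * Returns the final flag value so the caller can log a timeout. */
bool wait_for_flag(volatile bool const *p_flag, uint32_t max_wakeups, idle_fn_t idle)
{
    while (!*p_flag && max_wakeups > 0)
    {
        idle();
        max_wakeups--;
    }
    return *p_flag;
}

/* Host-side stand-ins (assumptions) used only to exercise the helper. */
static volatile bool m_fds_initialized;
static uint32_t      m_idle_calls;

void fake_idle(void)
{
    /* Pretend the FDS init event arrives on the third wake-up. */
    if (++m_idle_calls == 3)
    {
        m_fds_initialized = true;
    }
}
```

On target, a call such as wait_for_flag(&m_fds_initialized, 100, idle) followed by a log message on a false return would show that the init event never arrived. Note that sd_app_evt_wait() only returns when something wakes the CPU, so the bound is only effective if a periodic source such as a timer is running.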

  • One can use the same construct to wait for the peer manager to init fds, and then init locally.  This seems... awkward.  Is the best practice to assume fds has been initialized by the peer manager and just not do it if the app is going to make fds calls itself? 

        peer_manager_init();
    
        wait_for_fds_ready();
    
        NRF_LOG_INFO("App Initializing fds...");
        m_fds_initialized = false;
    
        err_code = fds_init();
        APP_ERROR_CHECK(err_code);
    
        wait_for_fds_ready();
    
