This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

GATT Local database (or pairing) corrupted in iOS

Hi all,

I'll explain the problem. Our device (that uses a nRF52832) works with a custom app developed for iOS (v 13.4.1) and we've a custom set of services/characteristic besides other SIG's standard services, like the Battery Service or the TX Power Service. Obviously, since we're working with iOS, we use full ANCS implementation.

Also, we use the CTS Client implementation, modified to support also the Local Time characteristic (that is optional in the CTS, but implemented by iOS) and the DFU Buttonless Service.

Moreover, we use the fds module to store some data (like the setup of the i2c sensors, a lightly log system and so on...); obviously we respect the fds indication about the File_ID/Record_Key, and we do not use the forbidden set reserved to the Peer Manager.

All characteristics, except one, works in JUST_WORK, so, the device requires a bonding with the Apple device. Moreover, we use a customized bootloader (but the core, not modified, is the Nordic's Buttonless bootloader) to perform the DFU OTA.

All works very well without any problem, and, from the DFU OTA to the data exchange between the device and the Apple phones, the system is very stable and reliable.

But, sometime, we've experienced very boring problems. In few words, from one connection to another, the phone reported that the bonding data are lost (or corrupted), so, we can only remove the bonding data and perform a new bonding procedure.

This is frustrating, and we are not able to understand why/when this happens.

We've check the files saved by the Peer Manager (3 files: Bonding Data, Rank Data, GATT Local Database Data) and all seem always correct (just to be clear, for the first 2 files we've also know the format, but for the last one, Nordic's library does not provide any information. We only suppose that an entry like (0x) 03-00-00-00-56-00-0D-00-02-00-02-00-11-00-02-00-02-00-15-00-02-00-00-00 ... -64-1E is composed by a uint32_t field (flags = 03-00-00-00), an uint16_t field (# of 4-B words, 56-00) and then the list of the CCCD's of the characteristics (f.e 0D-00-02-00-02-00, where 0D-00 is the handle, the first 02-00 is unknown to we and the last 02-00 is the flag that indicates if the notifications are active or not). At the end (64-1E), we suppose there are something like a CRC (?))

Unfortunately we're not able to catch when the data are corrupted, so, I would ask if some one has experienced a similar problem...

Moreover, to lost the bonding data is sufficient that only one of the three files will be corrupted? Or, as in our opinion, only the first file (Bonding Data) must be corrupted?

Our setup is: nRF52832, SDK 16.0, IAR 7.50. Apple devices with iOS 13.4.1

Parents
  • Hi,

    I don't believe I have seen reports of issues that are similar to what you're describing here (ie data corruption with fds + lost bonding). Do you still have a device is in this  "bad state"? It might be interesting to check if the errors reported by the phone matches the error reported through the Peer manager PM_EVT_CONN_SEC_FAILED event (assuming this events get triggered?). Another thing that may be worth trying is to enable the FDS_CRC_CHECK_ON_READ and  FDS_CRC_CHECK_ON_WRITE option. Maybe it can help rule out data corruption as the problem.

    Moreover, to lost the bonding data is sufficient that only one of the three files will be corrupted? Or, as in our opinion, only the first file (Bonding Data) must be corrupted?

     Only lost/corrupted bond info should prevent the link from getting secured.

  • Hi Vidar,

    in advance thanks for your reply. This morning I tried to "trigger" and replicate the problem. In was not easy, but finally, I got something...

    This is the scenario: 2 iPhone (iOS versions different) both paired with my device. I continuously connected from one to the other, disabling (on the phone side) the Bluetooth. At the same time, I logged through the UART, the activities on the device. This "switching task" has required a lot of time, but, as told, finally I got an error. This is the log:

    [Mon May 25 14:01:20.640 2020] <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    [Mon May 25 14:01:20.640 2020] <info> app: FDS UPDATE
    [Mon May 25 14:01:20.640 2020] <info> app:  F6 0B 00 00            |....    
    [Mon May 25 14:01:20.640 2020] <info> app:  00 C0                  |..      
    [Mon May 25 14:01:20.640 2020] <info> app:  08 C0                  |..      
    [Mon May 25 14:01:20.640 2020] <info> app: SLB_LOG_INFO PM: F. ID 0x2000, R. Key 0x2008
    [Mon May 25 14:01:20.677 2020] <info> app: Write on char with handle = 45; Len = 4
    [Mon May 25 14:01:20.677 2020] <info> app:  43 46 42 32            |CFB2    
    [Mon May 25 14:01:20.677 2020] <info> app: Notification on Command char. (res: 0)
    [Mon May 25 14:01:20.736 2020] <info> app: Write on char with handle = 45; Len = 4
    [Mon May 25 14:01:20.736 2020] <info> app:  04 04 04 04            |....    
    [Mon May 25 14:01:20.736 2020] <info> app: Notification on Command char. (res: 0)
    [Mon May 25 14:01:21.311 2020] <info> app: Write on char with handle = 45; Len = 4
    [Mon May 25 14:01:21.334 2020] <info> app:  53 50 44 4C            |SPDL    
    [Mon May 25 14:01:21.334 2020] <info> app: Notification on Command char. (res: 0)
    [Mon May 25 14:01:22.591 2020] <info> app: Disconnect Complete.
    [Mon May 25 14:01:22.591 2020] <info> app: Disconnected. Reason: 13
    [Mon May 25 14:01:23.243 2020] <info> peer_manager_handler: Attempting to clean flash.
    [Mon May 25 14:01:23.243 2020] <info> app: Remote Database loaded from flash.
    [Mon May 25 14:01:23.243 2020] <info> app: Connected.
    [Mon May 25 14:01:23.694 2020] <info> peer_manager_handler: Connection secured: role: Peripheral, conn_handle: 0, procedure: Encryption
    [Mon May 25 14:01:23.694 2020] <info> app: BLE_GAP_EVT_CONN_SEC_UPDATE
    [Mon May 25 14:01:23.694 2020] <info> peer_manager_handler: Attempting to clean flash.
    [Mon May 25 14:01:23.694 2020] <info> peer_manager_handler: Deleting lowest ranked peer (peer_id: 1)
    [Mon May 25 14:01:23.694 2020] <info> app: FDS GARBAGE COLLECTOR
    [Mon May 25 14:01:23.949 2020] <info> peer_manager_handler: Attempting to clean flash.
    [Mon May 25 14:01:24.102 2020] <info> peer_manager_handler: Deleting lowest ranked peer (peer_id: 0)
    [Mon May 25 14:01:24.102 2020] <info> app: FDS GARBAGE COLLECTOR
    [Mon May 25 14:01:24.102 2020] <info> peer_manager_handler: Peer data updated in flash: peer_id: 1, data_id: Peer rank, action: Update
    [Mon May 25 14:01:24.102 2020] <info> app: FDS UPDATE
    [Mon May 25 14:01:24.102 2020] <info> app:  F7 0B 00 00            |....    
    [Mon May 25 14:01:24.102 2020] <info> app:  01 C0                  |..      
    [Mon May 25 14:01:24.102 2020] <info> app:  06 C0                  |..      
    [Mon May 25 14:01:24.102 2020] <info> app: SLB_LOG_INFO PM: F. ID 0xC001, R. Key 0xC006
    [Mon May 25 14:01:24.102 2020] <error> peer_manager_handler: Peer deleted successfully: peer_id: 1
    [Mon May 25 14:01:24.102 2020] <info> app: FDS DELETE FILE
    [Mon May 25 14:01:24.102 2020] <info> app:  00 00 00 00            |....    
    [Mon May 25 14:01:24.102 2020] <info> app:  01 C0                  |..      
    [Mon May 25 14:01:24.102 2020] <info> app:  00 00                  |..           
    [Mon May 25 14:01:24.105 2020] <error> peer_manager_handler: Peer deleted successfully: peer_id: 0
    [Mon May 25 14:01:24.105 2020] <info> app: FDS DELETE FILE
    [Mon May 25 14:01:24.105 2020] <info> app:  00 00 00 00            |....    
    [Mon May 25 14:01:24.105 2020] <info> app:  00 C0                  |..      
    [Mon May 25 14:01:24.105 2020] <info> app:  00 00                  |..      
    [Mon May 25 14:01:24.105 2020] <info> app: GATT ATT MTU on connection 0x0 changed to 185.
    [Mon May 25 14:01:24.343 2020] <info> app: Write on char with handle = 13; Len = 2
    [Mon May 25 14:01:24.343 2020] <info> app:  02 00                  |..      
    [Mon May 25 14:01:24.756 2020] <info> ble_ancs_c: Control Point characteristic found.
    [Mon May 25 14:01:24.756 2020] <info> ble_ancs_c: Notification Point characteristic found.
    [Mon May 25 14:01:24.756 2020] <info> ble_ancs_c: Data Source characteristic found.
    [Mon May 25 14:01:24.756 2020] <error> app: ERROR 7 [NRF_ERROR_INVALID_PARAM] at D:\Apps\GIT_Work\XXX\source\YYY_ble.c:1224
    [Mon May 25 14:01:24.756 2020] PC at: 0x000270A3
    [Mon May 25 14:01:24.756 2020] <error> app: End of error report

    As you can see, after the disconnection @ 14:01:22.591, the consecutive connection starts successfully (there are also the confirm, @ 14:01:23.694, and the BLE_GAP_EVT_CONN_SEC_UPDATE).

    But, since seem that a pending cleaning operation was - meanwhile - started (@14:01:23.243), this operation continue his execution and then a "Deleting lowest ranked peer" (--> PM_EVT_STORAGE_FULL @14:01:23.694) is present.

    After this, the PM continue deleting the bonding data of all saved peers (@14:01:24.102 and @14:01:24.105), and I don't know why, since I expect the PM build something like a copy or a "backup"...

    After, when the app starts the discovery, I got the error. Note that at row 1224, there are following code, that is part of Nordic's ANCS event handler:

    ...
    
    if (m_gatts_discovered && m_ancs_discovered)
    {
        ret = pm_peer_data_remote_db_store(peer_id, (ble_gatt_db_srv_t *)m_peer_srv_buf, sizeof(m_peer_srv_buf), NULL);
    
        if (ret == NRF_ERROR_STORAGE_FULL)
        {
            ret = fds_gc();
        }
        APP_ERROR_CHECK(ret);   <--- row 1224
    }
    
    
    ...

    Apparently the iPhone(s) (or better, iOS) have not "blame" and, unfortunately, I cannot get his logs (the app is developed out of our office..)

    My doubt is...who can trigger the PM_EVT_STORAGE_FULL event? Is an automatism of the PM? 

    Can this log help you to...help me? I hope!

    Regards,

    Francesco

Reply
  • Hi Vidar,

    in advance thanks for your reply. This morning I tried to "trigger" and replicate the problem. In was not easy, but finally, I got something...

    This is the scenario: 2 iPhone (iOS versions different) both paired with my device. I continuously connected from one to the other, disabling (on the phone side) the Bluetooth. At the same time, I logged through the UART, the activities on the device. This "switching task" has required a lot of time, but, as told, finally I got an error. This is the log:

    [Mon May 25 14:01:20.640 2020] <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    [Mon May 25 14:01:20.640 2020] <info> app: FDS UPDATE
    [Mon May 25 14:01:20.640 2020] <info> app:  F6 0B 00 00            |....    
    [Mon May 25 14:01:20.640 2020] <info> app:  00 C0                  |..      
    [Mon May 25 14:01:20.640 2020] <info> app:  08 C0                  |..      
    [Mon May 25 14:01:20.640 2020] <info> app: SLB_LOG_INFO PM: F. ID 0x2000, R. Key 0x2008
    [Mon May 25 14:01:20.677 2020] <info> app: Write on char with handle = 45; Len = 4
    [Mon May 25 14:01:20.677 2020] <info> app:  43 46 42 32            |CFB2    
    [Mon May 25 14:01:20.677 2020] <info> app: Notification on Command char. (res: 0)
    [Mon May 25 14:01:20.736 2020] <info> app: Write on char with handle = 45; Len = 4
    [Mon May 25 14:01:20.736 2020] <info> app:  04 04 04 04            |....    
    [Mon May 25 14:01:20.736 2020] <info> app: Notification on Command char. (res: 0)
    [Mon May 25 14:01:21.311 2020] <info> app: Write on char with handle = 45; Len = 4
    [Mon May 25 14:01:21.334 2020] <info> app:  53 50 44 4C            |SPDL    
    [Mon May 25 14:01:21.334 2020] <info> app: Notification on Command char. (res: 0)
    [Mon May 25 14:01:22.591 2020] <info> app: Disconnect Complete.
    [Mon May 25 14:01:22.591 2020] <info> app: Disconnected. Reason: 13
    [Mon May 25 14:01:23.243 2020] <info> peer_manager_handler: Attempting to clean flash.
    [Mon May 25 14:01:23.243 2020] <info> app: Remote Database loaded from flash.
    [Mon May 25 14:01:23.243 2020] <info> app: Connected.
    [Mon May 25 14:01:23.694 2020] <info> peer_manager_handler: Connection secured: role: Peripheral, conn_handle: 0, procedure: Encryption
    [Mon May 25 14:01:23.694 2020] <info> app: BLE_GAP_EVT_CONN_SEC_UPDATE
    [Mon May 25 14:01:23.694 2020] <info> peer_manager_handler: Attempting to clean flash.
    [Mon May 25 14:01:23.694 2020] <info> peer_manager_handler: Deleting lowest ranked peer (peer_id: 1)
    [Mon May 25 14:01:23.694 2020] <info> app: FDS GARBAGE COLLECTOR
    [Mon May 25 14:01:23.949 2020] <info> peer_manager_handler: Attempting to clean flash.
    [Mon May 25 14:01:24.102 2020] <info> peer_manager_handler: Deleting lowest ranked peer (peer_id: 0)
    [Mon May 25 14:01:24.102 2020] <info> app: FDS GARBAGE COLLECTOR
    [Mon May 25 14:01:24.102 2020] <info> peer_manager_handler: Peer data updated in flash: peer_id: 1, data_id: Peer rank, action: Update
    [Mon May 25 14:01:24.102 2020] <info> app: FDS UPDATE
    [Mon May 25 14:01:24.102 2020] <info> app:  F7 0B 00 00            |....    
    [Mon May 25 14:01:24.102 2020] <info> app:  01 C0                  |..      
    [Mon May 25 14:01:24.102 2020] <info> app:  06 C0                  |..      
    [Mon May 25 14:01:24.102 2020] <info> app: SLB_LOG_INFO PM: F. ID 0xC001, R. Key 0xC006
    [Mon May 25 14:01:24.102 2020] <error> peer_manager_handler: Peer deleted successfully: peer_id: 1
    [Mon May 25 14:01:24.102 2020] <info> app: FDS DELETE FILE
    [Mon May 25 14:01:24.102 2020] <info> app:  00 00 00 00            |....    
    [Mon May 25 14:01:24.102 2020] <info> app:  01 C0                  |..      
    [Mon May 25 14:01:24.102 2020] <info> app:  00 00                  |..           
    [Mon May 25 14:01:24.105 2020] <error> peer_manager_handler: Peer deleted successfully: peer_id: 0
    [Mon May 25 14:01:24.105 2020] <info> app: FDS DELETE FILE
    [Mon May 25 14:01:24.105 2020] <info> app:  00 00 00 00            |....    
    [Mon May 25 14:01:24.105 2020] <info> app:  00 C0                  |..      
    [Mon May 25 14:01:24.105 2020] <info> app:  00 00                  |..      
    [Mon May 25 14:01:24.105 2020] <info> app: GATT ATT MTU on connection 0x0 changed to 185.
    [Mon May 25 14:01:24.343 2020] <info> app: Write on char with handle = 13; Len = 2
    [Mon May 25 14:01:24.343 2020] <info> app:  02 00                  |..      
    [Mon May 25 14:01:24.756 2020] <info> ble_ancs_c: Control Point characteristic found.
    [Mon May 25 14:01:24.756 2020] <info> ble_ancs_c: Notification Point characteristic found.
    [Mon May 25 14:01:24.756 2020] <info> ble_ancs_c: Data Source characteristic found.
    [Mon May 25 14:01:24.756 2020] <error> app: ERROR 7 [NRF_ERROR_INVALID_PARAM] at D:\Apps\GIT_Work\XXX\source\YYY_ble.c:1224
    [Mon May 25 14:01:24.756 2020] PC at: 0x000270A3
    [Mon May 25 14:01:24.756 2020] <error> app: End of error report

    As you can see, after the disconnection @ 14:01:22.591, the consecutive connection starts successfully (there are also the confirm, @ 14:01:23.694, and the BLE_GAP_EVT_CONN_SEC_UPDATE).

    But, since seem that a pending cleaning operation was - meanwhile - started (@14:01:23.243), this operation continue his execution and then a "Deleting lowest ranked peer" (--> PM_EVT_STORAGE_FULL @14:01:23.694) is present.

    After this, the PM continue deleting the bonding data of all saved peers (@14:01:24.102 and @14:01:24.105), and I don't know why, since I expect the PM build something like a copy or a "backup"...

    After, when the app starts the discovery, I got the error. Note that at row 1224, there are following code, that is part of Nordic's ANCS event handler:

    ...
    
    if (m_gatts_discovered && m_ancs_discovered)
    {
        ret = pm_peer_data_remote_db_store(peer_id, (ble_gatt_db_srv_t *)m_peer_srv_buf, sizeof(m_peer_srv_buf), NULL);
    
        if (ret == NRF_ERROR_STORAGE_FULL)
        {
            ret = fds_gc();
        }
        APP_ERROR_CHECK(ret);   <--- row 1224
    }
    
    
    ...

    Apparently the iPhone(s) (or better, iOS) have not "blame" and, unfortunately, I cannot get his logs (the app is developed out of our office..)

    My doubt is...who can trigger the PM_EVT_STORAGE_FULL event? Is an automatism of the PM? 

    Can this log help you to...help me? I hope!

    Regards,

    Francesco

Children
  • Hello Francesco,

    This is very helpful - the log shows the PM_EVT_STORAGE_FULL (ie shared FDS storage is full) event being raised and that both ios devices are actually deleted on purpose in an attempt to free space in FDS. But the peer manager should only delete bonds if FDS really is full (no dirty records that can be freed through GC). Deleting existing bonds is kind of the last resort.

    Do your other FDS users store data periodically (eg maybe you are logging i2c sensor data flash?) that could help explain why the FDS gets full with no dirty data records that can be freed?

    Regards,

    Vidar

    edit:

    mectronicsinc said:
    After, when the app starts the discovery, I got the error. Note that at row 1224, there are following code, that is part of Nordic's ANCS event handler:

     The invalid param error is probably caused by the fact that the peer device is deleted at this point so the peer id is no longer valid.

  • Hi Vidar

    thanks for the reply. Yes, we use FDS to save the application setup, and for other information (and, as told, a minimal log storage). 

    Probably the real cause of FDS "continuous" work is due to the way which iOS interact with the device, since EACH time a connection/disconnection is established, it require an update of whole GATT local database.

    I found another ticket (https://devzone.nordicsemi.com/f/nordic-q-a/49589/loss-of-bonding-key) that is very similar to our issue, and we trying to implements the suggested solutions:

    • set PM_PEER_RANKS_ENABLED to 0
    • runs the GC (if need) after the disconnection...

    We're crossing the fingers!

    I let you know!

    Thanks again!

  • But the peer manager should only delete bonds if FDS really is full (no dirty records that can be freed through GC). Deleting existing bonds is kind of the last resort.

    However I really do not understand why you delete the bonds (since are really important data!) ...it was not better make, before the deletion, a temporary copy in ram of the bonds and restore it after the (successful?) run of the gc?

  • Hi,

    The peer manager should not delete the bonding information unless it's forced because there are no freeable data records left to delete. This is something that should only happen if you keep adding new data to flash without deleting/updating existing data.

    I did not manage to replicate this with the original ANCS example in SDK 16.0.0. I set the number of FDS_VIRTUAL_PAGES to 2 to reduce available space then made repeated re-connects to fill FDS data page. But as expected, only the dirty records were deleted. The bonding information was kept. 

    I think it may be easier to find out what's happening in your case if you try to monitor the flash usage with fds_stat E.g. confirm if there are indeed no dirty records when you receive the PM_EVT_STORAGE_FULL event.

Related