This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Bug in peer manager in SDK 14.2

I have recently ported my nRF52832 application from SDK 12.3 to SDK 14.2 and have been disappointed to find that the preripheral is locking up after a period in operation.

In order to track down the issue I left the debugger running until the application froze (a few hours) and found that peer manager seems to be caught in a infinite loop trying to clear some flash. Exactly what it is trying to store is unclear - I only have one central bonded.

I have since built a short central application that repeatedly connects and disconnects from the peripheral at 2 second intervals which has made this bug repeatable - here is an example stack trace when I hit pause in the debugger after the application has frozen:

image description

I have tried to work through the peer manager code and it appears that pds_peer_data_store attempts to use FDS to write to flash but gets FDS_ERR_NO_SPACE_IN_FLASH. FDS then attempts to do come garbage collection after which the peer manager attempts again to write to flash but again gets the no space error. This continues indefinitely.

Is this a known issue and if so is there any workaround recommended?

Update: I have tracked the memory leak to the function gscm_local_db_cache_update:

image description

Prior to this function being executed I get the following debug output to the console:

app: Connected to a previously bonded device.
app: Connected
app: Connection secured. Role: 1. conn_handle: 0x0, Procedure: 0
ble_gatt: Peer on connection 0x0 requested an ATT MTU of 185 bytes.
ble_gatt: Updating ATT MTU to 23 bytes (desired: 23) on connection 0x0.

I'm not sure what is causing this MTU negotiation but it appears that the SDK is trying to cache the connection parameter. Rather than updating the existing record for the peer it is creating a new one, so after a few connect/disconnect cycles the flash fills up and it gets stuck in the garbage collecting loop.

So I guess my question is:

  • What might be causing this MTU negotiation from the peer in the first place?

  • What needs to be done to fix the memory leak?

FURTHER UPDATE: I have added some additional logging to give visibility of the event flow:

nrf_sdh_ble: BLE event: 0x10.  // BLE_GAP_EVT_CONNECTED
app: Connected to a previously bonded device.
app: Connected
nrf_sdh_ble: BLE event: 0x14.  // BLE_GAP_EVT_SEC_INFO_REQUEST
nrf_sdh_ble: BLE event: 0x1A.  // BLE_GAP_EVT_CONN_SEC_UPDATE
app: Connection secured. Role: 1. conn_handle: 0x0, Procedure: 0
nrf_sdh_ble: BLE event: 0x21.  // BLE_GAP_EVT_SCAN_REQ_REPORT
app: PHY update request.
nrf_sdh_ble: BLE event: 0x22.  // BLE_GAP_EVT_PHY_UPDATE_REQUEST
nrf_sdh_ble: BLE event: 0x23.  // BLE_GAP_EVT_PHY_UPDATE
ble_gatt: Peer on connection 0x0 requested a data length of 251 bytes.
ble_gatt: Updating data length to 27 bytes on connection 0x0.
nrf_sdh_ble: BLE event: 0x24.  // BLE_GAP_EVT_DATA_LENGTH_UPDATE_REQUEST
ble_gatt: Data length updated to 27 on connection 0x0.
ble_gatt: max_rx_octets: 27
ble_gatt: max_tx_octets: 27
ble_gatt: max_rx_time: 328
ble_gatt: max_tx_time: 328
nrf_sdh_ble: BLE event: 0x55. // BLE_GATTS_EVT_EXCHANGE_MTU_REQUEST
ble_gatt: Peer on connection 0x0 requested an ATT MTU of 293 bytes.
ble_gatt: Updating ATT MTU to 23 bytes (desired: 23) on connection 0x0.
nrf_sdh_ble: BLE event: 0x50. // BLE_GATTS_EVT_WRITE
***** DATA GETS WRITTEN TO FLASH HERE via pds_peer_data_store -> fds_record_update *****
nrf_sdh_ble: BLE event: 0x12. // BLE_GAP_EVT_CONN_PARAM_UPDATE

I'm not sure if this event flow is usual for connection to a bonded device? It seems to be a lot of data to be exchanged each time a connection is established!

What I have managed to confirm is that the updated peer data is written to the same FDS file_id and key each time, so at least from that angle the peer data is being stored correctly. So if the above event flow is expected then I guess the issue is not with peer manager but with the garbage collection in FDS?

It looks like others may also have experienced this problem:

devzone.nordicsemi.com/.../

  • Just trying to debug a little further (which is quite hard as adding logging seems to cause thread priority issues!) it seems that the peer database updates that are initiated after garbage collection has started but before it has completed get stuck in a loop calling reattempt_previous_operations() where the call to write_space_reserve() is returning FDS_ERR_NO_SPACE_IN_FLASH which then generates PM_EVT_STORAGE_FULL and it goes around in a cycle. This is despite the fact that all indications are that garbage collection has finished by this point so there should be storage available - as evidenced by the fact that the FDS_EVT_GC has been received and the output of fds_stat() which shows the dirty pages are gone. Perhaps peer manager is getting out of sync with FDS somehow because of the GC operation moving where the record for the peer data is located in flash after the write operation has started, but the code around that is mighty complicated and I am struggling to follow the logic. At this point I think the workaround for me is simply to put a delay in between writing to each CCCD on the iOS side - that is a fudge I'd rather avoid if I can but I need to move past this issue. What I can say is this issue was not present in SDK 12.3.

  • I see.

    The CCCD shall be persistent across connections for bonded devices. See Section 3.3.3.3, Part G, Vol. 3 in the Bluetooth core v5 spec.

    Thank you for the information, I will have to look into this further. Will update here when I have any news.

    Edit 02.02.2018

    Could you try fix this and see if it resolves the issue?fix.rar

  • hello Peter, jumpingfool,

    I have a very similar problem since my migration to SDK14, but without the flash problems.

    <debug> my_app: Advertising....
    <debug> nrf_sdh_ble: BLE event: 0x10. //BLE_GAP_EVT_CONNECTED
    <debug> nrf_sdh_ble: BLE event: 0x23. //BLE_GAP_EVT_DATA_LENGTH_UPDATE_REQUEST
    <debug> ble_gatt: Peer on connection 0x0 requested a data length of 27 bytes.
    <debug> ble_gatt: Updating data length to 27 bytes on connection 0x0.
    <debug> nrf_sdh_ble: BLE event: 0x24. //BLE_GAP_EVT_DATA_LENGTH_UPDATE
    <debug> ble_gatt: Data length updated to 27 on connection 0x0.
    <debug> ble_gatt: max_rx_octets: 27
    <debug> ble_gatt: max_tx_octets: 27
    <debug> ble_gatt: max_rx_time: 328
    <debug> ble_gatt: max_tx_time: 328
    <debug> my_app: ATT MTU exchange completed. central 0x17 peripheral 0x17
    <debug> nrf_sdh_ble: BLE event: 0x12. //BLE_GAP_EVT_CONN_PARAM_UPDATE
    <debug> nrf_sdh_ble: BLE event: 0x12. //BLE_GAP_EVT_CONN_PARAM_UPDATE

    After the GATT negotiation, I receive two or three BLE_GAP_EVT_CONN_PARAM_UPDATE and I'lm stucked here. 

    on SDK12 (-without the gatt), it worked like a charm

    thx

    yacine

  • yacine - this looks like a different issue to me. What does the stack trace look like when your app gets stuck?

  • the last lines are:

    nrf_sdh_ble: BLE event: 0x12. // BLE_GAP_EVT_CONN_PARAM_UPDATE
    nrf_sdh_ble: BLE event: 0x12. // BLE_GAP_EVT_CONN_PARAM_UPDATE

    It stays like this. I use the same client as for SDK12. with SDK12 it connects and I can write/read etc.

Related