This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Bug in peer manager in SDK 14.2

I have recently ported my nRF52832 application from SDK 12.3 to SDK 14.2 and have been disappointed to find that the preripheral is locking up after a period in operation.

In order to track down the issue I left the debugger running until the application froze (a few hours) and found that peer manager seems to be caught in a infinite loop trying to clear some flash. Exactly what it is trying to store is unclear - I only have one central bonded.

I have since built a short central application that repeatedly connects and disconnects from the peripheral at 2 second intervals which has made this bug repeatable - here is an example stack trace when I hit pause in the debugger after the application has frozen:

image description

I have tried to work through the peer manager code and it appears that pds_peer_data_store attempts to use FDS to write to flash but gets FDS_ERR_NO_SPACE_IN_FLASH. FDS then attempts to do come garbage collection after which the peer manager attempts again to write to flash but again gets the no space error. This continues indefinitely.

Is this a known issue and if so is there any workaround recommended?

Update: I have tracked the memory leak to the function gscm_local_db_cache_update:

image description

Prior to this function being executed I get the following debug output to the console:

app: Connected to a previously bonded device.
app: Connected
app: Connection secured. Role: 1. conn_handle: 0x0, Procedure: 0
ble_gatt: Peer on connection 0x0 requested an ATT MTU of 185 bytes.
ble_gatt: Updating ATT MTU to 23 bytes (desired: 23) on connection 0x0.

I'm not sure what is causing this MTU negotiation but it appears that the SDK is trying to cache the connection parameter. Rather than updating the existing record for the peer it is creating a new one, so after a few connect/disconnect cycles the flash fills up and it gets stuck in the garbage collecting loop.

So I guess my question is:

  • What might be causing this MTU negotiation from the peer in the first place?

  • What needs to be done to fix the memory leak?

FURTHER UPDATE: I have added some additional logging to give visibility of the event flow:

nrf_sdh_ble: BLE event: 0x10.  // BLE_GAP_EVT_CONNECTED
app: Connected to a previously bonded device.
app: Connected
nrf_sdh_ble: BLE event: 0x14.  // BLE_GAP_EVT_SEC_INFO_REQUEST
nrf_sdh_ble: BLE event: 0x1A.  // BLE_GAP_EVT_CONN_SEC_UPDATE
app: Connection secured. Role: 1. conn_handle: 0x0, Procedure: 0
nrf_sdh_ble: BLE event: 0x21.  // BLE_GAP_EVT_SCAN_REQ_REPORT
app: PHY update request.
nrf_sdh_ble: BLE event: 0x22.  // BLE_GAP_EVT_PHY_UPDATE_REQUEST
nrf_sdh_ble: BLE event: 0x23.  // BLE_GAP_EVT_PHY_UPDATE
ble_gatt: Peer on connection 0x0 requested a data length of 251 bytes.
ble_gatt: Updating data length to 27 bytes on connection 0x0.
nrf_sdh_ble: BLE event: 0x24.  // BLE_GAP_EVT_DATA_LENGTH_UPDATE_REQUEST
ble_gatt: Data length updated to 27 on connection 0x0.
ble_gatt: max_rx_octets: 27
ble_gatt: max_tx_octets: 27
ble_gatt: max_rx_time: 328
ble_gatt: max_tx_time: 328
nrf_sdh_ble: BLE event: 0x55. // BLE_GATTS_EVT_EXCHANGE_MTU_REQUEST
ble_gatt: Peer on connection 0x0 requested an ATT MTU of 293 bytes.
ble_gatt: Updating ATT MTU to 23 bytes (desired: 23) on connection 0x0.
nrf_sdh_ble: BLE event: 0x50. // BLE_GATTS_EVT_WRITE
***** DATA GETS WRITTEN TO FLASH HERE via pds_peer_data_store -> fds_record_update *****
nrf_sdh_ble: BLE event: 0x12. // BLE_GAP_EVT_CONN_PARAM_UPDATE

I'm not sure if this event flow is usual for connection to a bonded device? It seems to be a lot of data to be exchanged each time a connection is established!

What I have managed to confirm is that the updated peer data is written to the same FDS file_id and key each time, so at least from that angle the peer data is being stored correctly. So if the above event flow is expected then I guess the issue is not with peer manager but with the garbage collection in FDS?

It looks like others may also have experienced this problem:

devzone.nordicsemi.com/.../

Parents
  • You say GC is attempted? Could you elaborate?

    Do you get the PM_EVT_STORAGE_FULL event in pm_evt_handler()? If you do, do fds_gc() return 0x00000000 (NRF_SUCCESS)? Do you get the PM_EVT_FLASH_GARBAGE_COLLECTED event in pm_evt_handler()? Or the FDS_EVT_GC event in fds_evt_handler()?

    Could you run fds_stat() and see what it returns?

    The ATT MTU request is probably done by the GATT module, I think you can just comment out gatt_init() in main() and this will not happen.

    I'm guessing that the BLE_GATTS_EVT_WRITE is a write to a CCCD? Every time a CCCD is written to, the PM_PEER_DATA_ID_GATT_LOCAL_V2 record will be updated. This means that a new record will be written, and the old should be invalidated. Then when doing GC this invalidated record should be collected.

    Would it be possible for you to upload the complete projects so I can test them here?

Reply
  • You say GC is attempted? Could you elaborate?

    Do you get the PM_EVT_STORAGE_FULL event in pm_evt_handler()? If you do, do fds_gc() return 0x00000000 (NRF_SUCCESS)? Do you get the PM_EVT_FLASH_GARBAGE_COLLECTED event in pm_evt_handler()? Or the FDS_EVT_GC event in fds_evt_handler()?

    Could you run fds_stat() and see what it returns?

    The ATT MTU request is probably done by the GATT module, I think you can just comment out gatt_init() in main() and this will not happen.

    I'm guessing that the BLE_GATTS_EVT_WRITE is a write to a CCCD? Every time a CCCD is written to, the PM_PEER_DATA_ID_GATT_LOCAL_V2 record will be updated. This means that a new record will be written, and the old should be invalidated. Then when doing GC this invalidated record should be collected.

    Would it be possible for you to upload the complete projects so I can test them here?

Children
No Data
Related