nRF5340 LE disconnection issue

Hi,

We are using the nRF5340 DK with nRF Connect SDK v2.6.1.

We are experiencing an issue with disconnections. We use bt_conn_ref() when receiving the connected event and bt_conn_unref() when receiving the disconnection event.

However, sometimes after a disconnection, when we retry, we encounter the error: "bt_conn: bt_conn_exists_le: Found valid connection (0x20001a18) with address FF:F5:55:5A:D6:6A (random) in disconnected state".

Why does this issue occur even after the disconnection event has been processed?

If we call bt_conn_unref() twice in the disconnection event, as shown in the code below (essentially incrementing once upon connection and decrementing twice upon disconnection), the issue does not occur.

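void connected(struct bt_conn *conn, uint8_t conn_err)
{
	/* Take one reference when the connection is established. */
	bt_conn_ref(conn);
}

void disconnected(struct bt_conn *conn, uint8_t reason)
{
	/* Release the reference taken in connected(). */
	bt_conn_unref(conn);
	/* Workaround: a second unref, which makes the error go away. */
	bt_conn_unref(conn);
}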

Could this behavior be due to an internal counter issue?
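
One possible explanation, offered as an assumption and not a confirmed diagnosis: if the application also initiates connections itself, bt_conn_le_create() hands the caller its own reference, which has to be released with bt_conn_unref() separately from the one taken in connected(). If that reference is never released, the connection object stays allocated after the disconnect, which would match the "Found valid connection ... in disconnected state" message. The sketch below is a toy model of that counting only. The toy_* names and struct are hypothetical and are not part of the Zephyr API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a connection object; NOT Zephyr's struct bt_conn. */
struct toy_conn {
	int refs; /* number of outstanding references */
};

/* Hypothetical helper: take a reference and return the same object. */
static struct toy_conn *toy_ref(struct toy_conn *c)
{
	c->refs++;
	return c;
}

/* Hypothetical helper: drop a reference; must never go below zero. */
static void toy_unref(struct toy_conn *c)
{
	assert(c->refs > 0);
	c->refs--;
}

/* In this model the slot can be reused only when nobody holds a reference. */
static bool toy_slot_free(const struct toy_conn *c)
{
	return c->refs == 0;
}

/*
 * Walk through one connect/disconnect cycle in the central role.
 * release_create_ref selects whether the application drops the reference
 * it received when it initiated the connection.
 */
bool toy_slot_free_after_disconnect(bool release_create_ref)
{
	struct toy_conn c = { .refs = 0 };

	/* Reference handed to the caller when the connection is created. */
	struct toy_conn *from_create = toy_ref(&c);

	/* Reference taken by the application in its connected() callback. */
	struct toy_conn *from_cb = toy_ref(&c);

	if (release_create_ref) {
		/* The application no longer needs the create-time reference. */
		toy_unref(from_create);
	}

	/* disconnected() callback: release the callback-time reference. */
	toy_unref(from_cb);

	return toy_slot_free(&c);
}
```

In this model, toy_slot_free_after_disconnect(false) returns false because one reference is still outstanding, while toy_slot_free_after_disconnect(true) returns true. If this is what is happening in the real application, the second bt_conn_unref() in disconnected() would only be compensating for a reference that was taken elsewhere, and releasing that reference where it was obtained would be the cleaner fix.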

  • Hi Jeff,

    That is unfortunate. Could you please create a new private ticket and provide a brief summary of your setup? Also, if possible, include a minimal project that will allow us to reproduce the issue you are experiencing. It sounds like the GATT writes may not be the only thing that can lead to deadlock. You can address the ticket to me.

    Many of the upstream improvements were also included in v2.7.0. However, the deadlock issue I mentioned earlier has been confirmed in both v2.6 and v2.7.

    Best regards,

    Vidar

  • Vidar,

    I worked on making a minimal nRF53dk sample, but could not get the same condition where the system misses a disconnect due to supervision timeout.

    We spent Monday and Tuesday getting a build working with cherry-picked upstream Zephyr commits, specifically to pull in the Bluetooth HCI driver changes. Unfortunately, the same missing disconnect was seen with this newest build.

    However, switching to the Split Link Layer BLE Controller completely resolves the issue -- without editing the application code. This leads me to believe that there may be some insight in the network core ipc_radio image configuration.

    I am going to diff the .config files from the two builds to try to get more insight, which I could then open a ticket for.
    We are a multi-connection central and peripheral, using vendor-specific channel surveying and sending NUS data, so maybe something is being lost in a buffer. I am less familiar with that part of the net core but am doing my best to make smart choices.

    Just wanted to give a heads-up in the interim.
    Thanks for the continued support,
    --Jeff

  • I'm working on creating a minimal and more debuggable project to try to reproduce the issue when there are no ATT transactions initiated from the BT RX context. I will let my test run overnight to see if I can reproduce it.

    Thank you for the update. It’s interesting that the controller selection seems to make a difference, as the issues we’ve observed so far have been in the host. Are both controllers built with the same CONFIG_BT_BUF_ACL_TX_SIZE and CONFIG_BT_BUF_ACL_RX_SIZE settings?

  • Are there any further updates from the Nordic side regarding both issues: the stack buffer overflow and the missing disconnection?

    These are both production-stop bugs for us, and our window to freeze the firmware is fast approaching.

  • We hope to have a fix by tomorrow as we need to test before releasing to production next weekend.

    Thanks
