nRF5340 LE disconnection issue

AKV 11 months ago

Hi,

We are using nRF5340 DK and the nRF Connect SDK Version 2.6.1.

We are experiencing an issue with disconnections. We use bt_conn_ref() when receiving the connected event and bt_conn_unref() when receiving the disconnection event.

However, sometimes after a disconnection, when we retry, we encounter the error: "bt_conn: bt_conn_exists_le: Found valid connection (0x20001a18) with address FF:F5:55:5A:D6:6A (random) in disconnected state".

Why does this issue occur even after the disconnection event has been processed?

If we call bt_conn_unref() twice in the disconnection event, as shown in the code below (that is, incrementing once on connection and decrementing twice on disconnection), the issue does not occur.

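void connected(struct bt_conn *conn, uint8_t conn_err)
{
	bt_conn_ref(conn);    /* take one reference on connect */
}

void disconnected(struct bt_conn *conn, uint8_t reason)
{
	bt_conn_unref(conn);  /* matches the bt_conn_ref() above */
	bt_conn_unref(conn);  /* extra unref used as the workaround */
}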

Could this behavior be due to an internal counter issue?
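
One possible explanation, not confirmed anywhere in this thread: in the central role, bt_conn_le_create() hands the caller its own reference to the connection object, so an application that also calls bt_conn_ref() in connected() holds two references and has to release both. The sketch below is a plain-C toy model of that counting, not Zephyr code. The names toy_conn, toy_ref(), toy_unref() and toy_cycle() are hypothetical and invented for illustration, and the host's internal bookkeeping is reduced to a single reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a connection object; NOT the real struct bt_conn. */
struct toy_conn {
	int ref; /* outstanding references (host + application) */
};

void toy_ref(struct toy_conn *c)
{
	c->ref++;
}

void toy_unref(struct toy_conn *c)
{
	assert(c->ref > 0); /* an unref without a matching ref is a bug */
	c->ref--;
}

/*
 * Simulate one connect/disconnect cycle and return the reference count
 * that is left afterwards. A non-zero result means the object cannot be
 * reused yet.
 *
 * created_by_app: the application initiated the connection and received
 *                 its own reference from the create call (assumption:
 *                 this models the central role).
 * app_unrefs:     how many times the application unrefs on disconnect.
 */
int toy_cycle(bool created_by_app, int app_unrefs)
{
	struct toy_conn c = { .ref = 0 };

	toy_ref(&c);                 /* host's own reference (simplified) */
	if (created_by_app) {
		toy_ref(&c);         /* reference handed to the creator */
	}
	toy_ref(&c);                 /* application's ref in connected() */

	for (int i = 0; i < app_unrefs; i++) {
		toy_unref(&c);       /* application's unref(s) in disconnected() */
	}
	toy_unref(&c);               /* host drops its reference */

	return c.ref;
}
```

Under these assumptions, toy_cycle(true, 1) leaves one reference outstanding, which would be consistent with a connection object lingering in the disconnected state, while toy_cycle(true, 2) and toy_cycle(false, 1) both end at zero. If this is the cause, releasing each reference exactly once per role is safer than always calling bt_conn_unref() twice, because a second unref on a peripheral-role connection would drop a reference the application never took.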

0 Vidar Berg 11 months ago

Hi,

I'm not aware of any known issues where the Bluetooth host fails to release its own reference. Are you developing a peripheral or central application?

Thanks,

Vidar

0 AKV 11 months ago in reply to Vidar Berg

Hi,

Thanks for the response.

We are developing a mesh network. Each device must be able to connect to three other devices: one connection as a peripheral and up to two connections as a central.

0 Vidar Berg 9 months ago in reply to AKV

Hi,

Yes, but what about the other things I mentioned? MTU exchange, etc.

0 AKV 9 months ago in reply to Vidar Berg

Hi,

I am not using service discovery, and I am not sending any packets before the MTU exchange completes.

0 Vidar Berg 9 months ago in reply to AKV

Hi,

Thanks for confirming. Could you please upload your revised code in the private ticket?

0 JeffW 9 months ago in reply to Vidar Berg

Hi Vidar,

Appreciate you looking into this for me as well.

I also removed all GATT sends from the BT RX WQ thread, and I am still missing the disconnect event.

If there are any additional suggestions from the R&D team, please pass them along. :)
It looks like upstream Zephyr has a total BLE host rewrite -- so that might be in the cards.

--Jeff

0 Vidar Berg 9 months ago in reply to JeffW

Hi Jeff,

That is unfortunate. Could you please create a new private ticket and provide a brief summary of your setup? Also, if possible, include a minimal project that will allow us to reproduce the issue you are experiencing. It sounds like the GATT writes may not be the only thing that can lead to deadlock. You can address the ticket to me.

Many of the upstream improvements were also included in v2.7.0. However, the deadlock issue I mentioned earlier has been confirmed in both v2.6 and v2.7.

Best regards,

Vidar

0 JeffW 9 months ago in reply to Vidar Berg

Vidar,

I worked on making a minimal nRF5340 DK sample, but could not reproduce the condition where the system misses a disconnect due to supervision timeout.

We spent Monday/Tuesday getting a build using cherry-picked upstream Zephyr to specifically pull in the Bluetooth HCI driver changes. Unfortunately, the same missing disconnect was seen with this newest build.

However, switching to the Split Link Layer BLE Controller completely resolves the issue -- without editing the application code. This leads me to believe that there may be some insight in the network core ipc_radio image configuration.

I am going to diff the .config files from the two builds to try to get more insight, which I could then turn into a ticket.
We are a multi-connection central and peripheral, using vendor-specific channel surveying and sending NUS data -- so maybe something is being lost in a buffer. I am less familiar with that part of the net core, but I am doing my best to make smart choices.

Just wanted to give a heads-up in the interim.
Thanks for the continued support,
--Jeff

0 Vidar Berg 9 months ago in reply to JeffW

AKV, I'm working on creating a minimal and more debuggable project to try to reproduce the issue when there are no ATT transactions initiated from the BT RX context. I will let my test run overnight to see if I can reproduce it.

JeffW, thank you for the update. It’s interesting that the controller selection seems to make a difference, as the issues we’ve observed so far have been in the host. Are both controllers built with the same CONFIG_BT_BUF_ACL_TX_SIZE and CONFIG_BT_BUF_ACL_RX_SIZE settings?
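
One way to check this without reading both files by hand is to look up the two symbols in each build's .config text and compare the values. The following is only a minimal sketch in plain C: config_value() and config_same() are hypothetical helper names, the text is assumed to use the usual "KEY=value" line format, and reading the files into memory is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of `key` from .config-style text ("KEY=value" per line)
 * into `out`. Returns false if the key is not set or `out` has no room.
 */
bool config_value(const char *text, const char *key, char *out, size_t out_len)
{
	assert(text != NULL && key != NULL && out != NULL);

	size_t key_len = strlen(key);
	const char *line = text;

	if (out_len == 0) {
		return false;
	}

	while (line != NULL && *line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

		/* Match "KEY=" exactly; "# KEY is not set" lines do not match. */
		if (line_len > key_len && strncmp(line, key, key_len) == 0 &&
		    line[key_len] == '=') {
			size_t val_len = line_len - key_len - 1;

			if (val_len >= out_len) {
				val_len = out_len - 1; /* truncate to fit */
			}
			memcpy(out, line + key_len + 1, val_len);
			out[val_len] = '\0';
			return true;
		}
		line = eol ? eol + 1 : NULL;
	}
	return false;
}

/* True when both configs set `key` to the same value, or both leave it unset. */
bool config_same(const char *a, const char *b, const char *key)
{
	char va[64];
	char vb[64];
	bool in_a = config_value(a, key, va, sizeof(va));
	bool in_b = config_value(b, key, vb, sizeof(vb));

	if (in_a != in_b) {
		return false;
	}
	return !in_a || strcmp(va, vb) == 0;
}
```

Calling config_same() once per symbol on the .config contents of the two builds shows whether the ACL buffer sizes differ. The same helper can be pointed at any other symbol that is suspected of differing between the two controller configurations.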

0 AKV 9 months ago in reply to Vidar Berg

Vidar Berg

Are there any further updates from the Nordic side regarding both issues: the stack buffer overflow and the missing disconnection?

These are both production-stop bugs for us, and our window to freeze the firmware is fast approaching.

0 AKV 9 months ago in reply to AKV

Vidar Berg

We hope to have a fix by tomorrow as we need to test before releasing to production next weekend.

Thanks
