nRF5340 LE disconnection issue

Hi,

We are using the nRF5340 DK and nRF Connect SDK version 2.6.1.

We are experiencing an issue with disconnections. We call bt_conn_ref() in the connected callback and bt_conn_unref() in the disconnected callback.

However, sometimes after a disconnection, when we try to reconnect, we encounter the error: "bt_conn: bt_conn_exists_le: Found valid connection (0x20001a18) with address FF:F5:55:5A:D6:6A (random) in disconnected state".

Why does this issue occur even after the disconnection event has been processed?

If we call bt_conn_unref() twice in the disconnected callback, as shown in the code below (that is, incrementing once upon connection and decrementing twice upon disconnection), the issue does not occur.

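void connected(struct bt_conn *conn, uint8_t conn_err)
{
	/* Take one application reference when the connection is established. */
	bt_conn_ref(conn);
}

void disconnected(struct bt_conn *conn, uint8_t reason)
{
	/* Workaround: release twice, although only one reference was taken above. */
	bt_conn_unref(conn);
	bt_conn_unref(conn);
}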

Could this behavior be due to an internal counter issue?
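
For context on this question: the error message indicates that a connection object in the disconnected state still has at least one reference held. The following is a minimal sketch in plain C, not the Bluetooth host's actual implementation, of how a reference-counted slot behaves. The names conn_model, model_ref(), model_unref() and slot_reusable_after_disconnect() are hypothetical, and the sketch assumes the stack holds one reference of its own and drops it on disconnect. Some Zephyr APIs, such as bt_conn_le_create() and bt_conn_lookup_addr_le(), hand the caller a reference that must also be released with bt_conn_unref(); whether that applies to this application is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted connection slot. */
struct conn_model {
    int ref;      /* number of outstanding references */
    bool in_use;  /* slot stays allocated while ref > 0 */
};

static void model_ref(struct conn_model *c)
{
    c->ref++;
}

static void model_unref(struct conn_model *c)
{
    assert(c->ref > 0);   /* releasing more than was taken is a bug */
    if (--c->ref == 0) {
        c->in_use = false; /* last reference gone: slot can be reused */
    }
}

/*
 * Returns true if the slot is reusable after a disconnect.
 * app_refs:   references the application ends up holding (explicit
 *             ref calls plus any reference returned by an API call)
 * app_unrefs: unref calls the application makes on disconnect
 */
bool slot_reusable_after_disconnect(int app_refs, int app_unrefs)
{
    /* Assumption: the stack starts with one reference of its own. */
    struct conn_model c = { .ref = 1, .in_use = true };

    for (int i = 0; i < app_refs; i++) {
        model_ref(&c);
    }

    model_unref(&c); /* stack drops its own reference on disconnect */

    for (int i = 0; i < app_unrefs; i++) {
        model_unref(&c);
    }

    return !c.in_use;
}
```

In this model, a single unbalanced application reference keeps the slot allocated after the disconnect, which would be consistent with a second bt_conn_unref() appearing to fix the problem.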

  • Hi,

    I'm not aware of any known issues where the Bluetooth host fails to release its own reference. Are you developing a peripheral or central application?

    Thanks,

    Vidar

  • Hi, 

    Thanks for the response.

    We are developing a mesh network. Our requirement is that one device can connect to three other devices, using one connection as a peripheral and up to two connections as a central for the mesh.

  • Hi, I'm also coming across this issue on NCS v2.7.0.
    The nRF53 misses the disconnect event, specifically when I have a pending GATT write and a device disconnects ungracefully.
    Even the supervision timeout does not cause a disconnect event.

    When the ATT timeout hits (30 s later), it unsubscribes from the characteristic. If I call disconnect from this context and release my stored connections with bt_conn_unref(), the internal conn state stays in Disconnecting, and new scans show the device as already having a conn, so a new connection cannot be made.

    I am wondering if there is any progress here, or if I should work towards the newer Bluetooth host controller, which does not use the system workqueue. :)

  • Hi, yes, we have found that sending ATT packets from the 'BT RX' thread context can result in a deadlock if the peer device stops responding, which contradicts the expected behavior described in the API documentation. The deadlock prevents the processing of the disconnect event and the freeing of stack buffers. Unfortunately, we were also able to reproduce this with SDK v2.7.0. 

    The expected behaviour according to the documentation:

    I'm not sure how we plan to address this bug yet, but to eliminate the issue for now, I suggest removing any ATT requests being sent from the 'BT RX' thread (BT callbacks). Alternatively, ensure that ATT requests from this thread are not sent in parallel with other threads and that the number of requests is limited to mitigate the risk of running out of buffers. For instance, if you are performing service discovery, ensure the app waits for the discovery to fully complete before initiating any ATT requests from other threads such as 'main'. The same also applies to the MTU exchange.

    To help locate potentially problematic API calls (e.g. GATT writes made from BT callbacks), you may instrument the code by inserting the following snippet before the bt_l2cap_create_pdu_timeout() call:

        if (IS_ENABLED(CONFIG_THREAD_NAME)) {
            /* Flag blocking PDU allocations made from the BT RX thread. */
            k_tid_t current_thread = k_current_get();
            const char *threadname = k_thread_name_get(current_thread);
            /* timeout.ticks == -1 corresponds to K_FOREVER */
            if ((strcmp(threadname, "BT RX") == 0) && (timeout.ticks == -1)) {
                LOG_ERR("API violation - likely calling Bluetooth API from callback");
            }
        }
    

    Then build the project with CONFIG_LOG=y and CONFIG_THREAD_NAME=y. You can place a breakpoint at the LOG_ERR() line and use the call stack to determine where the call originated.

    I believe the deadlock in your project may be caused by the meshDataReceivedHandler() callback. Please try offloading the data sending from this callback to another thread or the workqueue.
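
    The offloading suggested here can be sketched as follows. This is a plain C model of the pattern, not Zephyr code: the receive callback only copies the payload into a bounded queue and returns, and a separate worker context performs the potentially blocking send. All names (rx_callback_enqueue(), worker_drain(), TX_QUEUE_DEPTH, TX_MAX_LEN) are hypothetical. In a Zephyr application the queue would more likely be a k_msgq or a k_work item submitted with k_work_submit(), which also take care of the locking that this single-threaded sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_QUEUE_DEPTH 4   /* hypothetical bound on pending sends */
#define TX_MAX_LEN     20  /* hypothetical maximum payload size */

struct tx_item {
    uint8_t data[TX_MAX_LEN];
    size_t len;
};

static struct tx_item tx_queue[TX_QUEUE_DEPTH];
static size_t tx_head;     /* index of the oldest queued item */
static size_t tx_count;    /* number of queued items */
static size_t sent_count;  /* stands in for completed sends */

/* Called from the receive callback: copy the payload and return at once. */
bool rx_callback_enqueue(const uint8_t *data, size_t len)
{
    if (len > TX_MAX_LEN || tx_count == TX_QUEUE_DEPTH) {
        return false; /* drop instead of blocking the RX context */
    }

    struct tx_item *slot = &tx_queue[(tx_head + tx_count) % TX_QUEUE_DEPTH];

    memcpy(slot->data, data, len);
    slot->len = len;
    tx_count++;
    return true;
}

/* Called from the worker context: the only place allowed to block on a send. */
size_t worker_drain(void)
{
    size_t drained = 0;

    while (tx_count > 0) {
        assert(tx_head < TX_QUEUE_DEPTH);
        /* A real handler would send tx_queue[tx_head] here. */
        tx_head = (tx_head + 1) % TX_QUEUE_DEPTH;
        tx_count--;
        sent_count++;
        drained++;
    }
    return drained;
}
```

    Bounding the queue and dropping on overflow keeps the callback from ever waiting, which also limits the number of outstanding requests as recommended earlier in this reply.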

  • Hi,

    "Please try offloading the data sending from this callback to another thread or the workqueue."

    I removed all the data-sending functions from the RX callback function meshDataReceivedHandler() and tested it, but the issue persists.

  • Hi,

    Yes, but what about the other things I mentioned? MTU exchange, etc.

  • Hi,

    I am not using service discovery, and I am not sending any packets before the MTU exchange completes.
