BLE error when calling bt_conn_disconnect inside the connected callback

Hi,

I am developing a BLE peripheral, and I see the following errors when I disconnect a new connection from within the connected callback:

bt_conn_disconnect(conn, BT_HCI_ERR_REMOTE_USER_TERM_CONN);

[00:00:07.533,966] <wrn> bt_att: att_get: Not connected
[00:00:07.631,561] <err> bt_conn: bt_conn_send_cb: not connected!

I have also seen this error when I repeatedly connect to the peripheral:

[02:11:23.832,550] <err> bt_conn: bt_conn_send_cb: not connected!

The connected callback simply reads the connection info (bt_conn_get_info) and then disconnects (bt_conn_disconnect).

The errors are inconsistent.  They don't always happen, and they're not always the same.

Often I see the bt_att issue on the first connection, but not always.

I am repeatedly connecting to the peripheral at a fast rate using the nRF Connect phone app.

Interestingly, when the peripheral disconnects the phone as described, the phone sometimes gets stuck in a re-connection loop.  This is how the subscriptions are coming in so quickly in some cases.

To be clear though, I do frequently see the very simple case described above on the first connection and no looping.

Notably, bt_conn_disconnect returns 0 (success) even in the cases where those errors are logged.

Question - what is the right way to use the API to avoid these errors?  I want to be able to disconnect new connections within the connected callback.

Thanks.

  • If we disregard what errors/warnings you see in the log, is something buggy or not working as expected?

    If everything appears to work as expected, Nordic should consider removing these warnings/errors when the state is "BT_CONN_DISCONNECTING" as it is totally allowed to initiate a disconnection immediately after the connection has been created.

  • Nothing is obviously broken, but that's not a confirmation. It's hard for me to tell whether there are any bugs, because I'm in the initial development phase and haven't yet put the system through heavy, prolonged, or rigorous use.

    My primary concern is corruption of state within the Zephyr code that will manifest itself much later in time (and will be very difficult to diagnose).

    "it is totally allowed to initiate a disconnection immediately after the connection has been created"

    Your specific meaning here is not clear to me. Do you mean that it is OK to call the disconnect function while still within the "connected" callback? If so, what makes you sure that's correct? I have not seen any statement or documentation so far that asserts that.

    "Remember that there are many handlers that subscribe to the BT events, such as the connected event"

    My concern relates to this statement: perhaps other subsystems within Zephyr (if there are any) haven't yet received the connected callback by the time I have, and there's some kind of race condition. Or perhaps the order in which the event is distributed is non-deterministic. Because the problem is intermittent, it tells me that the same code path isn't being executed each time for identical events, and that this happens within the Zephyr functionality (and is wrong).

    Thanks.

  • For follow up questions, please consider creating a new ticket. I will be working on another project in the coming month, so I will not monitor my queue until October. Please feel free to link to this one for background information.

    In Zephyr, many modules register callback handlers: one for your main application, one for flash management (in case you are storing bonding data, and so on), and several within the Bluetooth stack itself. When this event occurs, all of these handlers are called, in no particular order. Errors like this can therefore happen when not all of the handlers have had time to execute.

    My suggestion is therefore that you wait until all of these handlers have had the chance to run. The way I suggested was to use a timer, but if you are not too keen on that, I suggest you look into the k_work handler. The point of this (and of the work handler) is to run the disconnect job at a different priority level than the Bluetooth callback itself, so that we know everything is handled in the correct order. Please see the section "Submitting a work item" in the NCS documentation.
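
    A minimal sketch of the deferral suggested above, assuming a recent Zephyr / nRF Connect SDK version (hence the `<zephyr/...>` include paths) and only one link at a time. The names `pending_conn`, `disconnect_work`, and `DISCONNECT_DELAY_MS`, and the 100 ms value, are hypothetical choices for illustration, not values recommended anywhere in this thread. It combines the timer and k_work ideas by scheduling a delayable work item on the system work queue:

```c
#include <errno.h>
#include <zephyr/kernel.h>
#include <zephyr/sys/printk.h>
#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/bluetooth/conn.h>
#include <zephyr/bluetooth/hci.h>

/* Hypothetical delay before disconnecting; tune for your application. */
#define DISCONNECT_DELAY_MS 100

/* Assumption: one connection at a time. This pointer is shared between the
 * Bluetooth callback context and the work queue, so a multi-link design
 * would need a per-connection slot and proper synchronization.
 */
static struct bt_conn *pending_conn;

static void disconnect_work_handler(struct k_work *work)
{
	struct bt_conn *conn = pending_conn;

	ARG_UNUSED(work);
	pending_conn = NULL;
	if (conn == NULL) {
		return;
	}

	int err = bt_conn_disconnect(conn, BT_HCI_ERR_REMOTE_USER_TERM_CONN);

	/* -ENOTCONN only means the link is already gone. */
	if (err && err != -ENOTCONN) {
		printk("Disconnect failed (err %d)\n", err);
	}

	/* Release the reference taken in connected(). */
	bt_conn_unref(conn);
}

K_WORK_DELAYABLE_DEFINE(disconnect_work, disconnect_work_handler);

static void connected(struct bt_conn *conn, uint8_t err)
{
	struct bt_conn_info info;

	if (err) {
		return;
	}

	if (bt_conn_get_info(conn, &info) == 0) {
		printk("Connected, role %u\n", (unsigned int)info.role);
	}

	/* Do not disconnect here. Keep a reference so the object stays valid
	 * until the work item runs, then return from the callback.
	 */
	if (pending_conn == NULL) {
		pending_conn = bt_conn_ref(conn);
		if (pending_conn != NULL) {
			k_work_schedule(&disconnect_work,
					K_MSEC(DISCONNECT_DELAY_MS));
		}
	}
}

BT_CONN_CB_DEFINE(conn_callbacks) = {
	.connected = connected,
};
```

    The handler treats -ENOTCONN as benign, since the peer may already have dropped the link by the time the work item runs. This only moves the disconnect out of the callback; whether it fully silences the log messages is not confirmed in this thread.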

    I am not sure whether or not you are familiar with the nRF5 SDK, which is a more "bare metal" SDK. There you will not see these kinds of issues, because the SoftDevice runs a bigger part of the Bluetooth stack and handles these events completely before letting the application know that they have happened. However, the complexity of an RTOS means you also need to take these scenarios into consideration. After all, it is still a single-core chip running a complex RTOS with many threads running "concurrently", and interrupt priorities still need to be taken into consideration.

    Best regards,

    Edvin

  • I understand.  Thank you.

    The work queue introduces the issues as well.

    Regarding the nRF5 SDK, is that not deprecated in favor of NCS (nRF Connect SDK)?
