bt_nus_send() blocks at the 4th call if central is not in range (aka package is not received)

Hi,

I have a peripheral which is sending data to a central via the NUS service. Everything works as expected until the central is removed (e.g. power failure)

The problem is the time between the central is removed and the peripherals recognises the disconnect (connection timeout). If in that time bt_nus_send() is called more than three times, it blocks. It then releases after the disconnect callback.

I've recognized the error, cause after I received a disconnect callback, i immediately start extended advertising again. The restart of the advertising fails, if the bt_nus_send() is blocking.

I've two questions:

  • Is there a way to get an error of the bt_nus_send() instead of blocking?
  • Why is the restart of the advertising failing; what has this to do with the bt_nus_send()

I've to say, I've found a workaround. I'm using k_work_schedule in the disconnect callback to start advertising 50ms delayed. This works fine. But I will do it the right way, maybe I don't need the delay if I get to the bottom of this.

Attached you'll find a logging of the 4th calling of bt_nus_send(). I've logged the start and end of bt_nus_send() function and where the function blocks

[00:01:05.247,375] <inf> app: Button pressed: 65247, Counter: 12
[00:01:05.247,802] <inf> protocol: enqueu
[00:01:05.248,199] <inf> ble: Dequeue: Type:0x40;Length:10
[00:01:05.248,596] <inf> ble: start bt_nus_send()
[00:01:05.248,962] <dbg> bt_conn: bt_conn_ref: handle 2 ref 2 -> 3

--------- here is where the function blocks -----------------

[00:01:07.894,470] <dbg> bt_sdc_hci_driver: event_packet_process: Event (0x05) len 4
[00:01:07.894,958] <dbg> bt_hci_core: bt_recv: buf 0x20014a28 len 6
[00:01:07.895,416] <dbg> bt_hci_core: hci_disconn_complete_prio: status 0x00 handle 2 reason 0x08
[00:01:07.895,965] <dbg> bt_conn: bt_conn_ref: handle 2 ref 3 -> 4
[00:01:07.896,423] <dbg> bt_conn: bt_conn_set_state: connected -> disconnect-complete
[00:01:07.896,881] <dbg> bt_att: chan_tx_complete: TX Complete chan 0x20009e78 CID 0x0004
[00:01:07.897,430] <dbg> bt_att: chan_tx_complete: TX Complete chan 0x20009e78 CID 0x0004
[00:01:07.897,949] <dbg> bt_att: chan_tx_complete: TX Complete chan 0x20009e78 CID 0x0004
[00:01:07.898,468] <dbg> bt_conn: bt_conn_unref: handle 2 ref 4 -> 3
[00:01:07.898,956] <dbg> bt_hci_core: rx_work_handler: Getting net_buf from queue
[00:01:07.899,414] <dbg> bt_hci_core: rx_work_handler: buf 0x20014a28 type 1 len 6
[00:01:07.899,932] <dbg> bt_hci_core: hci_event: event 0x05
[00:01:07.905,364] <dbg> bt_hci_core: hci_disconn_complete: status 0x00 handle 2 reason 0x08
[00:01:07.905,883] <dbg> bt_conn: bt_conn_ref: handle 2 ref 3 -> 4
[00:01:07.906,341] <dbg> bt_conn: bt_conn_set_state: disconnect-complete -> disconnected
[00:01:07.906,799] <dbg> bt_conn: tx_notify: conn 0x20002498
[00:01:07.907,226] <dbg> bt_conn: bt_conn_unref: handle 2 ref 4 -> 3
[00:01:07.907,714] <dbg> bt_hci_core: process_events: count 3
[00:01:07.918,182] <dbg> bt_hci_core: process_events: ev->state 0
[00:01:07.918,579] <dbg> bt_hci_core: process_events: ev->state 1
[00:01:07.918,975] <dbg> bt_hci_core: process_events: ev->state 0
[00:01:07.919,403] <dbg> bt_conn: bt_conn_prepare_events: 
[00:01:07.919,769] <dbg> bt_hci_core: hci_tx_thread: Calling k_poll with 2 events
[00:01:07.920,257] <dbg> bt_conn: deferred_work: conn 0x20002498
[00:01:07.920,684] <dbg> bt_l2cap: bt_l2cap_chan_del: conn 0x20002498 chan 0x20003d6c
[00:01:07.921,173] <dbg> bt_l2cap: l2cap_disconnected: ch 0x20003d6c cid 0x0005
[00:01:07.921,661] <dbg> bt_l2cap: bt_l2cap_chan_del: conn 0x20002498 chan 0x20003e30
[00:01:07.922,180] <dbg> bt_l2cap: bt_l2cap_chan_del: conn 0x20002498 chan 0x20009e7c
[00:01:07.922,668] <dbg> bt_att: bt_att_disconnected: chan 0x20009e7c cid 0x0004
[00:01:07.923,156] <dbg> bt_att: att_chan_detach: chan 0x20009e78
[00:01:07.933,654] <dbg> bt_gatt: bt_gatt_disconnected: conn 0x20002498
[00:01:07.934,112] <inf> ble: NUS disabled
[00:01:07.934,417] <inf> protocol: enqueu
[00:01:07.934,753] <dbg> bt_gatt: disconnected_cb: ccc 0x200005c4 reseted
[00:01:07.935,211] <dbg> bt_att: bt_att_released: chan 0x20009e78
[00:01:07.935,638] <wrn> ble: Disconnected (reason 8)
[00:01:07.936,035] <dbg> bt_conn: bt_conn_unref: handle 2 ref 3 -> 2
[00:01:07.936,523] <dbg> bt_conn: bt_conn_unref: handle 2 ref 2 -> 1
[00:01:07.936,981] <err> ble: Failed to start ext advertising idx: 0 (err -12)
[00:01:07.943,023] <dbg> bt_gatt: gatt_notify: conn 0x20002498 handle 0x0012
[00:01:07.943,481] <wrn> bt_att: Not connected
[00:01:07.943,847] <dbg> bt_conn: bt_conn_unref: handle 2 ref 1 -> 0
[00:01:07.944,305] <dbg> bt_adv: bt_le_adv_resume: No valid legacy adv
[00:01:07.949,798] <inf> ble: end bt_nus_send()
[00:01:07.950,103] <wrn> ble: Failed to send (err -128)

Thanks a lot and Best Regards,

Phobios

  • Hi Manuel

    bt_nus_send(..) will call the bt_gatt_notify_cb(..) function under the hood, and I believe this function will always be blocking if all the internal buffers are spent. 

    The question is where in your application you are calling bt_nus_spend() from? 

    It can often be useful to handle data transmission from a separate thread in your application, to ensure that even if the call is delayed by a poor link or a disconnect this will not affect the flow of other parts of the system. 

    Another question is if you use the system workqueue actively? 
    The Bluetooth host will also use the system workqueue, and if are using it heavily in the application it can affect the Bluetooth operation. 

    I've to say, I've found a workaround. I'm using k_work_schedule in the disconnect callback to start advertising 50ms delayed. This works fine. But I will do it the right way, maybe I don't need the delay if I get to the bottom of this.

    What if you do a regular (non delayed) work item, will this not work? 

    Either way I agree we should look for a better workaround. 

    Best regards
    Torbjørn

  • Hi Torbjørn,

    The question is where in your application you are calling bt_nus_spend() from? 

    It can often be useful to handle data transmission from a separate thread in your application, to ensure that even if the call is delayed by a poor link or a disconnect this will not affect the flow of other parts of the system. 

    I'm already using a separate thread only for sending. The thread is reading a message queue and then calling bt_nus_send(), so my application is NOT freezing or blocking if bt_nus_send() blocks.

    For the system workqueue; I'm not using it at all

    What if you do a regular (non delayed) work item, will this not work? 

    No, if I start the work queue immediately (with K_NO_WAIT, or k_work_submit) the start of advertising fails.

    I think it has something to do with the fact, that at the same time the disconnect callback occurs, the bt_nus_send() releases and do some stuff (I've marked that in the logs)

    [00:01:07.935,638] <wrn> ble: Disconnected (reason 8)                   -> Disconnect callback
    [00:01:07.936,035] <dbg> bt_conn: bt_conn_unref: handle 2 ref 3 -> 2    -> stuff done by bt_nus_send()
    [00:01:07.936,523] <dbg> bt_conn: bt_conn_unref: handle 2 ref 2 -> 1    -> stuff done by bt_nus_send()
    [00:01:07.936,981] <err> ble: Failed to start ext advertising idx: 0 (err -12) -> start of advertising failed
    [00:01:07.943,023] <dbg> bt_gatt: gatt_notify: conn 0x20002498 handle 0x0012 maybe -> stuff done by bt_nus_send()
    [00:01:07.943,481] <wrn> bt_att: Not connected
    [00:01:07.943,847] <dbg> bt_conn: bt_conn_unref: handle 2 ref 1 -> 0    -> stuff done by bt_nus_send()
    [00:01:07.944,305] <dbg> bt_adv: bt_le_adv_resume: No valid legacy adv

    This log is from no delay work item. You can see, that the bt_nus_send() is not finished completely while in between I'm trying to start the advertising. If I delay this with the workqueue, it works, cause the bt_nus_send() is completed before.

    Can this be the case?

  • Hi Manuel

    It seems the connection is not properly closed when you try to restart the advertising. 

    What happens if you increase the CONFIG_BT_MAX_CONN by 1, will the problem still occur? 

    Would you be able to share the code for the disconnected event, showing how the disconnect is handled? 

    Best regards
    Torbjørn

  • Hi Torbjørn,

    I tried your suggestion and it works! I called the workqueue without delay (see my code snipped) and after the disconnect callback comes the restart of advertising works.

    void ble_thread_t::disconnected(bt_conn *conn, uint8_t reason) {
        LOG_WRN("Disconnected (reason %u)", reason);
        if (ble_thread_t::current_conn) {
    		bt_conn_unref(ble_thread_t::current_conn);
    		ble_thread_t::current_conn = NULL;
    	}
        ble_callbacks->disconnected();
        k_work_schedule(&restart_advertising_work, K_NO_WAIT);
    }

    In the RTT viewer I still get the warning from bt_att

    [00:00:31.306,518] <wrn> bt_att: Not connected

    which I'm not getting if the device disconnects and I don't call bt_nus_send() during the timeout.

    Don't know whether this is now the solution, I think there is still an error or bad timing condition present, which will now work cause of the possible second connection added by CONFIG_BT_MAX_CONN.

    I think If I'm using two active connections and both went out at the same time I have the same problem (Just a thought, my application always just use one connection)

    Best Regrads,

    Manuel

  • Hi Manuel

    Good to hear you can use this as a workaround Slight smile

    I will follow it up with the developers next week and see if they have any other suggestions. 

    Phobios said:
    I think If I'm using two active connections and both went out at the same time I have the same problem (Just a thought, my application always just use one connection)

    True, in this case you would need to use a higher connection limit. It should be sufficient to set BT_MAX_CONN one higher than the maximum number of simultaneous connections, but in this case you need to make sure that you only start advertising once (when the number of connections drop from the max limit, to one below), rather than try to start advertising on every disconnect. 

    Best regards
    Torbjørn

Related