Fast notifications with nRF5340 and nRF Connect SDK

Board: nRF5340 DK

SDK version: v2.1.0

My goal is to transmit 240 bytes of data every 5 ms using GATT notifications on the LE 1M PHY.

In the attached sample, I have my dev kit set up to notify on a characteristic every 5 ms. The connection interval is set to 10 ms, the MTU to 247 bytes, and the data length to 251 bytes. My thought is to fit 240 bytes into a single PDU, and send 2 notification packets per connection interval. I also toggle GPIO P0.04 before and after calling `bt_gatt_notify_cb`.
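As a rough sanity check on that plan, here is a sketch of the on-air time budget. It is not taken from the attached sample. It assumes the LE 1M PHY (8 µs per byte), an unencrypted link, a 150 µs inter-frame space, and that every data packet is paired with an empty packet from the other side. The function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not from the attached sample): LE 1M PHY, unencrypted link. */
#define US_PER_BYTE_1M       8u   /* 1 Mbit/s means 8 us per byte */
#define LL_OVERHEAD_BYTES    10u  /* preamble 1 + access address 4 + header 2 + CRC 3 */
#define L2CAP_HDR_BYTES      4u   /* length 2 + channel ID 2 */
#define ATT_NOTIFY_HDR_BYTES 3u   /* opcode 1 + attribute handle 2 */
#define LL_PAYLOAD_MAX       251u /* negotiated data length */
#define T_IFS_US             150u /* inter-frame space */

/* On-air time of one link-layer packet carrying ll_payload_bytes. */
uint32_t packet_airtime_us(uint32_t ll_payload_bytes)
{
	return (LL_OVERHEAD_BYTES + ll_payload_bytes) * US_PER_BYTE_1M;
}

/* Time for one notification plus the empty packet it is paired with,
 * including the two inter-frame spaces around them.
 */
uint32_t notify_exchange_us(uint32_t value_len)
{
	uint32_t ll_payload = L2CAP_HDR_BYTES + ATT_NOTIFY_HDR_BYTES + value_len;

	/* The whole notification has to fit into a single PDU. */
	assert(ll_payload <= LL_PAYLOAD_MAX);

	return packet_airtime_us(ll_payload) + T_IFS_US +
	       packet_airtime_us(0) + T_IFS_US;
}
```

Under these assumptions, `notify_exchange_us(240)` comes to 2436 µs, so two notifications take a little under 4.9 ms of the 10 ms connection interval. This ignores scheduling overhead and retransmissions.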

Looking at a logic analyzer trace of P0.04, I can see that while this setup initially works, eventually `bt_gatt_notify_cb` starts blocking for longer than expected. This happens around 14.5 seconds into the attached capture.

A Wireshark capture is also attached.

Why is `bt_gatt_notify_cb` blocking this long, and do you have any suggestions to improve the setup?

fast_notifications.zip

  • Hello,

    bt_gatt_notify_cb() is not a blocking call, so it should not be the reason why P0.04 isn't toggling.

    Are you sure that bt_gatt_notify_cb() is even called at 14.5 seconds into the trace? The trace doesn't show any notifications being sent around that time. In fact, it doesn't look like any notifications are sent at all in the connection starting at packet no. 7517 (12.89 seconds into the sniffer trace).

    Are you sure bt_gatt_notify_cb() is called at all? And if it is, what does it return? I assume it doesn't return 0, since no notifications are actually sent. 

    Best regards,

    Edvin

  • Hi all.

    For me, bt_gatt_notify_cb() blocks as well.

    I am trying to implement flow control using the "Notification Value callback" (func in struct bt_gatt_notify_params). For that to work, bt_gatt_notify_cb() would have to return an error code when it runs out of buffers, but in my test it only ever returns 0.

    I use a connection interval of 500 ms. The first three calls to bt_gatt_notify_cb() return immediately with a return value of 0, but the fourth blocks for almost 500 ms and then also returns 0.

    I too thought that bt_gatt_notify_cb() was non-blocking, but it behaves like a blocking call!

    Am I missing a Zephyr BLE config option?

    Regards,

    Benno

  • Hello Benno,

    Can you please upload a project that I can use to replicate what you are seeing? What are you connected to? Can I reproduce the issue using two DKs of some sort?

    Best regards,

    Edvin

  • Hello Edvin.

    Unfortunately, I have no demo code that you could use to reproduce this.

    But I think I found the culprit: at the end of the function conn_tx_alloc() in subsys/bluetooth/host/conn.c there is the line

    return k_fifo_get(&free_tx, K_FOREVER);

    Because of that, bt_gatt_notify_cb() blocks as soon as the free_tx queue is empty. There is one exception: when bt_gatt_notify_cb() is called from the system workqueue context (e.g. from the "Notification Value callback"), it returns the error code -ENOBUFS instead of blocking.

    The size of the free_tx queue is set by the Kconfig option CONFIG_BT_CONN_TX_MAX; in my tests this option was set to 3.

    For many applications this behavior may be the desired one, but in my application a call to bt_gatt_notify_cb() must not block under any circumstances. I therefore introduced a credit counter in my code, so that I can track the fill level of the free_tx queue and only call bt_gatt_notify_cb() if the queue is not empty.

    Unfortunately, this credit counter is not the real fill level of the free_tx queue, and if other parts of the code also call bt_gatt_notify_cb() without respecting the credit counter, my approach could still block.
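
    A minimal sketch of such a credit counter, written in portable C11 rather than copied from my project. The names tx_credit_take() and tx_credit_give() are hypothetical, and the sketch assumes that TX_CREDITS_MAX matches CONFIG_BT_CONN_TX_MAX and that nothing else consumes entries of the free_tx queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: set to the same value as CONFIG_BT_CONN_TX_MAX. */
#define TX_CREDITS_MAX 3

/* Application-side estimate of how many free_tx entries are left. */
static atomic_int tx_credits = TX_CREDITS_MAX;

/* Try to reserve one credit before calling bt_gatt_notify_cb().
 * Never blocks; returns false when no credit is left.
 */
bool tx_credit_take(void)
{
	int cur = atomic_load(&tx_credits);

	while (cur > 0) {
		/* On failure the CAS reloads cur, so the loop re-checks it. */
		if (atomic_compare_exchange_weak(&tx_credits, &cur, cur - 1)) {
			return true;
		}
	}

	return false;
}

/* Return one credit, e.g. from the notification callback (params->func)
 * or when bt_gatt_notify_cb() itself returned an error.
 */
void tx_credit_give(void)
{
	int prev = atomic_fetch_add(&tx_credits, 1);

	/* More credits returned than taken indicates a bookkeeping bug. */
	assert(prev < TX_CREDITS_MAX);
	(void)prev;
}
```

    The intended use is to call bt_gatt_notify_cb() only after tx_credit_take() has returned true, and to skip or postpone the notification otherwise.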

    Therefore I propose a change in the host stack: a struct bt_gatt_notify_params is already passed to bt_gatt_notify_cb(), so it should be easy to add a field to this struct that indicates whether bt_gatt_notify_cb() is allowed to block.

    Regards,

    Benno

  • Hello Benno,

    I was not aware that it did that. I agree that it would make sense to have it just return an error instead of waiting for a free buffer. I will report your suggestion (which I agree with) internally. 

    I guess you can modify the behavior of bt_gatt_notify_cb() if you like, to return -ENOBUFS even if it is not called from System Workqueue context, or you can call it from a different thread that doesn't do anything other than that (so that it doesn't matter if it is blocking).
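
    For the second option, one possible arrangement is a small queue between the two contexts: the time-critical code only tries to enqueue and never waits, while the dedicated thread dequeues and makes the possibly blocking bt_gatt_notify_cb() call. Below is a portable single-producer/single-consumer sketch of that handoff. All names in it are hypothetical, and the sizes are assumptions. In a Zephyr application, a k_msgq with K_NO_WAIT on the producer side would probably be the more natural primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NOTIFY_LEN  240 /* assumed payload size per notification */
#define QUEUE_SLOTS 8   /* assumed depth; must be a power of two */

static_assert((QUEUE_SLOTS & (QUEUE_SLOTS - 1)) == 0,
	      "QUEUE_SLOTS must be a power of two");

struct notify_queue {
	uint8_t slots[QUEUE_SLOTS][NOTIFY_LEN];
	atomic_uint head; /* advanced only by the producer */
	atomic_uint tail; /* advanced only by the sender thread */
};

/* Producer side: copy one payload into the queue.
 * Never blocks; returns false when the queue is full.
 */
bool notify_queue_try_put(struct notify_queue *q, const uint8_t *data)
{
	unsigned int head = atomic_load_explicit(&q->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&q->tail, memory_order_acquire);

	if (head - tail == QUEUE_SLOTS) {
		return false;
	}

	memcpy(q->slots[head % QUEUE_SLOTS], data, NOTIFY_LEN);
	atomic_store_explicit(&q->head, head + 1, memory_order_release);
	return true;
}

/* Sender-thread side: fetch the oldest payload, if any. The sender thread
 * would then pass it to bt_gatt_notify_cb(), which is allowed to block there.
 */
bool notify_queue_try_get(struct notify_queue *q, uint8_t *out)
{
	unsigned int tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&q->head, memory_order_acquire);

	if (head == tail) {
		return false;
	}

	memcpy(out, q->slots[tail % QUEUE_SLOTS], NOTIFY_LEN);
	atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
	return true;
}
```

    When notify_queue_try_put() returns false, the application has to decide whether to drop the sample or retry later; the sketch does not make that choice.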

    Best regards,

    Edvin

  • Hello. Sorry to bother you with this, but I am struggling to find the call path from bt_nus_send_cb() to conn_tx_alloc().

    I see that conn_tx_alloc() is called from bt_conn_send_cb(), which in turn is called from either bt_l2cap_send_cb() or bt_iso_chan_send(). Which one is used in your case? And can you please share the entire call stack from bt_gatt_notify_cb() to conn_tx_alloc()? The reason I am asking is that I need to be able to reproduce this before filing it as a bug report/feature request.

    Best regards,

    Edvin

  • Hi Edvin.

    Here is the call stack (keep in mind that conn_tx_alloc() is only called when func in struct bt_gatt_notify_params is set to a callback):

    bt_gatt_notify_cb(params->func!=NULL) -> gatt_notify(params->func!=NULL) ->

    bt_att_send() -> att_send_process() -> process_queue() -> bt_att_chan_send() -> chan_send() ->

    bt_l2cap_send_cb(cb!=NULL) ->

    bt_conn_send_cb(cb!=NULL) -> conn_tx_alloc() ->

    k_fifo_get(&free_tx, K_FOREVER)

    Regards,

    Benno

  • Thank you very much Benno.

    Are you using the Zephyr Bluetooth Low Energy controller?

    Did you set CONFIG_BT_LL_SW_SPLIT=y in your prj.conf (or any other .conf file that is included in your application)?

    I believe the reason I struggled to find it is that, if you use the Nordic Bluetooth Low Energy controller (the SoftDevice Controller), which is selected either by default or with CONFIG_BT_LL_SOFTDEVICE=y, the implementation of bt_gatt_notify_cb() that is used is the one in bt_rpc_gatt_client.c (ncs\nrf\subsys\bluetooth\rpc\client\bt_rpc_gatt_client.c), and it looks like this:

    int bt_gatt_notify_cb(struct bt_conn *conn,
    		      struct bt_gatt_notify_params *params)
    {
    	struct nrf_rpc_cbor_ctx ctx;
    	int result;
    	size_t scratchpad_size = 0;
    	size_t buffer_size_max = 8;
    
    	buffer_size_max += bt_gatt_notify_params_buf_size(params);
    
    	scratchpad_size += bt_gatt_notify_params_sp_size(params);
    
    	NRF_RPC_CBOR_ALLOC(&bt_rpc_grp, ctx, buffer_size_max);
    	ser_encode_uint(&ctx, scratchpad_size);
    
    	bt_rpc_encode_bt_conn(&ctx, conn);
    	bt_gatt_notify_params_enc(&ctx, params);
    
    	nrf_rpc_cbor_cmd_no_err(&bt_rpc_grp, BT_GATT_NOTIFY_CB_RPC_CMD,
    		&ctx, ser_rsp_decode_i32, &result);
    
    	return result;
    }

    It doesn't call gatt_notify(). Do you see the same blocking behavior if you try to set CONFIG_BT_LL_SOFTDEVICE=y in your prj.conf?

    Best regards,

    Edvin

  • I did not configure the controller part myself, so I am not sure about the details. I was under the impression that I am using the SoftDevice Controller, but via "HCI using RPMsg" (CONFIG_BT_RPMSG=y).

    When I look into the .config file in the app build directory, I see neither CONFIG_BT_LL_SPLIT nor CONFIG_BT_LL_SOFTDEVICE defined.

    In the .config file of the hci_rpmsg build directory I see

    # CONFIG_BT_LL_SW_SPLIT is not set

    and

    CONFIG_BT_LL_SOFTDEVICE=y

    Regards,

    Benno

  • Hello Benno,

    I am still struggling to get the same behavior as you do. Can you please send me a project with the configuration you are using that I can use to reproduce the issue? Are you able to reproduce it e.g. with something based on the peripheral_uart sample? 

    BR,
    Edvin
