BLE multilink: ATT timeout occurs as the number of connections increases

I am using BLE in Zephyr to implement a multilink central.
However, I could not find a reference that implements this, including in the Nordic Zephyr code:
https://github.com/nrfconnect/sdk-nrf <- This repository only has examples of multilink peripherals
So I wrote my own. My goal is one central connected to 30 peripherals.
With 4 peripherals the connections are fully stable, but when I connect more peripherals, the following errors appear:

<err> bt_att: ATT Timeout
<wrn> bt_att: No ATT channel for MTU 5
<wrn> bt_att: No pending ATT request

The picture below shows the errors with 20 peripherals connected.

So far I have found that changing the connection interval helps, but it only delays the <err> bt_att: ATT Timeout.

How can I avoid this problem so that I can keep 30 peripherals connected stably?

My code is attached below, or you can download it from https://github.com/mfinmuch/zephyr-ble-mulrilink-test

3225.multilink central.rar

Thanks,

Poyi

0 Hung Bui over 4 years ago

Hi Hmw,

Could you describe your application in a little more detail?
What connection interval are you using? What data do you send and receive, and in which direction?

You actually got an MPU fault, which suggests a stack overflow. Please try increasing CONFIG_BT_RX_STACK_SIZE. Try doubling the stack size, for example to 4096.
Also try increasing these, if they are not already set as follows:

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=2048

Are you using the Zephyr stack or the Nordic stack?

0 hmw over 4 years ago in reply to Hung Bui

Currently I set the interval as follows

#define MIN_CONNECTION_INTERVAL         72     
#define MAX_CONNECTION_INTERVAL         72    
#define SLAVE_LATENCY                   0    
#define SUPERVISION_TIMEOUT             50
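
For reference, these values are raw protocol units rather than milliseconds: BLE connection intervals are expressed in 1.25 ms steps and supervision timeouts in 10 ms steps. The following is a minimal sketch of the conversion in plain C. The helper names are hypothetical and are not part of the Zephyr API.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the post. */
#define MIN_CONNECTION_INTERVAL 72
#define MAX_CONNECTION_INTERVAL 72
#define SLAVE_LATENCY           0
#define SUPERVISION_TIMEOUT     50

/* Hypothetical helper: connection interval units (1.25 ms) to microseconds. */
static uint32_t conn_interval_to_us(uint16_t units)
{
	return (uint32_t)units * 1250U;
}

/* Hypothetical helper: supervision timeout units (10 ms) to milliseconds. */
static uint32_t supervision_timeout_to_ms(uint16_t units)
{
	return (uint32_t)units * 10U;
}

/* Hypothetical helper: number of whole effective connection intervals
 * that fit into the supervision timeout.
 */
static uint32_t intervals_per_timeout(uint16_t interval_units,
				      uint16_t latency,
				      uint16_t timeout_units)
{
	assert(interval_units > 0);

	uint32_t effective_us = (1U + latency) *
				conn_interval_to_us(interval_units);

	return supervision_timeout_to_ms(timeout_units) * 1000U / effective_us;
}
```

With the values above, an interval of 72 corresponds to 90 ms and a supervision timeout of 50 corresponds to 500 ms, so about five connection intervals fit into one supervision timeout.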

I modified the central_hr and peripheral BLE samples provided by Zephyr, so it should be the Zephyr stack.

My central scans for peripherals first and establishes a connection.
Each peripheral then sends data to the central (using notifications), and the central writes data back to it (using bt_gatt_write).

The data is just a simple sequence.

I thought only the following parameters needed to be set:

CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_BT_RX_BUF_LEN=258
CONFIG_BT_ATT_TX_MAX=10
CONFIG_BT_ATT_PREPARE_COUNT=10
CONFIG_BT_CONN_TX_MAX=18
CONFIG_BT_L2CAP_TX_BUF_COUNT=18
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_L2CAP_RX_MTU=247
CONFIG_BT_L2CAP_DYNAMIC_CHANNEL=y
CONFIG_BT_CTLR_PHY_2M=y
CONFIG_BT_CTLR_RX_BUFFERS=18
CONFIG_BT_CTLR_TX_BUFFERS=18
CONFIG_BT_CTLR_TX_BUFFER_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251

I will try the three parameters you suggested:

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_BT_RX_STACK_SIZE

I also have a question.
When the central receives a notification from a peripheral, notify_func is called. In notify_func I print the received data and call bt_gatt_write to send a reply.
Is it not allowed to call bt_gatt_write from this function? The code is as follows:

static uint8_t notify_func(struct bt_conn *conn,
			   struct bt_gatt_subscribe_params *params,
			   const void *data, uint16_t length)
{
	const uint8_t *bytes = data;
	int idx = bt_conn_index(conn);	/* per-connection array index */
	int err;

	start_time = k_uptime_get_32();

	/* A NULL data pointer means the subscription has ended. */
	if (!data) {
		printk("[UNSUBSCRIBED] %d %d %d\n", idx, params->value_handle,
		       subscribe_params[idx].value_handle);
		subscribe_params[idx].value_handle = 0U;
		is_connecting = false;
		return BT_GATT_ITER_CONTINUE;
	}

	printk("index %d recv %u %u %u %u %u %u %u.\n", idx,
	       bytes[0], bytes[1], bytes[2], bytes[3],
	       bytes[4], bytes[5], bytes[6]);

	/* Build the reply: echo the first byte, then a running counter. */
	gatt_write_buf[0] = bytes[0];
	gatt_write_buf[1] = count;
	gatt_write_buf[2] = 88;
	gatt_write_buf[3] = 99;

	count++;

	/* Write the reply back to the peripheral that sent the notification. */
	write_params[idx].data = gatt_write_buf;
	write_params[idx].length = 8;
	write_params[idx].handle = service_handle;
	write_params[idx].offset = 0;
	write_params[idx].func = write_func;

	err = bt_gatt_write(conn, &write_params[idx]);
	if (err) {
		printk("Write failed (err %d)\r\n", err);
	}

	return BT_GATT_ITER_CONTINUE;
}

Or is the normal approach to call bt_gatt_write from the while loop in main?
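
Separately from where bt_gatt_write is called, the code above points every connection's write_params at the same gatt_write_buf and refills write_params[...] on each notification, even if an earlier write on that connection has not completed yet. ATT allows only one outstanding request per connection, so replies can queue up. Whether this contributes to the timeout is not established in this thread. The following is a minimal sketch in plain C (no Zephyr calls) of one way to give each connection its own reply buffer and a pending flag. MAX_CONN, reply_slot, reply_prepare and reply_done are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CONN  20 /* hypothetical; would follow the configured connection count */
#define REPLY_LEN 8  /* matches write_params[...].length = 8 in the post */

/* One reply slot per connection, so a pending write never shares its
 * payload buffer with another connection.
 */
struct reply_slot {
	uint8_t buf[REPLY_LEN];
	bool busy; /* true from queueing until the write callback runs */
};

static struct reply_slot slots[MAX_CONN];

/* Called from the notification path. Returns false, leaving the slot
 * untouched, if the previous reply on this connection is still pending.
 */
static bool reply_prepare(uint8_t idx, uint8_t first_byte, uint8_t count)
{
	assert(idx < MAX_CONN);

	struct reply_slot *slot = &slots[idx];

	if (slot->busy) {
		return false;
	}

	memset(slot->buf, 0, sizeof(slot->buf));
	slot->buf[0] = first_byte;
	slot->buf[1] = count;
	slot->buf[2] = 88;
	slot->buf[3] = 99;
	slot->busy = true;
	return true;
}

/* Called from the write-complete callback (write_func in the post). */
static void reply_done(uint8_t idx)
{
	assert(idx < MAX_CONN);
	slots[idx].busy = false;
}
```

In this sketch, the notification path would call reply_prepare() and only queue a write when it returns true, using slots[idx].buf as the payload. The write callback would then call reply_done() for that connection.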

Thanks,

Poyi

0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,

Please clarify whether you still see the error after changing these to:

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096

That suggestion was based on the MPU (memory protection unit) fault, which indicates an issue with memory or a stack.

You may also consider increasing CONFIG_BT_L2CAP_TX_BUF_COUNT. If you have 20 peripherals and you try to send a write command to all 20 of them, you may exceed the number of 18 TX buffers.

Your issue seems similar to this ongoing report: https://github.com/zephyrproject-rtos/zephyr/issues/30378

It could be the same issue.

Do you have a sniffer? If you can capture the whole exchange, we can check exactly which action caused the timeout.

My concern is that with 20 connections, each at a 90 ms interval, there is only 4.5 ms available per connection, and if the scheduler cannot fit all the connections in well enough, packets will be dropped. I would suggest switching the devices that send a packet once per second to a 1 second connection interval instead of 90 ms, or at least to a 500 ms interval.
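
The 4.5 ms figure is simply the connection interval divided by the number of connections. The following is a rough sketch of that arithmetic in plain C, assuming all connections use the same interval and share one radio. The helper names are hypothetical, and the result is an average budget, not a scheduling guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Average time available to each connection within one connection
 * interval, in microseconds.
 */
static uint32_t per_conn_budget_us(uint32_t interval_us, uint32_t n_conn)
{
	assert(n_conn > 0);
	return interval_us / n_conn;
}

/* Inverse: the interval needed to give each of n_conn connections
 * the requested average budget, in microseconds.
 */
static uint32_t interval_for_budget_us(uint32_t n_conn, uint32_t budget_us)
{
	return n_conn * budget_us;
}
```

For example, 20 connections at 90 ms give 4500 us each, and 30 connections at the same interval give 3000 us each. Keeping the 4500 us average for 30 connections would need an interval of about 135 ms.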
0 hmw over 4 years ago in reply to Hung Bui
Hello,

I don't quite understand what you mean by this:

Hung Bui said:
you may pass the number of 18 tx buffer .

My problem is indeed very similar to that report, but nobody has provided a good solution yet, which troubles me a lot.
The problem now is that when I keep sending, a Tx Buffer Overflow appears on the central after a while:
<err> bt_ctlr_hci: Tx Buffer Overflow

After this error occurs, my central continues to send data, but some peripherals no longer receive the bt_gatt_write_without_response sent by the central, although they stay connected to it.

There does not seem to be a clear solution for this case.

Is 500 ms an estimate, or is there a formula for it?

I will try 500 ms and 1 s, but in my application I need to send and receive data as quickly as possible.

Thanks,

Poyi
0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,

I meant that if you queue a write command for every connection on every connection event, you will need a bigger buffer than 18, since you have 20 concurrent connections. However, as far as I know this should not cause a TX overflow. You may receive a buffer-full error when you try to queue a write command while the buffer is full.
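
The counting argument can be illustrated with a toy model in plain C. This is not Zephyr's buffer allocator. The tx_pool type and its functions are hypothetical, and TX_BUF_COUNT only mirrors the value 18 from the configuration posted earlier.

```c
#include <assert.h>
#include <errno.h>

#define TX_BUF_COUNT 18 /* mirrors CONFIG_BT_L2CAP_TX_BUF_COUNT=18 */

/* Toy model of a fixed-size pool of TX buffers. */
struct tx_pool {
	int capacity;
	int free_count;
};

static void tx_pool_init(struct tx_pool *pool, int capacity)
{
	assert(capacity >= 0);
	pool->capacity = capacity;
	pool->free_count = capacity;
}

/* Take one buffer; returns 0 on success or -ENOBUFS if the pool is empty. */
static int tx_pool_acquire(struct tx_pool *pool)
{
	if (pool->free_count == 0) {
		return -ENOBUFS;
	}
	pool->free_count--;
	return 0;
}

/* Return one buffer to the pool, for example after it has been sent. */
static void tx_pool_release(struct tx_pool *pool)
{
	if (pool->free_count < pool->capacity) {
		pool->free_count++;
	}
}

/* Queue one write per connection before any buffer is released and
 * return how many of those writes could not get a buffer.
 */
static int queue_one_write_per_conn(struct tx_pool *pool, int n_conn)
{
	int failed = 0;

	for (int i = 0; i < n_conn; i++) {
		if (tx_pool_acquire(pool) != 0) {
			failed++;
		}
	}
	return failed;
}
```

With 18 buffers and 20 connections, two of the queued writes find no free buffer until earlier ones are released.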

Please note that our SoftDevice Controller currently officially supports a maximum of 20 concurrent connections. See here. I assume you plan to use our Nordic LL, not the Zephyr LL? From what I can tell when compiling locally, you are using our Nordic LL.

If you are sending your data at a 1 s interval, I don't see much point in having a shorter connection interval. Please describe your application in more detail.

Please try again with CONFIG_THREAD_NAME=y so that the name of the thread that overflowed is printed.

Could you also confirm that you no longer see the MPU fault and now only see the TX overflow? At how many connected peripherals do you start seeing it?
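
For convenience, the stack-related options suggested in this thread can be collected into one prj.conf fragment, sketched below. The thread analyzer options at the end are an assumption added here and were not suggested in the thread. They exist in Zephyr but their availability depends on the Zephyr version in use.

```conf
# Stack sizes suggested earlier in this thread
CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=2048

# Print the thread name when a fault is reported
CONFIG_THREAD_NAME=y

# Assumption, not from the thread: periodic per-thread stack usage report
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_ANALYZER_USE_PRINTK=y
CONFIG_THREAD_ANALYZER_AUTO=y
```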
0 hmw over 4 years ago in reply to Hung Bui
Hung Bui said:
you will need a bigger buffer than 18

In Zephyr the maximum I can set is 18. If I set it higher than 18, the build fails.

Hung Bui said:
You may receive an error of full buffer when you try to queue write command when buffer is full.

The error comes from \zephyr\subsys\bluetooth\controller\hci\hci.c, in this code:

	node_tx = ll_tx_mem_acquire();
	if (!node_tx) {
		BT_ERR("Tx Buffer Overflow");
		data_buf_overflow(evt);
		return -ENOBUFS;
	}

I am not sure which LL I am using. I did not set BT_LL_SOFTDEVICE in my prj.conf.
I started from the samples downloaded from the official Zephyr repository: https://github.com/zephyrproject-rtos/zephyr

With 15 connections I currently get the TX overflow. Is there a good way to avoid it, or an explanation for this error?

Thanks,

Poyi
0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,

Are you using our NCS repository, or are you using Zephyr natively?
You can check autoconf.h to see whether CONFIG_BT_LL_SOFTDEVICE is defined as 1.

By default, if you use NCS and do not have CONFIG_BT_LL_SW_SPLIT=y, our SoftDevice LL is used.

Our team is working on the GitHub issue I pointed to. I will check with them and get back to you if we find anything.

0 hmw over 4 years ago in reply to Hung Bui
Hello,
I think I am using Zephyr natively.
My autoconf.h contains this:
#define CONFIG_BT_LL_SW_SPLIT 1

CONFIG_BT_LL_SOFTDEVICE does not appear in my autoconf.h.
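
The same check can be made at compile time instead of by reading autoconf.h by hand. The following is a small sketch in plain C. The enum and function names are hypothetical, and the #define only stands in for the generated autoconf.h so that the example is self-contained. In a real build the symbol comes from the build system and should not be defined manually.

```c
#include <assert.h>

/* Stand-in for the generated autoconf.h described in the post. */
#define CONFIG_BT_LL_SW_SPLIT 1

enum link_layer {
	LL_UNKNOWN,
	LL_SOFTDEVICE,      /* Nordic SoftDevice Controller */
	LL_ZEPHYR_SW_SPLIT, /* Zephyr open source link layer */
};

/* Report which link layer the configuration symbols select. */
static enum link_layer selected_link_layer(void)
{
#if defined(CONFIG_BT_LL_SOFTDEVICE)
	return LL_SOFTDEVICE;
#elif defined(CONFIG_BT_LL_SW_SPLIT)
	return LL_ZEPHYR_SW_SPLIT;
#else
	return LL_UNKNOWN;
#endif
}

/* Usage: with the configuration reported above, the Zephyr LL is selected. */
static void check_reported_config(void)
{
	assert(selected_link_layer() == LL_ZEPHYR_SW_SPLIT);
}
```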

I am glad to hear that. If there is any progress, please let me know.

Thanks,

Poyi
0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,

I would suggest testing with our SoftDevice LL.
Please note that this support channel focuses on Nordic's products, so if you plan to use the Zephyr LL, you may want to post your question on the Zephyr GitHub channel.