BLE multilink ATT timeout problem occurs when the number of connections increases

I have recently been using BLE in Zephyr to implement a multilink central.
However, none of the references I found provide a similar function, including the Nordic Zephyr code:
https://github.com/nrfconnect/sdk-nrf <- this repository only has examples of multilink peripherals
So I wrote my own. My goal is one central connected to 30 peripherals.
With 4 peripherals the connections are fully stable, but when I connect more peripherals, the following errors appear:

<err> bt_att: ATT Timeout
<wrn> bt_att: No ATT channel for MTU 5
<wrn> bt_att: No pending ATT request

The picture below shows the error when connecting 20 peripherals

So far I know that changing the connection interval does help, but it only makes <err> bt_att: ATT Timeout happen later.

How can I avoid this problem so that I can connect 30 peripherals stably?

My code is attached below, or you can download it from https://github.com/mfinmuch/zephyr-ble-mulrilink-test

3225.multilink central.rar

Thanks,

Poyi

0 Hung Bui over 4 years ago

Hi Hmw,

Could you describe your application in a little more detail?
What connection interval do you use? What data do you send and receive, and in which direction?

You actually got an MPU fault, which suggests there could be a stack overflow. Please try increasing CONFIG_BT_RX_STACK_SIZE. Try doubling the stack size, maybe to 4096.
Also try increasing these, if they are not already set as follows:

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=2048

Are you using the Zephyr stack or the Nordic stack?

0 hmw over 4 years ago in reply to Hung Bui

Currently I set the interval as follows

#define MIN_CONNECTION_INTERVAL         72     
#define MAX_CONNECTION_INTERVAL         72    
#define SLAVE_LATENCY                   0    
#define SUPERVISION_TIMEOUT             50
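
For reference, these values are in Bluetooth units, not milliseconds: the connection interval is counted in 1.25 ms steps and the supervision timeout in 10 ms steps, so 72 corresponds to 90 ms and 50 to 500 ms. A minimal sketch of the conversion in C follows. The helper names are hypothetical and not part of the Zephyr API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Connection interval is expressed in units of 1.25 ms.
 * Work in microseconds to stay in integer arithmetic. */
static uint32_t conn_interval_us(uint16_t units)
{
	return (uint32_t)units * 1250U;
}

/* Supervision timeout is expressed in units of 10 ms. */
static uint32_t supervision_timeout_us(uint16_t units)
{
	return (uint32_t)units * 10000U;
}

/* Bluetooth Core rule: the supervision timeout must be larger than
 * (1 + latency) * interval * 2. */
static bool supervision_timeout_valid(uint16_t interval_units,
				      uint16_t latency,
				      uint16_t timeout_units)
{
	uint32_t min_us = (1U + latency) * conn_interval_us(interval_units) * 2U;

	return supervision_timeout_us(timeout_units) > min_us;
}
```

With the values above, supervision_timeout_valid(72, 0, 50) holds (500 ms is larger than 180 ms), but the same timeout would no longer be valid with a slave latency of 4, which needs more than 900 ms.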

I modified the central_hr and peripheral BLE samples provided by Zephyr, so it should be the Zephyr stack.

My central scans for peripherals first and establishes a connection. The peripherals then send data to the central (using notify), and the central returns data to them (using bt_gatt_write).

The data is just a simple sequence.

I thought only the following parameters needed to be set:

CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_BT_RX_BUF_LEN=258
CONFIG_BT_ATT_TX_MAX=10
CONFIG_BT_ATT_PREPARE_COUNT=10
CONFIG_BT_CONN_TX_MAX=18
CONFIG_BT_L2CAP_TX_BUF_COUNT=18
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_L2CAP_RX_MTU=247
CONFIG_BT_L2CAP_DYNAMIC_CHANNEL=y
CONFIG_BT_CTLR_PHY_2M=y
CONFIG_BT_CTLR_RX_BUFFERS=18
CONFIG_BT_CTLR_TX_BUFFERS=18
CONFIG_BT_CTLR_TX_BUFFER_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251

I will try the three parameters you suggested again

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_BT_RX_STACK_SIZE.

Besides, I want to ask:
When the central receives a notification from a peripheral, it enters notify_func. There I print the received data and call bt_gatt_write to return the information.
Is it illegal to call bt_gatt_write in this function, as follows?

static uint8_t notify_func(struct bt_conn *conn,
			   struct bt_gatt_subscribe_params *params,
			   const void *data, uint16_t length)
{
	const uint8_t *bytes = data;
	int idx = bt_conn_index(conn);
	int err;

	start_time = k_uptime_get_32();

	/* A NULL data pointer means the subscription has been removed. */
	if (!data) {
		printk("[UNSUBSCRIBED] %d %d %d\n", idx, params->value_handle,
		       subscribe_params[idx].value_handle);
		subscribe_params[idx].value_handle = 0U;
		is_connecting = false;
		return BT_GATT_ITER_CONTINUE;
	}

	/* Print the first 7 bytes of the notification. */
	printk("index %d recv %u %u %u %u %u %u %u.\n", idx,
	       bytes[0], bytes[1], bytes[2], bytes[3],
	       bytes[4], bytes[5], bytes[6]);

	/* Build the reply in the shared write buffer. */
	gatt_write_buf[0] = bytes[0];
	gatt_write_buf[1] = count;
	gatt_write_buf[2] = 88;
	gatt_write_buf[3] = 99;

	count++;

	write_params[idx].data = gatt_write_buf;
	write_params[idx].length = 8;
	write_params[idx].handle = service_handle;
	write_params[idx].offset = 0;
	write_params[idx].func = write_func;

	err = bt_gatt_write(conn, &write_params[idx]);
	if (err) {
		printk("Write failed (err %d)\r\n", err);
	}

	return BT_GATT_ITER_CONTINUE;
}

Or is it normal to call bt_gatt_write in the while loop of main instead?

Thanks,

Poyi

0 Hung Bui over 4 years ago in reply to hmw

The length is the actual number of bytes you send, so if you set the length to 8, the actual number of bytes should be 15 (7 bytes of overhead). If you don't plan to send large packets, you can reduce the MTU size and the maximum data length, provided the notifications from the peers are not large either.
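
The 7 bytes of overhead are the 4-byte L2CAP header plus the 3-byte ATT header (opcode and attribute handle). The sketch below shows the arithmetic. The macro and function names are made up for illustration and do not come from the Zephyr headers.

```c
#include <stdint.h>

#define ATT_WRITE_HDR_LEN 3U /* 1-byte opcode + 2-byte attribute handle */
#define L2CAP_HDR_LEN     4U /* 2-byte length + 2-byte channel ID */

/* Size of the L2CAP frame that carries an ATT write or notification
 * with value_len bytes of application data. */
static uint16_t att_write_l2cap_frame_len(uint16_t value_len)
{
	return value_len + ATT_WRITE_HDR_LEN + L2CAP_HDR_LEN;
}

/* Largest value that fits in a single ATT write for a given ATT MTU. */
static uint16_t att_write_max_value_len(uint16_t att_mtu)
{
	return att_mtu - ATT_WRITE_HDR_LEN;
}
```

A length of 8 gives 15 bytes, and an ATT MTU of 247 allows at most 244 bytes of value, which together with the 7 bytes of overhead fills a 251-byte link-layer payload exactly.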

0 hmw over 4 years ago in reply to Hung Bui

Hello
I modified the following parameters to

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096

Then an error appeared, as shown in the figure below. What is the cause of this?

<wrn> bt_conn: Disconnected while allocating context

After trying to change these three parameters to 8192, the above error did not appear.

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=8192
CONFIG_MAIN_STACK_SIZE=8192
CONFIG_BT_RX_STACK_SIZE=8192

But there is a Tx Buffer Overflow error, as shown below

[00:36:55.371,032] <err> bt_ctlr_hci: Tx Buffer Overflow
[00:36:55.371,063] <wrn> bt_hci_core: Data buffer overflow (link type 0x01)
[00:36:55.371,063] <err> bt_conn: Unable to send to driver (err -55)

It didn't stop my data transmission, but it doesn't look right.
For my TX buffer related settings, I used the official maximum values, as follows:

CONFIG_BT_CONN_TX_MAX=18
CONFIG_BT_L2CAP_TX_BUF_COUNT=18
CONFIG_BT_CTLR_RX_BUFFERS=18
CONFIG_BT_CTLR_TX_BUFFERS=18
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_L2CAP_RX_MTU=247
CONFIG_BT_CTLR_TX_BUFFER_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_RX_BUF_LEN=258

How can I modify it to prevent the Data buffer overflow error from happening again?

Thanks.

Poyi

0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,
Please try increasing one stack size at a time.
I would suggest increasing CONFIG_BT_RX_STACK_SIZE first. The current default value in your project is 1024, correct? Please try increasing it to 2048 first.

The next one to try is CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE. Please try 4096. I don't think CONFIG_MAIN_STACK_SIZE needs to be increased.
Please make sure you make all the BLE API calls from a work queue or from the main thread.

If you don't plan to send large data packets, please set CONFIG_BT_L2CAP_RX_MTU and CONFIG_BT_CTLR_DATA_LENGTH_MAX to match your packet size.

Please let me know at how many peripherals you start to see the "ATT Timeout" error. What is the data traffic? How often do the peripherals send notifications? What is the data size of each notification?

Please also try testing with a larger connection interval and with slave latency.
0 hmw over 4 years ago in reply to Hung Bui
You mean I should first set
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

and then
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096

Is that so?

In addition, these are currently the smallest values I can set:
CONFIG_BT_CTLR_DATA_LENGTH_MAX=32
CONFIG_BT_L2CAP_TX_MTU=65
CONFIG_BT_L2CAP_RX_MTU=65

I don't know whether this matches my packet size.

Currently my central only uses bt_gatt_write_without_response, which is called in cmd_write.
I moved it to a work queue as follows:
void write_work_handler(struct k_work *work)
{
	printk("test now_conn %d\n", now_conn);
	cmd_write(service_handle, now_conn);
}
K_WORK_DEFINE(write_work, write_work_handler);
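
The work handler above reads a single now_conn variable. Assuming now_conn is a global that the notification callback sets before submitting the work item, a second notification arriving before the handler runs would overwrite it. One possible alternative is to queue the connection indices instead. The following is a plain C sketch of such a queue with hypothetical names; it has no locking, so in a Zephyr application a kernel object such as a message queue (k_msgq) would be the more likely choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two that is at least the number
 * of connections. */
#define PENDING_MAX 32U

struct pending_queue {
	uint8_t items[PENDING_MAX];
	uint32_t head; /* total number of items read so far */
	uint32_t tail; /* total number of items written so far */
};

/* Called where now_conn is set today: remember which connection
 * needs a reply. */
static bool pending_put(struct pending_queue *q, uint8_t conn_index)
{
	if (q->tail - q->head == PENDING_MAX) {
		return false; /* full: caller decides whether to drop or retry */
	}
	q->items[q->tail % PENDING_MAX] = conn_index;
	q->tail++;
	return true;
}

/* Called from the work handler: fetch the oldest pending connection. */
static bool pending_get(struct pending_queue *q, uint8_t *conn_index)
{
	if (q->head == q->tail) {
		return false; /* nothing pending */
	}
	*conn_index = q->items[q->head % PENDING_MAX];
	q->head++;
	return true;
}
```

The handler would then loop on pending_get() and call cmd_write() once per returned index, so replies are sent in the order the notifications arrived.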

ATT Timeout appears a while after all peripherals are connected, and the time at which it appears is random.

I don't really understand the data traffic you mentioned. I thought that after the central announces the connection interval, the peripherals would automatically select an interval value that determines how often they send.

Five of my peripherals send every 100 ms, the rest send every 1 second, and 8 bytes are sent each time.
static uint8_t test[8];
rc = bt_gatt_notify(NULL, &hrs_svc.attrs[1], &test, sizeof(test));

test is the 8-byte buffer that my peripheral sends.

Also, is there a good solution to the Tx Buffer Overflow error mentioned above?
[00:36:55.371,032] <err> bt_ctlr_hci: Tx Buffer Overflow
[00:36:55.371,063] <wrn> bt_hci_core: Data buffer overflow (link type 0x01)
[00:36:55.371,063] <err> bt_conn: Unable to send to driver (err -55)

Or is it related to my CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE and CONFIG_BT_RX_STACK_SIZE?

Thanks,

Poyi
0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,

Please clarify whether you still see the error when you change these to:

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096

This suggestion was based on the MPU (memory protection unit) fault, which suggested there was an issue with the memory or stack.

You may consider increasing CONFIG_BT_L2CAP_TX_BUF_COUNT. If you have 20 peripherals and you try to send a write command to all 20 of them, you may exceed the 18 TX buffers.

Your issue seems similar to this ongoing report: https://github.com/zephyrproject-rtos/zephyr/issues/30378

It could be the same issue.

Do you have a sniffer? If you can sniff the whole activity, we can check exactly which action caused the timeout.

My concern is that when you have 20 connections, each with an interval of 90 ms, there is only 4.5 ms for each connection, and if the scheduler cannot schedule all the connections well enough, packets will be dropped. I would suggest switching the devices that send a packet every second to a 1 second interval instead of 90 ms, or at least to a 500 ms interval.
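
The 4.5 ms figure is simply the connection interval divided by the number of connections. A rough sketch of that estimate follows. per_conn_budget_us is a hypothetical helper; it assumes every connection uses the same interval and ignores scanning and other radio activity, so the result is an upper bound rather than what the scheduler actually grants.

```c
#include <stdint.h>

/* Average radio time available to each connection within one
 * connection interval, in microseconds. */
static uint32_t per_conn_budget_us(uint32_t interval_us, uint32_t num_conns)
{
	if (num_conns == 0U) {
		return interval_us; /* no connections: the whole interval is free */
	}
	return interval_us / num_conns;
}
```

With a 90 ms interval this gives 4500 us for 20 connections and 3000 us for 30. With a 500 ms interval, 20 connections get 25000 us each.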

0 hmw over 4 years ago in reply to Hung Bui
Hello

I don't quite understand what you mean

Hung Bui said:
you may pass the number of 18 tx buffer .

It is true that my problem is very similar to that report, but no one has provided a good solution there yet, which troubles me a lot.
The problem now is that when I keep sending, Tx Buffer Overflow appears on the central after a while.
<err> bt_ctlr_hci: Tx Buffer Overflow

When this error occurs, my central continues to send data, but some peripherals do not receive the bt_gatt_write_without_response sent by the central, although they stay connected to it.

In this case, there seems to be no clear solution.

Is 500ms an estimate, or is there an algorithm behind it?

I will try 500ms and 1s, but in my situation, the faster I can transmit and receive data, the better.

Thanks,

Poyi
0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,

I meant that if you queue a write command for every single connection on every connection event, you will need more than 18 buffers, as you have 20 concurrent connections. However, as far as I know this shouldn't cause a TX overflow. You should instead receive a buffer-full error when you try to queue a write command while the buffer is full.
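
One way to avoid queuing more writes than there are buffers is to count them in the application. This is only a sketch of that idea, not a confirmed fix for the overflow: the struct and function names are hypothetical, and it assumes a credit is returned from a completion callback such as the one passed to bt_gatt_write_without_response_cb().

```c
#include <stdbool.h>
#include <stdint.h>

/* Matches the CONFIG_BT_L2CAP_TX_BUF_COUNT value used in this thread. */
#define TX_CREDITS_MAX 18U

struct tx_credits {
	uint8_t in_flight; /* writes queued but not yet completed */
};

/* Call before queuing a write; returns false if all buffers are in use. */
static bool tx_credit_take(struct tx_credits *c)
{
	if (c->in_flight >= TX_CREDITS_MAX) {
		return false;
	}
	c->in_flight++;
	return true;
}

/* Call from the write-complete callback to release one buffer. */
static void tx_credit_give(struct tx_credits *c)
{
	if (c->in_flight > 0U) {
		c->in_flight--;
	}
}
```

The counter has no locking, so it is only safe if both functions run in the same context. In Zephyr a counting semaphore (k_sem) could play the same role.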

Please note that our SoftDevice Controller currently only officially supports a maximum of 20 concurrent connections. See here. I assume you plan to use our Nordic LL, not the Zephyr LL? From what I can tell when compiling locally, you are using our Nordic LL.

If you are sending your data at a 1 s interval, I don't see much point in having a shorter connection interval. Please describe your application in more detail.

Please try again with CONFIG_THREAD_NAME=y so that the name of the thread that overflowed is printed.

Could you also clarify that you no longer see the MPU fault and now only see the TX overflow? At how many connected peripherals do you start seeing it?
0 hmw over 4 years ago in reply to Hung Bui
Hung Bui said:
you will need a bigger buffer than 18

In Zephyr, I can set it to 18 at most. If I set it to more than 18, the project does not compile.

Hung Bui said:
You may receive an error of full buffer when you try to queue write command when buffer is full.

The error comes from \zephyr\subsys\bluetooth\controller\hci\hci.c, in this code:

node_tx = ll_tx_mem_acquire();
if (!node_tx) {
	BT_ERR("Tx Buffer Overflow");
	data_buf_overflow(evt);
	return -ENOBUFS;
}

I am not sure which LL I am using. I did not set BT_LL_SOFTDEVICE in my prj.conf.
I downloaded the sample from Zephyr's official repository and modified it: https://github.com/zephyrproject-rtos/zephyr

The TX overflow currently appears with 15 connections. Is there a good way to avoid it, or an explanation for this error?

Thanks,

Poyi
0 Hung Bui over 4 years ago in reply to hmw

Hi Poyi,

Did you use our NCS repository, or are you using Zephyr natively?
You can check inside autoconf.h to see whether CONFIG_BT_LL_SOFTDEVICE is defined as 1.

By default, if you use NCS and don't have CONFIG_BT_LL_SW_SPLIT=y, our SoftDevice LL is used.

Our team is working on the GitHub issue I pointed to. I will check with them and get back to you if we find anything.
0 hmw over 4 years ago in reply to Hung Bui
Hello,
I think I am using Zephyr natively.
My autoconf.h contains
#define CONFIG_BT_LL_SW_SPLIT 1

and I do not see CONFIG_BT_LL_SOFTDEVICE in it.

I am glad to hear this news. If there is any progress, please let me know.

Thanks,

Poyi