Call to bt_nus_send creates data access violation

Question

Hello,

I'm currently using the nRF Connect SDK toolchain with BM832 modules (nRF52832). After updating to version 2.1.0, I've noticed that my Bluetooth applications hang whenever they try to send data to connected peers.

I've isolated the problem to the bt_nus_send function by modifying the nrf/samples/bluetooth/peripheral_uart code with the following main() function:

void main(void)
{
	int err = 0;

	err = bt_enable(NULL);
	if (err) {
		error();
	}

	LOG_INF("Bluetooth initialized");

	if (IS_ENABLED(CONFIG_SETTINGS)) {
		settings_load();
	}

	err = bt_nus_init(&nus_cb);
	if (err) {
		LOG_ERR("Failed to initialize UART service (err: %d)", err);
		return;
	}

	err = bt_le_adv_start(BT_LE_ADV_CONN, ad, ARRAY_SIZE(ad), sd,
			      ARRAY_SIZE(sd));
	if (err) {
		LOG_ERR("Advertising failed to start (err %d)", err);
		return;
	}

	char wbuf[256];

	for (int iter = 0; true; iter++) {
		int nbytes = snprintf(wbuf, sizeof(wbuf), "Test %d", iter);
		/* conn == NULL: send to all connected peers */
		if (bt_nus_send(NULL, wbuf, nbytes)) {
			LOG_WRN("Failed to send data over BLE connection");
		}

		k_sleep(K_MSEC(5000));
	}
}

Once a connected peer registers for notifications and the application tries to send data, the following is printed in the RTT viewer:

00> [00:02:50.019,744] <err> os: ***** MPU FAULT *****
00> [00:02:50.019,775] <err> os:   Stacking error (context area might be not valid)
00> [00:02:50.019,805] <err> os:   Data Access Violation
00> [00:02:50.019,805] <err> os:   MMFAR Address: 0x20005e34
00> [00:02:50.019,836] <err> os: r0/a1: 0x7fffd97d r1/a2: 0x0c8883e0 r2/a3: 0xdff8bd5f
00> [00:02:50.019,866] <err> os: r3/a4: 0x91005208 r12/ip: 0xfb9eef9e r14/lr: 0x06d00113
00> [00:02:50.019,866] <err> os: xpsr: 0x21000000
00> [00:02:50.019,897] <err> os: Faulting instruction address (r15/pc): 0x00029ba8
00> [00:02:50.019,897] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
00> [00:02:50.019,927] <err> os: Current thread: 0x200021b0 (unknown)
00> [00:02:50.298,797] <err> os: Halting system

The same code runs as expected on an earlier version of nRF Connect (1.9.1). Looking at the documentation, the behavior of bt_nus_send appears to be the same between these versions.

developer.nordicsemi.com/.../nus.html
developer.nordicsemi.com/.../nus.html

When compiling on v2.1.0, the modified peripheral_uart sample still logs on Connected, Disconnected, and Received data. Is there additional configuration that may be missing for calls to bt_nus_send?

Maria Gilje · Accepted Answer

Hello, 
The short answer is that the stack size for ble_write_thread needs to be increased. You can check the stack usage with the Thread analyzer.
 Now to the more detailed answer. 
 00> [00:02:50.019,897] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0 
This log message shows that the reason for your error is a stack overflow. The stack size is one of the parameters used to define a thread, e.g. in the unmodified peripheral_uart sample, ble_write_thread is defined with
 K_THREAD_DEFINE(ble_write_thread_id, STACKSIZE, ble_write_thread, NULL, NULL,
 NULL, PRIORITY, 0, 0); 
 STACKSIZE from above is defined to be CONFIG_BT_NUS_THREAD_STACK_SIZE, which is set in the sample's Kconfig or prj_minimal.conf, depending on which configuration file you are building with. The default value 1024 is set in Kconfig, while the value set in prj_minimal.conf is 512. You can increase the stack size by adding a line in your prj.conf where you set CONFIG_BT_NUS_THREAD_STACK_SIZE to a larger value. 
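As a minimal sketch of the override described above, the fragment below could be added to the application's prj.conf. The value 2048 is an illustrative assumption, not a figure from this thread; pick a starting point that is comfortably above what you expect to need and tune it afterwards.

```conf
# Stack size for the thread that the sample defines with STACKSIZE
# (ble_write_thread). 2048 is an assumed starting value, not a
# recommendation from the SDK.
CONFIG_BT_NUS_THREAD_STACK_SIZE=2048
```

Note that this option only affects threads defined with it; code that runs on another thread uses that thread's own stack.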
 The new value for the stack size should not be excessively large. You can find a suitable value for the stack size by first setting it higher than you think you will need. Then you can use the Thread analyzer to see how much stack your threads actually use. The Thread analyzer will give a report at set intervals or when the application calls the print or run functions for Thread analyzer. 
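A prj.conf fragment along these lines should enable the periodic Thread analyzer report mentioned above. The option names are taken from Zephyr's Thread analyzer documentation as I recall it; verify them against the Kconfig reference for your SDK version. The 5-second interval is an assumption chosen for illustration.

```conf
# Enable the Thread analyzer and route its report through the logging
# subsystem (so it appears alongside the other RTT log output).
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_ANALYZER_USE_LOG=y

# Print a report automatically at a fixed interval (seconds).
# The interval below is an assumed value.
CONFIG_THREAD_ANALYZER_AUTO=y
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5

# Show thread names instead of raw addresses in the report.
CONFIG_THREAD_NAME=y
```

The report lists each thread's used and total stack size, which makes it possible to shrink an oversized stack back toward what is actually needed.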
"The same code runs as expected on an earlier version of nRF Connect (1.9.1)"
miket.create said: "I noticed that the modified example also works for nRF Connect versions 2.0.0 and 2.0.2, and the data access violation also occurs for 2.1.0-rc1 and rc2."
The reason versions 1.9.1 and 2.0.x work with the modification while 2.1.0 does not likely comes down to changes in the sample that slightly increase stack use in 2.1.0 compared to earlier versions. The additional stack used by your modification then results in the overflow you experienced.
 I hope you find this helpful, and don't hesitate to ask if anything is lacking or unclear. 
 Kind Regards, 
 Maria

miket.create · Answer

Modifying prj.conf, I was able to confirm that the stack size was the issue with the application. However, in my modified code above, I believe the call to bt_nus_send() was happening on the main thread, rather than the separate ble_write_thread as defined in the original example. 
 So my application was still crashing when increasing the value of CONFIG_BT_NUS_THREAD_STACK_SIZE, but when I added CONFIG_MAIN_STACK_SIZE to prj.conf the application ran as expected, and messages were sent to the connected peer. 
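For readers hitting the same fault, the fix described above could look like the following prj.conf fragment. The thread does not state which value was used, so 2048 here is purely an assumption; in the modified sample, main() holds the 256-byte wbuf and calls snprintf and bt_nus_send, all of which use the main thread's stack.

```conf
# Stack size of the main thread, which runs main() and therefore the
# send loop in the modified sample. 2048 is an assumed value; the
# original post does not say what was used.
CONFIG_MAIN_STACK_SIZE=2048
```

The Thread analyzer report can then be used to check how much of the main stack is actually consumed and to trim the value accordingly.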
 Thank you again for your response, this was still very helpful and informative!
