Assert fires when disconnecting while data is transmitted via L2CAP

I have a nRF5340 using nrf connect v3.7.99-nc1 that is in a peripheral role. I establish a l2cap connection to the peripheral, to request a larger amount of data. That works quite ok until I close that connection from the receiving side while still some l2cap SDUs are on the flight. An asserts fires from within a thread (id 12) that I did not spawned:

(gdb) info thread
  Id   Target Id                                      Frame 
  2    Thread 536977456 (main PENDING PRIO 0)         __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
  3    Thread 536977640 (idle UNKNOWN PRIO 15)        __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:261
  4    Thread 536974176 (logging PENDING PRIO 14)     __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
  5    Thread 536974360 (usbworkq PENDING PRIO 255)   __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  6    Thread 536977824 (sysworkq QUEUED PRIO 255)    __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  7    Thread 536974576 (BT LW WQ PENDING PRIO 10)    __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  8    Thread 536977192 (usbd_workq PENDING PRIO 255) __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  9    Thread 536976856 (rx_q[0] PENDING PRIO 255)    __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  10   Thread 536974792 (BT RX WQ PENDING PRIO 248)   __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
  11   Thread 536908344 (mbox_wq #0 QUEUED PRIO 0)    __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
* 12   Remote target                                  0x0009c604 in arch_system_halt (reason=4) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:30

The assert message is:  `ASSERTION FAIL [net_buf_simple_headroom(buf) >= len] @ WEST_TOPDIR/zephyr/lib/net_buf/buf_simple.c:341`

The call stack is as follows:

Thread 12 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Remote target]
0x0009c604 in arch_system_halt (reason=4) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:30
30		for (;;) {
(gdb) bt
#0  0x0009c604 in arch_system_halt (reason=4) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:30
#1  0x000653c8 in k_sys_fatal_error_handler (reason=4, esf=0x20035608 <sys_work_q_stack+3344>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:44
#2  0x000655c2 in z_fatal_error (reason=4, esf=0x20035608 <sys_work_q_stack+3344>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:119
#3  0x00022ad0 in z_arm_fatal_error (reason=4, esf=0x20035608 <sys_work_q_stack+3344>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/arch/arm/core/fatal.c:86
#4  0x00084580 in z_do_kernel_oops (esf=0x20035608 <sys_work_q_stack+3344>, callee_regs=0x2)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/arch/arm/core/fatal.c:131
#5  0x00023dae in _oops () at /Users/todi/bloomlife/zephyr-workspace/zephyr/arch/arm/core/cortex_m/swap_helper.S:318
#6  <signal handler called>
#7  assert_post_action (file=0xa34e4 "WEST_TOPDIR/zephyr/lib/net_buf/buf_simple.c", line=341) at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/os/assert.c:44
#8  0x0002574c in net_buf_simple_push (buf=0x20036904 <drivers::details::_net_buf_mqtt_sn_messages_over_l2cap+12>, len=4)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/net_buf/buf_simple.c:341
#9  0x00088fa8 in net_buf_push (buf=0x200368f8 <drivers::details::_net_buf_mqtt_sn_messages_over_l2cap>, len=4)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/include/zephyr/net_buf.h:2058
#10 0x0003355a in l2cap_data_pull (conn=0x200197d8 <acl_conns>, amount=4294967295, length=0x2003577c <sys_work_q_stack+3716>)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/subsys/bluetooth/host/l2cap.c:946
#11 0x00030790 in bt_conn_tx_processor () at /Users/todi/bloomlife/zephyr-workspace/zephyr/subsys/bluetooth/host/conn.c:1062
#12 0x0002c448 in tx_processor (item=0x20002a10 <tx_work>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/subsys/bluetooth/host/hci_core.c:4743
#13 0x00067d14 in work_queue_main (workq_ptr=0x2001a1a0 <k_sys_work_q>, p2=0x0, p3=0x0) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/work.c:688
#14 0x0007e910 in z_thread_entry (entry=0x67b91 <work_queue_main>, p1=0x2001a1a0 <k_sys_work_q>, p2=0x0, p3=0x0)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/os/thread_entry.c:48
#15 0xfffffffe in ?? ()

When inspecting the buffer that caused the assert, it turns out, that `len` and `size` have values that I would expect, while `data` and `__buf` are set to 0!:

#8  0x0002574c in net_buf_simple_push (buf=0x20036904 <drivers::details::_net_buf_mqtt_sn_messages_over_l2cap+12>, len=4)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/net_buf/buf_simple.c:341
341		__ASSERT_NO_MSG(net_buf_simple_headroom(buf) >= len);
(gdb) p/x *buf
$1 = {data = 0x0, len = 0x49a, size = 0x1028, __buf = 0x0}

Could it be, that there is a bug in the l2cap layer for this corner case? I see no way to prevent that on my side. I there a unit test, that would cover this situation?

best regards

Torsten

  • Hi Torsten,

    When inspecting the buffer that caused the assert, it turns out, that `len` and `size` have values that I would expect, while `data` and `__buf` are set to 0!:

    Can you please elaborate on this. The assert triggered means that net_buf_simple_headroom(buf) was less than the len. If you think there should have been enough space in that buffer, then we need to investigate how the buffer got full or why this API gave wrong info. Could be stackoverflow symptoms?

  • Hi  ,

    there was for sure enough headroom for the l2cap protocol, when the l2cap message was added to the queue (using `bt_l2cap_chan_send()`), as I'm able to send thousands of l2cap SDUs without problem.

    The reason, why the calculation of the headroom results in 0 (and thus is small than len) is, because the headroom is defined to be the difference between `data` and `__buf`. As booth are 0 (which I think is most likely the reason for the failure), the difference and thus the headroom is 0.

    The pool for all messages are statically allocated and as `data` and `__buf` are the first and the last member, I would rule out some accidental overwriting. I looked into the buffer library but also did not found any single function, that sets both pointer to zero.

    As you can see from that call stack, the assert is triggered from a function that runs in a BLE stack internal thread. So there is hardly anything, I can do to prevent it.

    thanks and best regards

    Torsten

  • Yes, this does not seem normal, Can you please help me reproduce this so that I can take this to the BLE stack engineers and they can take us seriously about this? Maybe, if I can reproduce this, then I might be able to find out if this is a stack issue or if there is something we can do in the app to mitigate this.

  • Hi  ,

    I added this issue to our ticket system. Currently we are shortly before rolling out our product to test customers and it might take some time, before someone (most likely me) is able to pick that ticket up and to create a reproducer.

    best regards

    Torsten

  • Torsten,

    Got it, please write here when you pick this task again and able to help me to reproduce the issue. Good luck with your product test launch.

Related