Assert fires when disconnecting while data is transmitted via L2CAP

I have a nRF5340 using nrf connect v3.7.99-nc1 that is in a peripheral role. I establish a l2cap connection to the peripheral, to request a larger amount of data. That works quite ok until I close that connection from the receiving side while still some l2cap SDUs are on the flight. An asserts fires from within a thread (id 12) that I did not spawned:

(gdb) info thread
  Id   Target Id                                      Frame 
  2    Thread 536977456 (main PENDING PRIO 0)         __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
  3    Thread 536977640 (idle UNKNOWN PRIO 15)        __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:261
  4    Thread 536974176 (logging PENDING PRIO 14)     __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
  5    Thread 536974360 (usbworkq PENDING PRIO 255)   __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  6    Thread 536977824 (sysworkq QUEUED PRIO 255)    __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  7    Thread 536974576 (BT LW WQ PENDING PRIO 10)    __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  8    Thread 536977192 (usbd_workq PENDING PRIO 255) __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  9    Thread 536976856 (rx_q[0] PENDING PRIO 255)    __ISB () at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
  10   Thread 536974792 (BT RX WQ PENDING PRIO 248)   __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
  11   Thread 536908344 (mbox_wq #0 QUEUED PRIO 0)    __set_BASEPRI (basePri=0)
    at /Users/todi/bloomlife/zephyr-workspace/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1316
* 12   Remote target                                  0x0009c604 in arch_system_halt (reason=4) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:30

The assert message is:  `ASSERTION FAIL [net_buf_simple_headroom(buf) >= len] @ WEST_TOPDIR/zephyr/lib/net_buf/buf_simple.c:341`

The call stack is as follows:

Thread 12 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Remote target]
0x0009c604 in arch_system_halt (reason=4) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:30
30		for (;;) {
(gdb) bt
#0  0x0009c604 in arch_system_halt (reason=4) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:30
#1  0x000653c8 in k_sys_fatal_error_handler (reason=4, esf=0x20035608 <sys_work_q_stack+3344>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:44
#2  0x000655c2 in z_fatal_error (reason=4, esf=0x20035608 <sys_work_q_stack+3344>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/fatal.c:119
#3  0x00022ad0 in z_arm_fatal_error (reason=4, esf=0x20035608 <sys_work_q_stack+3344>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/arch/arm/core/fatal.c:86
#4  0x00084580 in z_do_kernel_oops (esf=0x20035608 <sys_work_q_stack+3344>, callee_regs=0x2)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/arch/arm/core/fatal.c:131
#5  0x00023dae in _oops () at /Users/todi/bloomlife/zephyr-workspace/zephyr/arch/arm/core/cortex_m/swap_helper.S:318
#6  <signal handler called>
#7  assert_post_action (file=0xa34e4 "WEST_TOPDIR/zephyr/lib/net_buf/buf_simple.c", line=341) at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/os/assert.c:44
#8  0x0002574c in net_buf_simple_push (buf=0x20036904 <drivers::details::_net_buf_mqtt_sn_messages_over_l2cap+12>, len=4)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/net_buf/buf_simple.c:341
#9  0x00088fa8 in net_buf_push (buf=0x200368f8 <drivers::details::_net_buf_mqtt_sn_messages_over_l2cap>, len=4)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/include/zephyr/net_buf.h:2058
#10 0x0003355a in l2cap_data_pull (conn=0x200197d8 <acl_conns>, amount=4294967295, length=0x2003577c <sys_work_q_stack+3716>)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/subsys/bluetooth/host/l2cap.c:946
#11 0x00030790 in bt_conn_tx_processor () at /Users/todi/bloomlife/zephyr-workspace/zephyr/subsys/bluetooth/host/conn.c:1062
#12 0x0002c448 in tx_processor (item=0x20002a10 <tx_work>) at /Users/todi/bloomlife/zephyr-workspace/zephyr/subsys/bluetooth/host/hci_core.c:4743
#13 0x00067d14 in work_queue_main (workq_ptr=0x2001a1a0 <k_sys_work_q>, p2=0x0, p3=0x0) at /Users/todi/bloomlife/zephyr-workspace/zephyr/kernel/work.c:688
#14 0x0007e910 in z_thread_entry (entry=0x67b91 <work_queue_main>, p1=0x2001a1a0 <k_sys_work_q>, p2=0x0, p3=0x0)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/os/thread_entry.c:48
#15 0xfffffffe in ?? ()

When inspecting the buffer that caused the assert, it turns out, that `len` and `size` have values that I would expect, while `data` and `__buf` are set to 0!:

#8  0x0002574c in net_buf_simple_push (buf=0x20036904 <drivers::details::_net_buf_mqtt_sn_messages_over_l2cap+12>, len=4)
    at /Users/todi/bloomlife/zephyr-workspace/zephyr/lib/net_buf/buf_simple.c:341
341		__ASSERT_NO_MSG(net_buf_simple_headroom(buf) >= len);
(gdb) p/x *buf
$1 = {data = 0x0, len = 0x49a, size = 0x1028, __buf = 0x0}

Could it be, that there is a bug in the l2cap layer for this corner case? I see no way to prevent that on my side. I there a unit test, that would cover this situation?

best regards

Torsten

Parents Reply Children
  • Hi  ,

    please find the reproducer here: https://github.com/TorstenRobitzki/l2cap_reproducer

    It is bases on one of the official examples (GATT CGM). It accepts a L2CAP connection and once connected, tries to send as much 4KByte large SDU as possible.

    There is a python script, that used a HCI controller on a nrf52 eval board, to connect to the reproducer.

    Once, the HCI controller, is powered off, the connection times out and the L2CAP layer on the reproducer hits the assert.

    Let me know, when you have further questions or need help with the setup.

    best regards

    Torsten

  • This bug-fix might fix the problem:

    --- a/subsys/bluetooth/host/l2cap.c
    +++ b/subsys/bluetooth/host/l2cap.c
    @@ -959,8 +959,9 @@ struct net_buf *l2cap_data_pull(struct bt_conn *conn,
             * static channels don't have the concept of L2CAP segments.
             */
            bool last_seg = lechan->_pdu_remaining == pdu->len;
    +       bool sdu_end = ( last_frag && last_seg ) || conn->state != BT_CONN_CONNECTED;
     
    -       if (last_frag && last_seg) {
    +       if (sdu_end) {
                    LOG_DBG("last frag of last seg, dequeuing %p", pdu);
                    __maybe_unused struct net_buf *b = k_fifo_get(&lechan->tx_queue, K_NO_WAIT);
     
    @@ -968,7 +969,6 @@ struct net_buf *l2cap_data_pull(struct bt_conn *conn,
            }
     
            if (last_frag && L2CAP_LE_CID_IS_DYN(lechan->tx.cid)) {
    -               bool sdu_end = last_frag && last_seg;
     
                    LOG_DBG("adding %s callback", sdu_end ? "`sdu_sent`" : "NULL");
                    /* No user callbacks for SDUs */
    

Related