Crash after dequeuing invalid L2CAP PDU

Hi,

I have to develop a BLE peripheral, that communicates larger amounts of data over L2CAP. Establishing the channel between central and peripheral works. I'm able to receive two SDUs and send two SDUs, until I find the application reproducible in a crashing situation.

In `zephyr/subsystem/bluetooth/l2cap.c`, I added a `manual` breakpoint (the if statement) in `struct net_buf *l2cap_data_pull(...)` after the currently transmitted `pdu` is pulled out of the `tx_queue` of the current connection:

...
	/* Leave the PDU buffer in the queue until we have sent all its
	 * fragments.
	 *
	 * For SDUs we do the same, we keep it in the queue until all the
	 * segments have been sent, adding the PDU headers just-in-time.
	 */
	struct net_buf *pdu = k_fifo_peek_head(&lechan->tx_queue);
if ( (char*)pdu - (char*)0 < 0x1000 )
	__asm__("BKPT");
	/* We don't have anything to send for the current channel. We could
	 * however have something to send on another channel that is attached to
...

The resulting `pdu` is somewhere located in the flash memory (0x4a), which is clearly not a valid value. I know, that there are millions of ways to shoot one self in the foot, in a multithreaded memory unsafe environment. I've created already some guard buffers around my buffer pool, but that does not changed the error situation. In the error case, the `tx_queue` looks "reasonable", while the head looks quite suspicious:

(gdb) p/x lechan->tx_queue
$2 = {_queue = {data_q = {head = 0x200370b0, tail = 0x200370e0}, lock = {thread_cpu = 0x0}, wait_q = {waitq = {{head = 0x200024fc, 
          next = 0x200024fc}, {tail = 0x200024fc, prev = 0x200024fc}}}, poll_events = {{head = 0x20002504, next = 0x20002504}, {tail = 0x20002504, 
        prev = 0x20002504}}}}
(gdb) p/x *(struct net_buf *)0x200370b0
$3 = {node = {next = 0x3414d}, frags = 0x4a, ref = 0x1, flags = 0x0, pool_id = 0x9, user_data_size = 0x0, {{data = 0x2003812d, len = 0xb, 
      size = 0x1014, __buf = 0x20038124}, b = {data = 0x2003812d, len = 0xb, size = 0x1014, __buf = 0x20038124}}, 

Is this code on the transmission side looking ok?

assert( data.size() <= CONFIG_MQTT_SN_LIB_MAX_PAYLOAD_SIZE );

net_buf * const buf = net_buf_alloc_fixed(
    &mqtt_sn_messages_over_l2cap, K_FOREVER);

if ( !buf )
{
    return 0;
}

net_buf_reserve(buf, BT_L2CAP_SDU_CHAN_SEND_RESERVE);
net_buf_add_mem(buf, data.data(), data.size());

const int rc = bt_l2cap_chan_send(l2cap_channel_, buf);

if (rc < 0) {
    net_buf_unref(buf);
    return 0;
}

assert( rc == 0 || rc == -ENOTCONN );

return data.size();

That's how the buffer pool is defined:

    NET_BUF_POOL_FIXED_DEFINE(
        mqtt_sn_messages_over_l2cap,
        CONFIG_MQTT_SN_LIB_MAX_PUBLISH,
        CONFIG_MQTT_SN_LIB_MAX_PAYLOAD_SIZE + mqtt_sn_max_publish_overhead + BT_L2CAP_SDU_CHAN_SEND_RESERVE,
        0, NULL);

best regards 

Torsten

Parents
  • From a debug session: It seems like the l2cap layer is accessing the buffers user data (which is documented). What is not documented, is the required size, when declaring the buffer pool. The obvious choice would be `sizeof struct closure` from `conn_internal.h`. As this header is not included in the public include path, I was searching for the documented buffer head room: BT_L2CAP_BUF_SIZE. From that search result I extrapolated, that the public available required size could be `CONFIG_BT_CONN_TX_USER_DATA_SIZE` (which is twice the size as `sizeof struct closure`, but who cares).

    Adding `CONFIG_BT_CONN_TX_USER_DATA_SIZE` as "user"-data size to the definition of the buffer pool, fixes the issue.

Reply
  • From a debug session: It seems like the l2cap layer is accessing the buffers user data (which is documented). What is not documented, is the required size, when declaring the buffer pool. The obvious choice would be `sizeof struct closure` from `conn_internal.h`. As this header is not included in the public include path, I was searching for the documented buffer head room: BT_L2CAP_BUF_SIZE. From that search result I extrapolated, that the public available required size could be `CONFIG_BT_CONN_TX_USER_DATA_SIZE` (which is twice the size as `sizeof struct closure`, but who cares).

    Adding `CONFIG_BT_CONN_TX_USER_DATA_SIZE` as "user"-data size to the definition of the buffer pool, fixes the issue.

Children
No Data
Related