[SDC/Zephyr] GATT notify TX path wedges (no HCI 0x13 credits) → bt_l2cap_create_pdu_timeout() blocks. Any workaround to recover from deadlock without disconnect? (NCS v3.0.2)

On a BLE peripheral (nRF52840) built with nRF Connect SDK v3.0.2, high-rate GATT notifications sometimes push the stack into a state where no Number of Completed Packets (HCI 0x13) events arrive for the active connection. After that, the bt_gatt_notify_cb() completion callbacks stop, the app’s in-flight window never drains, the queue backs up, and eventually bt_l2cap_create_pdu_timeout() blocks in net_buf_alloc(). Stopping scanning, disabling vendor events, or simply waiting does not recover it. Only a reset (or a disconnect) clears it.

I am aware of the known issues listed here: https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/releases_and_maturity/known_issues.html. What I am looking for is a safe workaround that resets the BLE stack without tearing down the user connection, ideally something host-side (e.g., a controller/host re-init) that doesn’t require a full MCU reset. I am getting ready to release to production, but I cannot get around this deadlock, and the most annoying part is that it can take several hours to reproduce. So far the only ways I have to recover are to let the WDT reset the device or to force a reset in software.

Symptom summary:

Kernel is blocked inside bt_l2cap_create_pdu_timeout() → net_buf_alloc(), waiting on a TX buffer. The controller still sends other HCI events, so I know the SDC is still alive.
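
To make the "in-flight window never drains" condition concrete, here is a minimal sketch of how it can be detected in the application before the sending thread blocks. It is plain C with no Zephyr dependencies, and every name in it (tx_stall_monitor, tx_monitor_*, the timeout value) is hypothetical and not part of the Zephyr or SDC API. It only detects the stall; it does not recover from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a Zephyr/SDC API: tracks the app's in-flight
 * notification window and reports when it has stopped draining. */
struct tx_stall_monitor {
    uint32_t in_flight;       /* notifications queued but not yet completed */
    int64_t last_progress_ms; /* time the window last started or drained */
    int64_t stall_timeout_ms; /* assumed threshold, tune per application */
};

static void tx_monitor_init(struct tx_stall_monitor *m, int64_t now_ms,
                            int64_t stall_timeout_ms)
{
    m->in_flight = 0;
    m->last_progress_ms = now_ms;
    m->stall_timeout_ms = stall_timeout_ms;
}

/* Call after a notification has been queued successfully. */
static void tx_monitor_on_queued(struct tx_stall_monitor *m, int64_t now_ms)
{
    if (m->in_flight == 0) {
        /* Window was empty: start timing from the first outstanding packet. */
        m->last_progress_ms = now_ms;
    }
    m->in_flight++;
}

/* Call from the notification completion callback. */
static void tx_monitor_on_complete(struct tx_stall_monitor *m, int64_t now_ms)
{
    if (m->in_flight > 0) {
        m->in_flight--;
    }
    m->last_progress_ms = now_ms;
}

/* True when packets are outstanding and none has completed within the timeout. */
static bool tx_monitor_is_stalled(const struct tx_stall_monitor *m,
                                  int64_t now_ms)
{
    return m->in_flight > 0 &&
           (now_ms - m->last_progress_ms) >= m->stall_timeout_ms;
}
```

In the application, the timestamps would presumably come from k_uptime_get(), with tx_monitor_on_queued() called after bt_gatt_notify_cb() returns 0 and tx_monitor_on_complete() called from its completion callback. Checking tx_monitor_is_stalled() from a periodic work item at least lets the app stop queueing before the thread blocks in net_buf_alloc().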
