On a BLE Peripheral NRF52840 built with nRF Connect SDK v3.0.2, high-rate GATT notifications sometimes push the stack into a state where no (HCI 0x13) events arrive for the active connection. After that, bt_gatt_notify_cb() callbacks stop, the app’s in-flight window never drains, the queue backs up, and eventually bt_l2cap_create_pdu_timeout() blocks in net_buf_alloc(). Stopping scanning, disabling vendor events, or waiting does not recover it. Only a reset (or disconnect) clears it.
I am aware of the known issues listed here https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/releases_and_maturity/known_issues.html. So what I am looking for is a safe workaround to reset the BLE stack without tearing down the user connection, ideally something host-side (e.g., controller/host re-init) that doesn’t require a full MCU reset. I am getting ready to release into my production, but I cannot get around this deadlock. And the most annoying thing is it can take several hours to reproduce. So far the only way I have to recover is to either let the WDT reset or I force a reset in software.
Symptom summary:
Kernel is blocked inside bt_l2cap_create_pdu_timeout() → net_buf_alloc() waiting on a TX buffer. The controller still sends other HCI events so I know the SDC is still alive.