nRF54L15-DK SW-LL peripheral: link becomes unresponsive after MTU exchange when host is a Windows 10/11 BLE central

Question

Hi,

I'm running into what looks like a software link-layer issue on the nRF54L15-DK and would like a second pair of eyes on it. The same firmware works end to end with iOS and Android centrals, but consistently stalls right after the MTU exchange with Windows 10/11 centrals.
 
 ### Hardware / SW 
 
* **Board:** nRF54L15-DK (PCA10156), Engineering A
* **SoC HCI report:** HW Variant `nRF54Lx (0x0005)`, Firmware `Standard Bluetooth controller (0x00) v4.2 Build 1`, HCI `5.4 (0x0d)`, manufacturer `0x05f1`
* **Zephyr:** v4.2.1 (PlatformIO `framework-zephyr@3.40201.251021`)
* **Controller:** `CONFIG_BT_LL_SW_SPLIT=y` (default for nRF54L)
* **Host configuration (relevant subset):**
  * `BT_PERIPHERAL=y`, `BT_MAX_CONN=1`, `BT_MAX_PAIRED=4`
  * `BT_SMP=y`, `BT_SMP_ENFORCE_MITM=y`, `BT_FIXED_PASSKEY=y`, `BT_SMP_SC_PAIR_ONLY=n`
  * GATT characteristics use `BT_GATT_PERM_*_AUTHEN`
  * `BT_GATT_SERVICE_CHANGED=n`, `BT_GATT_CACHING=n` (turning these on breaks iOS interop)
  * `BT_L2CAP_TX_MTU=247`, `BT_BUF_ACL_RX_SIZE=251`
  * `BT_PHY_UPDATE=n`, `BT_AUTO_PHY_UPDATE=n`, `BT_DATA_LEN_UPDATE=n`, `BT_CTLR_PHY_2M=n`
  * `BT_GAP_AUTO_UPDATE_CONN_PARAMS=n`, `BT_CTLR_CONN_PARAM_REQ=y`
  * `BT_LONG_WQ_PRIO=0`, `_STACK_SIZE=4096` (so SC P256 keygen completes)
  * `BT_CONN_TX_NOTIFY_WQ=y`, `_STACK_SIZE=2048`
* **Local patch:** `subsys/bluetooth/host/hci_core.c::bt_tx_irq_raise()` submits `tx_work` to `bt_workq` instead of `sys_work_q`. This is equivalent to upstream PR #97913 (`CONFIG_BT_TX_PROCESSOR_THREAD`, in v4.4.0) and is required to make iOS work at all on this board: without it, `tx_processor` doesn't run during an active connection and `ATT_EXCHANGE_MTU_RSP` never reaches the air.
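For reference, the MTU-related settings above relate to each other roughly as follows. This is a minimal sketch that assumes Zephyr derives the L2CAP RX MTU as `BT_BUF_ACL_RX_SIZE` minus the 4-byte L2CAP basic header and offers the smaller of its RX and TX MTUs in the Exchange MTU response; the helper names are made up for illustration. With `BT_DATA_LEN_UPDATE=n` the peripheral never requests a longer LL payload itself, so unless the central does, each LL data PDU carries at most 27 bytes and a full-size ATT PDU is split into several fragments.

```python
import math

L2CAP_BASIC_HDR = 4      # L2CAP basic header: 2-byte length + 2-byte channel ID
LL_PAYLOAD_DEFAULT = 27  # max LE data PDU payload without data length extension

def server_att_mtu(acl_rx_size: int, l2cap_tx_mtu: int) -> int:
    """MTU the peripheral would offer in ATT_EXCHANGE_MTU_RSP.

    Assumption: RX MTU = ACL RX buffer size minus the L2CAP header, and the
    host offers the smaller of its RX and TX MTUs.
    """
    l2cap_rx_mtu = acl_rx_size - L2CAP_BASIC_HDR
    return min(l2cap_rx_mtu, l2cap_tx_mtu)

def ll_fragments(att_pdu_len: int, ll_payload: int = LL_PAYLOAD_DEFAULT) -> int:
    """LL data PDUs needed to carry one ATT PDU plus its L2CAP header."""
    return math.ceil((att_pdu_len + L2CAP_BASIC_HDR) / ll_payload)

mtu = server_att_mtu(acl_rx_size=251, l2cap_tx_mtu=247)
print(mtu)                # 247
print(ll_fragments(mtu))  # 10: a full 247-byte ATT PDU without DLE
print(ll_fragments(3))    # 1: the 3-byte Exchange MTU response
```

This is only arithmetic on the configured values; it says nothing about whether the central actually negotiated a longer data length on this link.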
 
 ### Symptom 
 
1. The host (Windows 10 22H2 or 11 23H2) is paired via Settings → Bluetooth → Add Device using the fixed passkey. The bond persists: on subsequent reconnects the LTK is loaded from `BT_SETTINGS` storage and the link is encrypted with the stored keys, without a full pairing.
2. From Python, using [bleak](https://github.com/hbldh/bleak), a `BleakClient(...)` connection is awaited and bleak reports it as established. The host then issues `ATT_EXCHANGE_MTU_REQ` with Client MTU 527.
3. The peripheral replies with Server MTU 247. Both the host's request and our response are visible in the RTT log (see below).
4. After the controller acknowledges our response (`att_on_sent_cb` fires), the host sends no further L2CAP traffic: no `ATT_READ_BY_GROUP_TYPE_REQ`, no `ATT_FIND_BY_TYPE_VALUE_REQ`, no CCC writes to enable notifications.
5. There is no `disconnected_cb`, and no HCI Disconnection Complete event is delivered. From the device's perspective the link is still alive.
6. An application-level supervision watchdog (60 s with no ATT activity) eventually forces a disconnect. Even calling `bt_conn_disconnect(conn, BT_HCI_ERR_REMOTE_USER_TERM_CONN)` from that path does not result in a timely `disconnected_cb` (the callback is only delivered asynchronously, after the controller reports the disconnection), so the device has to call `sys_reboot()` to recover advertising.
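To line step 4 up against an extended RTT log or a future capture, the ATT opcodes involved are listed below (values from the Bluetooth Core Specification, Vol 3, Part F). `describe()` and `EXPECTED_AFTER_MTU` are hypothetical helpers for this report, and the set of "expected" requests is an assumption about what a central doing GATT discovery and CCC writes would send, not something Windows guarantees.

```python
# ATT opcodes relevant to this report (Bluetooth Core Specification, Vol 3, Part F).
ATT_OPCODES = {
    0x01: "ATT_ERROR_RSP",
    0x02: "ATT_EXCHANGE_MTU_REQ",
    0x03: "ATT_EXCHANGE_MTU_RSP",
    0x06: "ATT_FIND_BY_TYPE_VALUE_REQ",
    0x08: "ATT_READ_BY_TYPE_REQ",
    0x10: "ATT_READ_BY_GROUP_TYPE_REQ",
    0x12: "ATT_WRITE_REQ",
}

# Assumed post-MTU traffic from a central doing service/characteristic
# discovery (0x10, 0x06, 0x08) and CCC writes (0x12).
EXPECTED_AFTER_MTU = {0x06, 0x08, 0x10, 0x12}

def describe(opcodes):
    """Name each received opcode and report whether any expected post-MTU request appeared."""
    names = [ATT_OPCODES.get(op, f"unknown(0x{op:02x})") for op in opcodes]
    saw_followup = any(op in EXPECTED_AFTER_MTU for op in opcodes)
    return names, saw_followup

# Windows session per the log: only the MTU request arrives.
print(describe([0x02]))        # (['ATT_EXCHANGE_MTU_REQ'], False)
print(describe([0x02, 0x10]))  # (['ATT_EXCHANGE_MTU_REQ', 'ATT_READ_BY_GROUP_TYPE_REQ'], True)
```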
 
 ### What is materially different about iOS vs Windows here 
 
The same RTT log contains one session of each kind, and the order of events differs:
 
 * **iOS** (works): connect → **MTU exchange** → SMP pairing → encryption → GATT discovery 
 * **Windows** (fails): connect → **encryption from stored bond** → **MTU exchange** → silence 
 
Both sessions produce an identical `ATT_EXCHANGE_MTU_RSP` PDU (`03 f7 00`). The only obvious difference visible from the device side is that Windows performs the MTU exchange **after** encryption, while iOS does it before.
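Decoding the response confirms it is well-formed: opcode `0x03` followed by the Server Rx MTU as a little-endian 16-bit value (`0x00f7` = 247), giving an effective ATT_MTU of min(527, 247) = 247 on both sides. A minimal sketch; `parse_mtu_pdu()` and `negotiated_mtu()` are illustrative names, not part of any library.

```python
import struct

ATT_OP_MTU_REQ = 0x02
ATT_OP_MTU_RSP = 0x03

def parse_mtu_pdu(pdu: bytes) -> tuple[int, int]:
    """Return (opcode, mtu) from an ATT Exchange MTU request or response.

    Both PDUs are 3 bytes: a 1-byte opcode followed by a little-endian uint16 MTU.
    """
    if len(pdu) != 3:
        raise ValueError("Exchange MTU PDUs are exactly 3 bytes")
    opcode, mtu = struct.unpack("<BH", pdu)
    if opcode not in (ATT_OP_MTU_REQ, ATT_OP_MTU_RSP):
        raise ValueError(f"not an Exchange MTU PDU: opcode 0x{opcode:02x}")
    return opcode, mtu

def negotiated_mtu(client_mtu: int, server_mtu: int) -> int:
    # Both sides use the smaller value; anything below the LE default of 23 falls back to 23.
    return max(23, min(client_mtu, server_mtu))

op, server = parse_mtu_pdu(bytes.fromhex("03f700"))
print(op, server, negotiated_mtu(527, server))  # 3 247 247
```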
 
### Log excerpt (relevant ~450 ms window)
 
 ```text 
 [00:21:11.008] bt_l2cap: l2cap_accept: conn 0x20003720 handle 0 
 [00:21:11.009] bt_smp: bt_smp_accept: conn 0x20003720 handle 0 
 [00:21:11.010] bt_att: bt_att_accept: conn 0x20003720 handle 0 
 [00:21:11.364] bt_smp: bt_smp_encrypt_change: encrypt 0x01 hci status 0x00 
 [00:21:11.364] bt_att: bt_att_encrypt_change: sec_level 0x04 status 0x00 
 [00:21:11.365] bt_gatt: bt_gatt_encrypt_change: conn 0x20003720 
 [00:21:11.454] bt_l2cap: bt_l2cap_recv: Packet for CID 4 len 3 
 [00:21:11.455] bt_att: att_mtu_req: Client MTU 527 
 [00:21:11.455] bt_att: att_mtu_req: Server MTU 247 
 [00:21:11.455] bt_att: chan_send: code 0x03 
 [00:21:11.456] bt_att: att_mtu_req: Negotiated MTU 247 
 [00:21:11.456] bt_l2cap: PDU payload 03 f7 00 
 [00:21:11.458] bt_att: att_on_sent_cb: opcode 0x3 
 [00:21:11.458] bt_att: bt_att_sent: chan 0x20014dc0 
 [00:21:11.458] bt_l2cap: l2cap_data_pull: no channel conn 0x20003720 
← no further events for 60 s: no disconnected_cb, no HCI Disconnection Complete
 [00:22:11.xxx] <app> BLE zombie (hard watchdog 60s); rebooting 
 ``` 
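When comparing sessions, a small script can pull the ordering and timing out of each connection's slice of the RTT log. This is a sketch that assumes the `[HH:MM:SS.mmm] module: message` format shown above; lines with a placeholder timestamp (such as `xxx`) and annotations are skipped, and the function names are hypothetical.

```python
import re
from datetime import timedelta

# Matches "[HH:MM:SS.mmm] module: message"; annotation lines and the
# "[00:22:11.xxx]" placeholder do not match and are skipped.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s+(\S+?):\s*(.*)$")

def parse(log_text):
    """Return [(timestamp, module, message)] for each timestamped log line."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        h, mi, s, ms, module, msg = m.groups()
        ts = timedelta(hours=int(h), minutes=int(mi),
                       seconds=int(s), milliseconds=int(ms))
        events.append((ts, module, msg))
    return events

def first_time(events, needle):
    """Timestamp of the first event whose message contains `needle`, or None."""
    return next((ts for ts, _, msg in events if needle in msg), None)

def summarize(log_text):
    """Ordering and timing facts for one connection's slice of the log."""
    events = parse(log_text)
    enc = first_time(events, "encrypt_change")
    mtu = first_time(events, "att_mtu_req")
    both = enc is not None and mtu is not None
    return {
        # True for the Windows ordering (encrypt, then MTU); False for the iOS ordering.
        "encrypt_before_mtu": both and enc < mtu,
        "encrypt_to_mtu_ms": (mtu - enc) // timedelta(milliseconds=1) if both else None,
        "last_event": events[-1][0] if events else None,
    }

SAMPLE = """\
[00:21:11.364] bt_smp: bt_smp_encrypt_change: encrypt 0x01 hci status 0x00
[00:21:11.455] bt_att: att_mtu_req: Client MTU 527
[00:21:11.458] bt_l2cap: l2cap_data_pull: no channel conn 0x20003720
[00:22:11.xxx] <app> BLE zombie (hard watchdog 60s); rebooting
"""
print(summarize(SAMPLE))
```

Run it on one connection at a time: `first_time()` returns the first match in whatever text it is given, so a log containing both the iOS and the Windows session would mix them.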
 
 ### Questions for the Nordic team 
 
1. **Is the post-encrypt MTU-exchange path known to be brittle on the nRF54L15 SW-LL with Windows hosts?** I haven't been able to find this scenario on DevZone or in the Zephyr issue tracker. The closely related upstream PR #97913 (moving tx_processor to its own thread) lands in Zephyr v4.4.0 but does not appear to address this specific symptom: it covers a different deadlock (sys_work_q starvation) that we already work around locally.
2. **Is there a known SW-LL workaround that produces a Disconnection Complete event when the host abandons the link?** For example, forcing the connection-update / data-length-update / PHY-update sequence into a specific order via Kconfig, or disabling controller privacy or cross-transport key derivation features that Windows might invoke after encryption.
3. **Would moving to NCS** (`sdk-nrf` instead of upstream Zephyr) likely change this behaviour? I see NCS often carries Nordic-specific controller patches ahead of upstream. If there's a relevant `sdk-nrf` revision or override I should try, I'd appreciate a pointer.
4. **For diagnosis, would you recommend an on-air capture?** I don't have an nRF Sniffer dongle yet, but I plan to get one in the coming weeks. If a capture would significantly narrow this down, I'll prioritise it. If there's a faster diagnostic path (e.g. enabling specific controller log Kconfig options, or a known-good test sample to run against Windows that shows whether the problem is in the SW-LL or the host), please let me know.
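If a capture is taken, the question it most directly answers is whether the central keeps the link alive at the LL level after the MTU response. Per the Core Specification, if the central stopped transmitting, the peripheral's controller should end the link itself once the supervision timeout expires and report Disconnection Complete with reason `0x08` (Connection Timeout); the absence of that event therefore suggests either that the central is still polling or that the controller is not enforcing the timeout. The sketch below classifies central-to-peripheral data-channel PDUs seen after the MTU response; the `Packet` record is a hypothetical export format, not the nRF Sniffer or Wireshark schema.

```python
from dataclasses import dataclass

# LLID field of an LE data channel PDU header: 0b01 = continuation fragment or
# empty PDU, 0b10 = start of an L2CAP message, 0b11 = LL control PDU.
LLID_L2CAP_START = 0b10

@dataclass
class Packet:
    """One data-channel PDU from a sniffer export (hypothetical record format)."""
    t: float            # seconds since the connection was established
    from_central: bool  # True for central -> peripheral
    llid: int
    length: int         # payload length in bytes; 0 for an empty PDU

def classify_after(packets, t_mtu_rsp):
    """Classify central-side activity after the peripheral's ATT_EXCHANGE_MTU_RSP."""
    later = [p for p in packets if p.from_central and p.t > t_mtu_rsp]
    if not later:
        return ("LL silent: central stopped transmitting; expect Disconnection "
                "Complete (0x08) after the supervision timeout")
    if any(p.llid == LLID_L2CAP_START for p in later):
        return "L2CAP data from central: check whether the peripheral host received it"
    return "LL alive, no L2CAP data: the central maintains the link but sends no ATT requests"

# Only empty PDUs from the central after the MTU response at t = 1.0 s.
print(classify_after([Packet(2.0, True, 0b01, 0)], 1.0))
```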
 
 ### What I have already ruled out 
 
* The iOS/Android path works, so it is not application logic and not the GATT database.
* No prepare-write fragmentation is involved: Windows hasn't issued any GATT operation yet.
* `bt_keys_get_addr` returns the persisted bond correctly; encryption succeeds (`sec_level 0x04`).
* `bt_l2cap: no channel conn 0x...` after the MTU response means the device-side TX queue is drained, so the host stack is not blocked on us.
 * PR #97913 backport applied locally and verified working on the iOS path. 
 
I'm happy to provide the full 2,500-line RTT log (RTT channel 1, `LOG_BACKEND_RTT_BUFFER_SIZE=24576`, immediate mode), the full prj.conf and the board overlay if useful. Let me know what would help triage.
 
 Thanks, 
 Carlos