nRF5340 Missing Disconnect Event (Supervision Timeout)

Hi,

I have an nRF5340 using hci_ipc with the following features:

  • Central Role + Scanning (Multi-connection to nRF52840 peripherals)
  • Peripheral Role
  • QoS Channel Scanning
  • Nordic UART Service, NUS (Server + Client)

I am missing Bluetooth supervision timeout events on the nRF53 with both NCS 2.7.0 and the upstream Zephyr BLE stack (no disconnected callback when using the SoftDevice).

Problem Flow from Central perspective:
1) Scan and connect
2) Subscribe to NUS (cached handles - no discovery)
3) MTU Exchange
4) Send NUS data back and forth (~1.2Kb DTLS handshake)
5) Disconnect the peripheral while a central NUS GATT write is pending (halt the nRF52 peripheral at a breakpoint, power it off, or reset it)
6) Observe no supervision timeout (4 s) [this event is expected, but does not happen]
7) Observe GATT write error 30 s later
8) GATT error cleans up the NUS subscription; manually issue a disconnect here
9) Observe no .disconnected callback
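
The timing in steps 6 and 7 can be made concrete with a small host-side model. This is a sketch in plain C, not Zephyr code. It assumes the 4 s supervision timeout from step 6 and the 30 s ATT transaction timeout from the Bluetooth Core Specification, which matches the delay in step 7. It also assumes the pending write was issued at about the moment the peer went silent, as in step 5. The names `link_watch`, `link_watch_check` and `supervision_ms_to_hci` are hypothetical and not part of any SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCI expresses the supervision timeout in units of 10 ms. */
#define SUPERVISION_UNIT_MS 10u
/* ATT transaction timeout (30 s) from the Bluetooth Core Specification. */
#define ATT_TIMEOUT_MS 30000u

enum link_verdict {
    LINK_OK,            /* peer heard recently, or disconnect already reported */
    LINK_OVERDUE,       /* supervision timeout elapsed, no disconnect event yet */
    LINK_ATT_TIMED_OUT, /* ATT timeout elapsed as well (step 7 above) */
};

/* Hypothetical per-link bookkeeping kept by the application. */
struct link_watch {
    uint32_t supervision_timeout_ms; /* negotiated value, 4000 in this thread */
    uint32_t last_rx_ms;             /* time the peer was last heard from */
    bool disconnect_seen;            /* set from the disconnected callback */
};

/* Convert milliseconds to the 10 ms units used on the HCI interface. */
uint16_t supervision_ms_to_hci(uint32_t timeout_ms)
{
    /* The specification allows 100 ms to 32 s. */
    assert(timeout_ms >= 100u && timeout_ms <= 32000u);
    return (uint16_t)(timeout_ms / SUPERVISION_UNIT_MS);
}

/* Classify a link by how long the peer has been silent. */
enum link_verdict link_watch_check(const struct link_watch *w, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across timer wrap-around. */
    uint32_t silent_ms = now_ms - w->last_rx_ms;

    if (w->disconnect_seen || silent_ms < w->supervision_timeout_ms) {
        return LINK_OK;
    }
    if (silent_ms < ATT_TIMEOUT_MS) {
        return LINK_OVERDUE;
    }
    return LINK_ATT_TIMED_OUT;
}
```

With a 4 s timeout, a link that has been silent for 4 s or more without a disconnect event is reported as `LINK_OVERDUE`, which is the state the flow above stays in until the GATT write fails.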

Setups used:

1) Initially we were on NCS 2.7.0 with the included hci_ipc using the ipc_radio sample with a few upstream network (not bluetooth) related cherrypicks. We had problems with this configuration missing disconnect events.
2) Now we're on a heavily cherrypicked SDK to get as close as possible to the Zephyr main upstream Bluetooth stack (from the last couple of days) and the newest nrfxlib SoftDevice + hci_ipc. We still had problems with this.
3) With the heavily cherrypicked SDK, we enabled SPLIT_SW_LL and no longer miss these supervision timeout events with the same application code.


I am curious which layers these disconnect events are supposed to propagate through (for example `zephyr/drivers/bluetooth/ipc/ipc.c`) when using `hci_ipc`.

I have not yet been able to make a minimum viable example, but am working towards this.

I made a Zephyr discord post regarding this on #nordic for additional context, which I tried to summarize here.
https://discord.com/channels/720317445772017664/883445320812466209/1283513590606860318

Let me know if I can provide additional information,
Jeff

  • Hi Jeff,

    Thank you for the detailed summary of your setup. This helped me create minimal peripheral and central projects based on our peripheral and central UART, allowing me to reproduce the issue within a couple of minutes (GATT timeout followed by a missed disconnect event). I have handed over the projects to our Bluetooth team for further investigation. Regarding the Zephyr controller, it uses two RX threads with one high priority thread, which may unblock the host. I will keep you updated on the progress.

    Thank you,

    Vidar

  • Vidar,

    That's great news!
    Is it possible to share the sample projects? (No need to spend time cleaning up code) :)

    We're more than happy to test and debug any potential solutions as this is our biggest blocker for customer sampling.

    Very much appreciate the help,
    --Jeff

  • Hi Vidar,

    So, moving to the new Softdevice has been relatively good --

    However, I have come across two cases where the SoftDevice reports a connection to a peripheral that has long since been removed, while the other side reports the link as disconnected.

    This prevents reconnection, because internally it thinks there is already a connection.


    Our application tracks connected and disconnected events, and we show 0 connections at this point. However, it seems conn_idx 0 is stuck in the SoftDevice, as it is trying to use conn_idx 1 for a new connection.

    My understanding is the LL layer should have cleaned this up into the disconnected state, even if our application forgot to unref it?

    Wondering if you had any input!
    -- Jeff

  • Hi Jeff,

    The app is responsible for releasing its reference count on disconnect, or after a failed connection attempt. Otherwise the connection object will not be freed by the host even if the link has been terminated. However, I'm unsure why it reports that it is in the connected state, not the disconnected state. Have you tried testing your app against the v2.8.0-rc1 tag, or did you only update the Softdevice library?

    Best regards,

    Vidar 
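
The ownership rule in the reply above can be illustrated with a small stand-alone model. This is a sketch in plain C, not the Zephyr host: `conn_model` and its helpers are hypothetical stand-ins for `struct bt_conn`, `bt_conn_ref()` and `bt_conn_unref()`. It assumes the host holds one reference of its own while the link is up and drops it when the link terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a host connection object. */
struct conn_model {
    int refs;     /* outstanding references (host + application) */
    bool link_up; /* true while the link exists */
};

/* Link established: the host takes the first reference (assumption). */
void conn_model_connected(struct conn_model *c)
{
    c->refs = 1;
    c->link_up = true;
}

/* Application keeps the object alive, as bt_conn_ref() would. */
void conn_model_ref(struct conn_model *c)
{
    assert(c->refs > 0);
    c->refs++;
}

/* Application releases its reference, as bt_conn_unref() would. */
void conn_model_unref(struct conn_model *c)
{
    assert(c->refs > 0);
    c->refs--;
}

/* Link terminated: the host drops its own reference. */
void conn_model_link_terminated(struct conn_model *c)
{
    c->link_up = false;
    conn_model_unref(c);
}

/* The slot can be reused only when nobody references it any more. */
bool conn_model_slot_free(const struct conn_model *c)
{
    return !c->link_up && c->refs == 0;
}
```

In this model, a terminated link whose application reference was never released leaves the slot occupied, but `link_up` is already false. A slot that still looks connected, as reported in this thread, is a different symptom and points at the missing disconnect event rather than at a leaked reference.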

  • Vidar,

    We're using a heavily updated NCS v2.7.0 to keep up with upstream, and we pulled in the SoftDevice from v2.8.0-rc1 (and tested the newer, recently updated one).

    I double checked my application, and I don't see any holes where a conn will not get unref'd, even with any conn errors --

    We tracked it back down to a missing disconnect event again. The connection stays in the connected state internally.

    In lieu of any alternate suggestions that come up, we're planning to run our tests again when v2.8.0 is out, and we can merge our patches again.

    --Jeff

  • Hi Jeff,

    I'm afraid I don't have any better suggestions than to try using v2.8.0 to see if it fixes the issue. The developers believe the problem should be fixed in this release. v2.8.0 should be tagged by the end of next week if everything goes according to plan.

    Vidar

  • Vidar,

    We will keep tabs on the releases.
    Thanks for the expected timeline, and continued support with the developers. It is greatly appreciated :)

    --Jeff
