nRF5340 Missing Disconnect Event (Supervision Timeout)

Hi,

I have an nRF5340 using hci_ipc with the following features:

  • Central Role + Scanning (Multi-connection to nRF52840 peripherals)
  • Peripheral Role
  • QoS Channel Scanning
  • Nordic UART Service (NUS) (Server + Client)

I am missing Bluetooth supervision timeout events on the nRF53 with both NCS 2.7.0 and the upstream Zephyr BLE stack: no disconnected callback is received when using the SoftDevice Controller.

Problem flow (central perspective):
1) Scan and connect
2) Subscribe to NUS (cached handles, no discovery)
3) MTU exchange
4) Send NUS data back and forth (~1.2 KB DTLS handshake)
5) Disconnect the peripheral while a central NUS GATT write is pending (halt the nRF52 peripheral at a breakpoint, power it off, or reset it)
6) Observe that no supervision timeout (4 s) occurs, although one is expected
7) Observe a GATT write error 30 s later
8) The GATT error cleans up the NUS subscription; manually issue a disconnect at this point
9) Observe that the .disconnected callback is never called

Setups used:

1) Initially we were on NCS 2.7.0, using the included hci_ipc via the ipc_radio sample, plus a few cherry-picked upstream networking (not Bluetooth) commits. This configuration missed disconnect events.
2) We are now on a heavily cherry-picked SDK that tracks the upstream Zephyr main Bluetooth stack (as of the last couple of days), together with the newest nrfxlib SoftDevice Controller and hci_ipc. The problem persists.
3) With the same heavily cherry-picked SDK, we switched to the Zephyr SW Split controller (CONFIG_BT_LL_SW_SPLIT) and no longer miss the supervision timeout events, with the same application code.


I am curious through which layers these disconnect events are supposed to propagate, via `zephyr/drivers/bluetooth/hci/ipc.c`, when using `hci_ipc`.

I have not yet been able to make a minimal reproducible example, but I am working towards one.

For additional context, I made a post about this in the #nordic channel on the Zephyr Discord, which I have tried to summarize here.
https://discord.com/channels/720317445772017664/883445320812466209/1283513590606860318

Let me know if I can provide additional information,
Jeff

  • Hi Jeff,

    Thank you for the detailed summary of your setup. It helped me create minimal peripheral and central projects based on our Peripheral UART and Central UART samples, and I was able to reproduce the issue (a GATT timeout followed by a missed disconnect event) within a couple of minutes. I have handed the projects over to our Bluetooth team for further investigation. Regarding the Zephyr controller: it uses two RX threads, one of them high priority, which may be what unblocks the host in that configuration. I will keep you updated on the progress.

    Thank you,

    Vidar

  • Vidar,

    That's great news!
    Is it possible to share the sample projects? (No need to spend time cleaning up the code.) :)

    We're more than happy to test and debug any potential solutions, as this is our biggest blocker for customer sampling.

    Very much appreciate the help,
    --Jeff

  • Hi Jeff,

    Sorry for the delayed response. Both R&D and I have made several attempts to reproduce the issue with the missed disconnect, but without any luck. Were you able to reproduce this fairly consistently on your side?

    Below is the central log from the nRF5340 after resetting the nRF52840 peripheral. Because the scanner is always active in this sample, and the peripheral starts advertising immediately after a reset (before the supervision timer on the central has expired), I see repeated warnings that the peripheral is already connected until the link is finally terminated by the supervision timeout.

    --Vidar

  • Hey Vidar,

    Thanks for following up.
    My colleague was just able to reproduce the condition within three attempts.

    We have more success switching the power switch on the peripheral nRF52840 DK to OFF and waiting for the supervision timeout, rather than pressing the reset button.

    Also, make sure you wait until the TX IRQ is raised for the second time after the connection parameters are exchanged. :)
    --Jeff

  • Hi Jeff,

    Thank you. I’m able to reproduce the issue now. It seems the developers were powering the kit on and off but likely missed the critical window you described. They will test again following your instructions.

    --Vidar

  • Vidar,

    Congrats to the team on getting v2.9.0 across the finish line for the nRF54. Cool stuff! :)

    Glad you were able to reproduce it. Has there been any more news on this from the devs?

    We're back on the SoftDevice Controller after a couple of upstream sagas, so this problem is once again our biggest blocker for production.

    P.S. Happy Holidays!
    --Jeff

  • Jeff,

    Thank you! This case has been on my mind lately, but I haven't had any updates to share, as the team has not yet had a chance to revisit the issue. I understand this is a blocker for you, and it is very unfortunate that we still have not been able to provide a solution. Since we will have reduced staffing over Christmas due to public holidays, I'm afraid it is not realistic that we will reach a conclusion before the new year. However, one of the developers said they would take another look at this tomorrow, so I will let you know if there are any new findings.

    --Vidar
