sd_ble_gap_disconnect returns success but BLE_GAP_EVT_DISCONNECTED event doesn't happen until a while later. A conn_params_error happens during this period.

I'm using S112 and SDK 14.2 to target NRF52810, but using the PCA10040 DK for now. 

I've been coming back to this issue multiple times over the past couple of months and still can't find any hints online. This intermittent issue seems to happen maybe 1 in 10-15 times when disconnecting from my Android device. My firmware needs to handle all kinds of broken Android Bluetooth implementations on all devices (and iOS), so I'm not worrying about device specifics at the moment, just how to deal with weird situations like this in the firmware. 

Normal procedure:

  1. App searching for NRF device
  2. Press button on DK to start advertising
  3. Connection
  4. Read/write some characteristics
  5. sd_ble_gap_disconnect() returns 0
  6. BLE_GAP_EVT_DISCONNECTED event
  7. All ok

When it breaks: 

  1. App searching for NRF device
  2. Press button on DK to start advertising
  3. Connection
  4. Read/write some characteristics
  5. sd_ble_gap_disconnect() returns 0
  6. Seems to hang around still connected (using too much power) for 2-5 seconds
  7. conn_params_error 8 (NRF_ERROR_INVALID_STATE)
  8. Hangs around for another ~10 seconds using power
  9. Finally BLE_GAP_EVT_DISCONNECTED

I know the Bluetooth implementation on some Android devices is awful and my guess was that the phone is hanging onto the connection in the background (this happens too often), but if I turn off the phone's Bluetooth (reliably kills background connections) during step 8, it still takes many seconds before we hit step 9.

It seems the softdevice is failing to disconnect for some reason: the connection params update thinks the device is in an invalid state (it probably looks disconnected), but it hasn't really finished disconnecting. Any ideas what could be causing this to happen? 

Thanks in advance. 

  • Hi, the likely problem here is that if the ACK on the disconnect packet is lost (only 1 packet), then the softdevice will wait until the supervisor timeout occurs for that specific connection before the disconnect event occurs. This is according to the BLE specification. There is not much that can be done here, other than possibly changing the connection supervisor timeout shortly before calling sd_ble_gap_disconnect() to reduce the time.

  • Another update on this: the connection params error happens 7 seconds after connection, this is equal to my FIRST_CONN_PARAMS_UPDATE_DELAY. The BLE_GAP_EVT_DISCONNECTED event happens around 21 seconds after connection, or 18 seconds after sd_ble_gap_disconnect() is called. CONN_SUP_TIMEOUT is 2 seconds (MSEC_TO_UNITS(2000, UNIT_10_MS)), so now I'm not sure it is the connection supervisor forcing the disconnect to finish... Something different is going on here which is taking much longer than the supervisor would. 

  • There may be some sort of race condition here. The easiest fix may just be to add a global flag to ensure that you don't request any connection parameter update after calling sd_ble_gap_disconnect()?
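The flag idea above could be sketched roughly as follows. This is only an illustration: `stub_radio_disconnect()` and `stub_radio_param_update()` are hypothetical stand-ins for the real SoftDevice calls, and every `app_*` name is made up for this example. In an SDK project the update request is normally issued from the ble_conn_params module's timer, so the guard would have to sit wherever that request actually originates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SoftDevice calls; NOT the real API. */
#define STUB_SUCCESS 0u

static uint32_t m_update_requests_sent = 0;

static uint32_t stub_radio_disconnect(void)
{
    return STUB_SUCCESS;
}

static uint32_t stub_radio_param_update(void)
{
    m_update_requests_sent++;
    return STUB_SUCCESS;
}

/* Set once a disconnect has been requested, cleared on the disconnected event. */
static volatile bool m_disconnect_pending = false;

/* Request a disconnect and remember that the link is on its way down. */
uint32_t app_disconnect(void)
{
    uint32_t err = stub_radio_disconnect();
    if (err == STUB_SUCCESS)
    {
        m_disconnect_pending = true;
    }
    return err;
}

/* Returns true if an update request was actually sent, false if it was skipped. */
bool app_request_param_update(void)
{
    if (m_disconnect_pending)
    {
        return false; /* Link is being torn down; do not touch it. */
    }
    return stub_radio_param_update() == STUB_SUCCESS;
}

/* Call from the handler of the disconnected event. */
void app_on_disconnected(void)
{
    m_disconnect_pending = false;
}
```

This only suppresses the error event; it does not shorten the time until the disconnected event arrives.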

  • That would help stop that error event, yes. The real issue is this whole 18 seconds of high power between attempting to disconnect, and the actual event; this time is much longer than the supervisor timeout. I've just had a look using the power profiler and the current use over that long period starts with 1ms~ pulses slowly increasing linearly up to about 15ms long pulses of full CPU operation; it's a very strange feature, definitely looking more like a bug in the softdevice now I've seen this. Attached animated GIF of the power consumption (DCDC off) from sd_ble_gap_disconnect() to the event. 

    I would assume that a disconnect forced by the supervisor timeout doesn't expect a response from the connected device, but it seems that a user initiated disconnect using sd_ble_gap_disconnect() is able to get stuck if the other device happens to fail during that process. Note that during this period, other functions are still working normally, a pin interrupt followed by SPI transfer are able to work during this weird high power consumption period. 

  • Hi, I expect what you are seeing here is the effect of window widening: while the peripheral device is waiting for the ACK, it compensates for the drift between the peer clock and the local clock. Over time this means that the RX window keeps widening. For instance, if both the peer and the local clock have a 250 ppm tolerance, the window would widen by about 1 ms for each second since the last received packet.

    What is the supervisor timeout here? You should be able to get it from the connection event and later connection parameter updates from the peer device.
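As a rough check of the figures in this reply, the widening can be estimated with a few lines of C. This is a simplified model, assuming the receiver opens its window early and closes it late by the combined clock tolerance multiplied by the time since the last received packet; the function name is invented for illustration and fixed margins are ignored.

```c
#include <stdint.h>

/*
 * Estimated total growth of the RX window, in microseconds.
 *
 * local_ppm / peer_ppm : clock tolerances in parts per million
 * elapsed_ms           : time since the last packet was received
 *
 * 1 ppm over 1 ms equals 1 ns of drift, so (ppm * ms) is in nanoseconds.
 * The window grows on both sides of the expected anchor point, hence the 2x.
 */
uint32_t window_widening_us(uint32_t local_ppm, uint32_t peer_ppm, uint32_t elapsed_ms)
{
    uint64_t one_sided_ns = (uint64_t)(local_ppm + peer_ppm) * elapsed_ms;
    return (uint32_t)((2u * one_sided_ns) / 1000u);
}
```

With 250 ppm on each side, `window_widening_us(250, 250, 1000)` gives 1000 µs, i.e. about 1 ms per second, and after 15 seconds the estimate is 15 ms, which is in line with the pulses growing to roughly 15 ms in the power profile described earlier.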

  • Ah I see, that makes perfect sense! Thanks. The supervisor timeout is configured to 2 seconds on startup, I'm not sure where else it could be being changed. BLE_GAP_EVT_CONNECTED or BLE_GAP_EVT_CONN_PARAM_UPDATE events don't have anything related to timeout change and ble_conn_params_evt_t is just success or fail, so I'm not sure what you're referring to there? 

    Is it possible for you to check if the supervisor timeout is indeed supposed to be still in operation after sd_ble_gap_disconnect() is called? (S112 5.1.0) I'm interested to know where this 18s~ timeout is coming from; there is no BLE_GAP_EVT_TIMEOUT when BLE_GAP_EVT_DISCONNECTED finally happens, so it seems like some weird internal event that is finally causing it to stop. If the RX window becomes so wide that it approaches the size of the connection interval, maybe then it would give up? 

  • For both BLE_GAP_EVT_CONNECTED and BLE_GAP_EVT_CONN_PARAM_UPDATE you should be able to check the actual supervisor timeout in:

    /* Connection Supervision Timeout in 10 ms units, see @ref BLE_GAP_CP_LIMITS. */
    uint16_t m_conn_sup_timeout =
        p_ble_evt->evt.gap_evt.params.connected.conn_params.conn_sup_timeout;
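One detail that may be worth checking when using this for both events: in the SoftDevice headers the two events appear to carry their parameters under different members of the params union (`connected` for the connection event, `conn_param_update` for the update event), so each event should be read through its own member. The sketch below uses simplified stand-in types (all `demo_*` and `DEMO_*` names are hypothetical, and the field layout is not the real one) purely to show the selection logic.

```c
#include <stdint.h>

/* Simplified stand-ins for the SoftDevice types; layout is NOT the real one. */
typedef struct
{
    uint16_t min_conn_interval;
    uint16_t max_conn_interval;
    uint16_t slave_latency;
    uint16_t conn_sup_timeout; /* 10 ms units */
} demo_conn_params_t;

typedef struct
{
    uint8_t            other_fields[8]; /* placeholder for peer address, role, ... */
    demo_conn_params_t conn_params;
} demo_connected_t;

typedef struct
{
    demo_conn_params_t conn_params;
} demo_conn_param_update_t;

typedef enum
{
    DEMO_EVT_CONNECTED,
    DEMO_EVT_CONN_PARAM_UPDATE
} demo_evt_id_t;

typedef struct
{
    demo_evt_id_t id;
    union
    {
        demo_connected_t         connected;
        demo_conn_param_update_t conn_param_update;
    } params;
} demo_gap_evt_t;

/* Returns the supervision timeout (10 ms units) from the member matching the event. */
uint16_t demo_sup_timeout_units(const demo_gap_evt_t *evt)
{
    switch (evt->id)
    {
        case DEMO_EVT_CONNECTED:
            return evt->params.connected.conn_params.conn_sup_timeout;
        case DEMO_EVT_CONN_PARAM_UPDATE:
            return evt->params.conn_param_update.conn_params.conn_sup_timeout;
    }
    return 0; /* Unknown event: no timeout available. */
}
```

Reading the update event through the `connected` member would look at bytes belonging to other fields, which could produce a misleading value.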

  • Ah, it was just buried deep in the event. On connection it reports 2000, which does not match the MSEC_TO_UNITS(2000, UNIT_10_MS) I configured - even if I manually set it to "200" or "100"! The sd_ble_gap_ppcp_set() call runs successfully on startup.

    On param update event it then becomes 0, which I assume means "unchanged", or is this wrong? 
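A quick unit check on these numbers, assuming the reported field really is in 10 ms units as the header comment quoted earlier states: 2000 ms configured corresponds to 200 units, while a reported value of 2000 units corresponds to 20 seconds, which is close to the roughly 18 seconds observed between sd_ble_gap_disconnect() and the disconnected event. The helper names below are made up for this sketch.

```c
#include <stdint.h>

/* Supervision timeout is expressed in units of 10 ms. */
#define SUP_TIMEOUT_UNIT_MS 10u

/* Convert a timeout in 10 ms units to milliseconds. */
uint32_t sup_timeout_units_to_ms(uint16_t units)
{
    return (uint32_t)units * SUP_TIMEOUT_UNIT_MS;
}

/* Convert a timeout in milliseconds to 10 ms units (truncating). */
uint16_t sup_timeout_ms_to_units(uint32_t ms)
{
    return (uint16_t)(ms / SUP_TIMEOUT_UNIT_MS);
}
```

Since the preferred connection parameters set with sd_ble_gap_ppcp_set() are only a preference and the central chooses the parameters used at connection time, a value different from the configured one is not necessarily a configuration error on the peripheral.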
