This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Softdevice kills BLE connection events in first timeslot after a BLE connection establishment

We observed a weird, seemingly erroneous behaviour of the Softdevice when using the timeslot API. This concerns a product that we developed using Softdevice v4.0.5. The product implements a BLE peripheral which is also acting as an Enhanced ShockBurst PRX in the timeslots granted by the softdevice.

Summary of observed behaviour from the outside
About one in ten times when trying to connect the peripheral, the central is unable to connect while the peripheral still thinks it is connected. On the peripheral side, no connection events occur, but a connection timeout only happens after around two minutes.

Behaviour from central perspective
From the central perspective, the connection establishment fails somewhere. After this fails, the peripheral is not visible (i.e. not advertising at all) anymore.


Detailed behaviour of peripheral
I explain the steps that the peripheral goes through as logged through Segger RTT:

1. The softdevice dispatches BLE_GAP_EVT_CONNECTED.
2. The firmware tries to restart advertising as non-connectable. The call to sd_ble_gap_adv_start succeeds.
3. After 5 seconds we try to renegotiate connection parameters. Again, the call to sd_ble_gap_conn_param_update succeeds.
4. Within another 3 seconds, no response to the connection parameter update request was received.
5. After another second we retry negotiation. This time the call to sd_ble_gap_conn_param_update fails with NRF_ERROR_BUSY, probably because the Softdevice thinks it is still negotiating.
6. 20 seconds after BLE_GAP_EVT_CONNECTED, the peripheral successfully calls sd_ble_gap_disconnect.
7. Only after a long time, usually around 2 minutes after BLE_GAP_EVT_CONNECTED was signalled, will BLE_GAP_EVT_DISCONNECTED be signalled. The given disconnection reason is BLE_HCI_CONNECTION_TIMEOUT.

Protocol Analyzer
The BLE protocol analyzer showed that the connection was successfully established and the central tried to read out the primary service, but no response was received. After that, no communication occurs until after around two minutes. At that point  BLE_GAP_EVT_DISCONNECTED occurs and the devices advertisings are visible again. (see screencap below)




Timeslot usage
As mentioned before, the peripheral uses the timeslot api to communicate using Enhanced ShockBurst. For the record, no ESB communication occurs during these tests.

Logic Analyzer traces
To see what happens, I looked at it with a logic analyzer. I traced the following events using GPIOS (from top down, as shown in the screenshots):

  • "in timeslot" is high while a timeslot is active
  • "BLE connected" is set high upen BLE_GAP_EVT_CONNECTED and to low upen BLE_GAP_EVT_DISCONNECTED
  • "sd_signal_handler" is set high when entering the signal handler supplied to sd_radio_session_open and set low before exiting it.
  • "START" is NRF_RADIO_CALLBACK_SIGNAL_TYPE_START
  • "RADIO" is NRF_RADIO_CALLBACK_SIGNAL_TYPE_RADIO
  • "TIMER0" is NRF_RADIO_CALLBACK_SIGNAL_TYPE_TIMER0
  • "EXTEND_SUC..." is NRF_RADIO_CALLBACK_SIGNAL_TYPE_EXTEND_SUCCEEDED
  • "EXTEND_FAIL..." is NRF_RADIO_CALLBACK_SIGNAL_TYPE_EXTEND_FAILED

The following trace shows a session where I cyclically connected and disconnected the peripheral. At some point at around 45s the described behaviour occurs. As you can see, the timeslot stretches on, uninterrupted for almost 25s. Usually this is even around 2 Minutes (which would about correspond to NRF_RADIO_DISTANCE_MAX_US).



The next screencap shows the "regular" and "correct" behaviour upon connection establishment in more detail:


The next screencap shows the erroneous behaviour:





As you can see, the softdevice seems to be unfazed by the newly established connection and starts to happily grant an "infinitely long" timeslot to my application when this weird behaviour occurs. On the other hand, when the connection can be successfully established, it understandably only grants very short timeslots while the connection interval is still low.

No softdevice assertion occurs at any time.

Do you have an explanation for this behaviour?

  • Hi,

    The BLE protocol analyzer showed that the connection was successfully established and the central tried to read out the primary service, but no response was received.

    Yes, there seems to be a lot of packet retransmission. The arrow for the retransmission packets is pointing from "Advertiser" to "Scanner". Is it the peripheral that is sending these retransmissions?(i.e. the central is not responding?)

    As mentioned before, the peripheral uses the timeslot api to communicate using Enhanced ShockBurst. For the record, no ESB communication occurs during these tests.

     Do you see the issue if you don't request the timeslots ?

    Do you see the issue if you don't restart advertising as non-connectable ?

    "in timeslot" is high while a timeslot is active

     How are you keeping track of this? You set it low when you get the NRF_EVT_RADIO_CANCELED event ? and high when you get the NRF_RADIO_CALLBACK_SIGNAL_TYPE_START event ?

    starts to happily grant an "infinitely long" timeslot to my application when this weird behaviour occurs.

    Are you extending the timeslot with NRF_RADIO_SIGNAL_CALLBACK_ACTION_EXTEND? 

    How long timeslot are you requesting for the first timeslot ?

    About one in ten times when trying to connect the peripheral, the central is unable to connect while the peripheral still thinks it is connected. On the peripheral side, no connection events occur
    On the peripheral side, no connection events occur,
    1. The softdevice dispatches BLE_GAP_EVT_CONNECTED.

     When do you get the connection event, and when do you not get the connection event?

  • Hi Sigurd,

    Thanks for your response.

    The BLE protocol analyzer showed that the connection was successfully established and the central tried to read out the primary service, but no response was received.

    To be honest, I'm not sure what to make of the directions of the arrows in the sequence charts - and low level BLE-debugging is quite hard to me as I usually only have to work with the upper layers of the protocol... Anyways, I think the arrow directions are confusing and I'm not sure the analyzer gets them right. When the connection establishment works properly, the direction of the arrows for the same ATT_Read by group type request is reversed, but I think this may have to do with the analyzer not properly displaying it when the response is missing.

    At least, it does not make sense, that the peripheral device would look for the primary service in the central, I think. So I'm pretty sure that the retries are from the central side to the peripheral side, but I'm currently not entirely sure how to proof that.

     Do you see the issue if you don't request the timeslots ?

    No, everything works as expected when no timeslot sessions are active.

    Do you see the issue if you don't restart advertising as non-connectable ?
    How are you keeping track of this? You set it low when you get the NRF_EVT_RADIO_CANCELED event ? and high when you get the NRF_RADIO_CALLBACK_SIGNAL_TYPE_START event ?

    It's actually representing when the application sees it has a timeslot. I set it high when I receive NRF_RADIO_CALLBACK_SIGNAL_TYPE_START. At that point I configure timer 0 with two CC events:

    1. The first will trigger 2ms before the timeslot should end. When this one fires, I try to extend the timeslot by 5ms by returning NRF_RADIO_SIGNAL_CALLBACK_ACTION_EXTEND.
    2. The other CC event will trigger around 600us before the timeslot expires and will return NRF_RADIO_SIGNAL_CALLBACK_ACTION_REQUEST_AND_END and request a 5ms timeslot as the next timeslot.
    Are you extending the timeslot with NRF_RADIO_SIGNAL_CALLBACK_ACTION_EXTEND? 

    Yes.

    How long timeslot are you requesting for the first timeslot ?

    5 milliseconds.

    As soon as the timeslot starts I try to request an extension of another 5ms by returning NRF_RADIO_SIGNAL_CALLBACK_ACTION_EXTEND. If this succeeds, I update the timer CCs to the new extended durations and return NRF_RADIO_SIGNAL_CALLBACK_ACTION_NONE. If it fails I retry by requesting half the time of the previous request until it is below 1ms.

     When do you get the connection event, and when do you not get the connection event?

    Sorry, I was being a bit unclear here. By "no connection events occur", I was referring to the periodic BLE connection events in which data may be transferred. The peripheral will get the BLE_GAP_EVT_CONNECTED in any case, but nothing happens after that, i.e. after the initial failed attempt to read the primary service handles by the central, no further BLE connection events are observed (via protocol analyzer) for up to around two minutes.

    Regards,

    -mike

  • Hi Sigurd,

    I just did a few additional tests where I delayed the restart of the non-connectable advertising. The issue seems to be definitiely related to this. When restarting non-connectable advertising, the central side will lose the connection. The protocol analyzer shows that the peripheral will stop responding at the time the non-connectable advertising should be started. The timeslots were normally granted up until this point, but as soon as the non-connectable advertising starts, the described behaviour kicks in and a very long timeslot is being granted. See the following trace. The delay is at one second and markers A (BLE_GAP_EVT_CONNECTED) and B (start of the very long timeslot) are 1.0015s apart.

    EDIT: Sorry, I could not upload the screenshot for some reason. I hope the description is enough.

  • Interesting. So it did not help to delay restarting the non-connectable advertising? Delaying the restart of non-connectable maybe ~8 sec, does not help?

  • I did another round of tests:

    • A second, extended test where the non-connectable advertising during the established connection is disabled. The odd behaviour did not occur once in about 70 connect / disconnect cycles.
    • Another test with your suggested 8s delay. This seems to change the odd behaviour with the long timeslot to happen *after* BLE_GAP_EVT_DISCONNECTED, when the regular connectable advertising should have been restarted.
    • An additional test with an additional 8s delay, before the connectable advertising is started after BLE_GAP_EVT_DISCONNECTED seemed to make the issues go away, but I'm not entirely convinced. I'd have to invest some more into the testing set up to be sure.
    • A test in which I reduced the delay of the *connectable* advertising restart after BLE_GAP_EVT_DISCONNECTED from 8s to 4s resulted in at least one case, where the original behaviour was observable, i.e. the timeslot gets "infinitely" extended when the [8s delayed] non-connectable adv. is started.

    I have to re-evaluate if disabling the ADV_NONCONN_IND-advertising is a viable option, so this could be a possible workaround for the issue, but still... something somewhere is going very wrong here.

Related