Softdevice event handler continuously called with NRF_EVT_RADIO_BLOCKED

Question

I've got another question concerning a timeslot application. 
 In our timeslot application, which is based on the nRF Mesh SDK timeslot.c implementation, I noticed that the softdevice did not grant any timeslots while a BLE connection with a connection interval of 7.5ms was active. The cause was that the timeslot length requested at the end of a timeslot was 14000us, which is too long to fit into the 7.5ms connection interval. This resulted in the timeslot being canceled: the softdevice event handler was invoked with NRF_EVT_RADIO_CANCELED. 
 The handler for this event called sd_radio_request() again, but with the same parameters that had just been canceled. As a result, the softdevice event handler was called repeatedly with NRF_EVT_RADIO_CANCELED until either the BLE connection parameters were changed or the central disconnected. 
 To avert this behaviour I changed the request parameters to the values that were used in case of a NRF_EVT_RADIO_BLOCKED: 
 m_radio_request_earliest.request_type = NRF_RADIO_REQ_TYPE_EARLIEST;
m_radio_request_earliest.params.earliest.hfclk = NRF_RADIO_HFCLK_CFG_NO_GUARANTEE;
m_radio_request_earliest.params.earliest.priority = NRF_RADIO_PRIORITY_NORMAL;
m_radio_request_earliest.params.earliest.length_us = 3800;
m_radio_request_earliest.params.earliest.timeout_us = 15000; 
 This resulted in the timeslot being granted again after NRF_EVT_RADIO_CANCELED occurred. 
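The fallback described above can be sketched roughly as follows. This is only an illustration with stand-in types so that it compiles without the SoftDevice headers: `ts_event_t`, `ts_request_t` and `ts_apply_fallback()` are hypothetical names and not part of the nRF Mesh SDK. In firmware the two fields would correspond to `params.earliest.length_us` and `params.earliest.timeout_us` of the request passed to sd_radio_request().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (hypothetical) for the NRF_EVT_RADIO_* events. */
typedef enum {
    TS_EVT_BLOCKED,
    TS_EVT_CANCELED,
    TS_EVT_OTHER
} ts_event_t;

/* Stand-in (hypothetical) for the two fields of the earliest request
 * that matter here. */
typedef struct {
    uint32_t length_us;
    uint32_t timeout_us;
} ts_request_t;

/* Values quoted in the post above. */
#define TS_FALLBACK_LENGTH_US   3800u
#define TS_FALLBACK_TIMEOUT_US 15000u

/* Shrink the request after BLOCKED or CANCELED so that the retry is not
 * a copy of the request that just failed. Other events leave it untouched. */
void ts_apply_fallback(ts_request_t *req, ts_event_t evt)
{
    assert(req != NULL);
    if (evt == TS_EVT_BLOCKED || evt == TS_EVT_CANCELED) {
        req->length_us  = TS_FALLBACK_LENGTH_US;
        req->timeout_us = TS_FALLBACK_TIMEOUT_US;
    }
}
```

The caller would then copy the two values into the real request structure and call sd_radio_request() again.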
 I then realised that after changing the connection interval to 50ms with a slave latency of 2, timeslots get repeatedly blocked for a duration of 5ms (during the connection event) and only then are they being granted again. See the following chart: 
 
 I thought this behaviour looks odd and inefficient. Ideally, I expect the softdevice to not invoke the BLOCKED event, since the timeout - from my understanding - should only occur after 15000us? Invoking sd_radio_request() when handling the BLOCKED event with a length_us of 3800 and a timeout_us of 14000, I definitely don't expect the event handler to be invoked every ~50us! 
 So I tried to play around with the timeout_us parameter of the nrf_radio_request_t structure, but did not find a viable solution; in fact, I only made it worse. When increasing timeout_us, e.g. to 50000, I noticed that the described behaviour occurs for a bit and then disappears, but the timeslot is only started after a total downtime of about 50ms! This drastically reduces the availability of our proprietary radio protocol.

Is there a clean way to avert these continuous BLOCKED events while still maximising timeslot duration? 
 Can you elaborate on the difference between the BLOCKED and CANCELED events? 
 What implications does/should changing the timeout of a radio request have? 
 
 By the way, this is all on an nRF52832 running S132 7.2.0.

Hung Bui · Accepted Answer

Hi Michael, 
 I talked to Sigurd. The bug where the central couldn't send the first connection event is fixed as of Softdevice v8. However, we haven't released S132/S140 v8.0 yet. 
 This bug only affects central devices in some corner cases. 
 I believe the drawback is actually the time needed to start the crystal, not that the HFCLK is kept between timeslot and BLE activity (actually it's the opposite: when you use NRF_RADIO_HFCLK_CFG_XTAL_GUARANTEED the clock is kept). 
 What I'm not really certain about is why in your case the softdevice doesn't grant the timeslot when the interval is 7.5ms, while in my example it works fine. In my example, the connection event and the timeslot take turns preempting each other, resulting in a timeslot every 15ms. This matches what is described for the timeslot priority: when set to HIGH, it has the same priority as the connection event. 
 I would suggest double-checking that you have set the timeslot priority to HIGH in the request made after it's blocked.

Hung Bui · Answer

Hi Mike, 
 I think I can explain why you get continuous blocked events with such a configuration. 
 When you change the timeout to 15ms and you have just entered a connection, the first connection event has higher priority than a timeslot at HIGH. The first connection event(s) are important because if you miss 6 of them the connection will be terminated. 
 So the timeslot will be blocked for at least the period of the first connection event (7.5ms + t_dist). The request will hit the 15ms timeout and the request for the earliest timeslot will be blocked. The code will then receive the NRF_EVT_RADIO_BLOCKED event and immediately issue a new request via request_next_event_earliest(), which will also result in NRF_EVT_RADIO_BLOCKED because the timeout is still inside the period occupied by the first connection event. This loop continues until the 15ms timeout falls outside the period occupied by the first connection event (in my case, with a connection interval of 7.5ms, it took 2ms of continuously trying to get the timeslot). 
 To avoid this loop, what we usually do is gradually increase timeout_us when we receive repeated blocked events (increase it by 1ms each time, for example). This way we can still get the earliest possible timeslot once the higher-priority activity has passed.
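A minimal sketch of the gradual timeout increase suggested above, assuming the application keeps a small state structure between events. `ts_backoff_t` and the three functions are hypothetical helpers, not SoftDevice or SDK APIs, and the upper bound is an application choice that the thread does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the timeout of the next earliest request. */
typedef struct {
    uint32_t base_timeout_us; /* timeout of the first request, e.g. 15000 */
    uint32_t step_us;         /* increase per repeated BLOCKED, e.g. 1000 */
    uint32_t max_timeout_us;  /* upper bound chosen by the application    */
    uint32_t timeout_us;      /* value to put into the next request       */
} ts_backoff_t;

void ts_backoff_init(ts_backoff_t *b, uint32_t base_us,
                     uint32_t step_us, uint32_t max_us)
{
    assert(b != NULL && base_us <= max_us);
    b->base_timeout_us = base_us;
    b->step_us         = step_us;
    b->max_timeout_us  = max_us;
    b->timeout_us      = base_us;
}

/* Call for every BLOCKED event; returns the timeout for the re-request. */
uint32_t ts_backoff_on_blocked(ts_backoff_t *b)
{
    assert(b != NULL);
    if (b->max_timeout_us - b->timeout_us > b->step_us) {
        b->timeout_us += b->step_us;
    } else {
        b->timeout_us = b->max_timeout_us; /* clamp at the upper bound */
    }
    return b->timeout_us;
}

/* Call once a timeslot has actually started, so the next request
 * begins with the short timeout again. */
void ts_backoff_on_granted(ts_backoff_t *b)
{
    assert(b != NULL);
    b->timeout_us = b->base_timeout_us;
}
```

The value returned by `ts_backoff_on_blocked()` would be written to `params.earliest.timeout_us` before calling sd_radio_request() again. Resetting on a granted timeslot keeps the first request after each grant as early as possible.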

m.wagner · Answer

Hi Hung Bui, 
 Thanks for the explanation. Even though you did not touch on the impact of NRF_SDH_BLE_GAP_EVENT_LENGTH, I think it helped me understand what actually happens here. 
 Just for clarification: I do observe this behaviour (continuous BLOCKED loop) during each and every connection event, not just during connection establishment. 
 To sum up the issue again: 
 
 My issue was that I discovered this loop of NRF_EVT_RADIO_BLOCKED. 
 This behaviour can be modified by changing the parameters of the request - most notably increasing the timeout leads to fewer or even none of these BLOCKED events. 
 This however will often result in larger "downtime" of the timeslot

In a "normal" scenario where NRF_SDH_BLE_GAP_EVENT_LENGTH is at the default of 6, a rather low timeout value of 15000us will most likely not result in a BLOCKED timeslot or at least not often. The example above illustrates the scenario where it is BLOCKED once, but then the requested timeslot can be scheduled after the Connection event ends...

The above sketch illustrates a scenario with a large GAP event length but still a short timeout. With NRF_SDH_BLE_GAP_EVENT_LENGTH set to a larger value - let's assume the maximum value of 320 as used in my application - the Softdevice does not yet know how long a connection event will be at the time it is scheduled. Therefore it has to assume the maximum length of 400ms. 
 So radio requests with a short timeout will be continuously blocked until either the timeout at the point of the request is long enough to be after the potential maximum event duration or the connection event actually ended and the Softdevice can grant the timeslot immediately. 
 To circumvent this, one may be tempted to increase the timeout. If we increase the timeout to a value > 400000us (greater than the maximum connection event duration), we get the following scenario: 
 
 The timeout is longer, so the softdevice just schedules the next timeslot at the first point after the potential max. duration of the connection event. In our example this would be about 400ms after the connection event started. But the connection event may actually be over in as little as 7.5ms and therefore, we have 392.5ms of unnecessary downtime of the timeslot! 
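The numbers in this example can be checked with a short calculation. NRF_SDH_BLE_GAP_EVENT_LENGTH is given in units of 1.25ms, so 6 corresponds to 7.5ms and 320 to 400ms. The helper names below are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* NRF_SDH_BLE_GAP_EVENT_LENGTH is expressed in 1.25 ms units. */
#define GAP_EVENT_LENGTH_UNIT_US 1250u

/* Convert a GAP event length setting to microseconds. */
uint32_t gap_event_length_to_us(uint16_t units)
{
    return (uint32_t)units * GAP_EVENT_LENGTH_UNIT_US;
}

/* Time that stays reserved for the connection event although the event
 * has already finished after actual_us. */
uint32_t unused_reserved_us(uint32_t reserved_us, uint32_t actual_us)
{
    assert(actual_us <= reserved_us);
    return reserved_us - actual_us;
}
```

With a setting of 320 and a connection event that ends after 7.5ms, `unused_reserved_us(gap_event_length_to_us(320), 7500)` gives 392500us, i.e. the 392.5ms of downtime mentioned above.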
 Hung Bui Can you confirm that I got this right? 
 
 Hung Bui said: To avoid this loop what we usually do is to gradually increase the timeout_us when we receive the repeated blocked events (increase it by 1ms each for example). This way we can still get the earliest possible timeslot when the higher priority activity passed. 
 I played around with this, but with a large value for NRF_SDH_BLE_GAP_EVENT_LENGTH, increasing the timeout gives the following: in the best case, there is still a loop of calls to the softdevice event handler, and in the worst case, the timeout eventually becomes large enough for a timeslot to be scheduled, but only after an unnecessary delay. 
 So as it stands right now, I think I would prefer the continuous calls to the SD event handler with the BLOCKED signal to having gaps in radio availability in the timeslot. 
 It would be nice if the softdevice could either reschedule a timeslot when it notices that there is a gap of "wasted" time, or if there was any other way to avoid this loop (e.g. an event indicating that a connection event has passed, at which point it would be possible to manually request a new timeslot?) 
 Do you have any suggestions? 
 Thanks & best regards, 
 -mike

Hung Bui · Answer

Hi Mike, I assume that when you mention a "large event_length", it's just larger than 6 but still not larger than the connection interval? As far as I know, when event_length is larger than the connection interval it does not take effect and is capped at the connection interval. 
 When you mention "potential max", do you mean the event_length? The softdevice should always presume that the connection event will last as long as event_length. Our softdevice scheduler is not designed to re-schedule if the connection event ends earlier than expected (that is, before event_length has elapsed). That would require a more dynamic scheduling algorithm and would complicate things a bit. If you want to avoid such downtime, I would suggest using a shorter connection interval. To save power on the peripheral and to avoid the timeslot being blocked by the short connection interval, you can use slave latency. When the slave skips connection events due to slave latency, the timeslots can take place in those slots. In your case, where you want as much timeslot time as possible, power consumption may not be a big problem. Then you can afford to go with the approach of requesting continuously; I think that would be a good workaround for the "non-dynamic" scheduler.
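As a rough model of the behaviour described in this thread (not the actual SoftDevice scheduler), the reserved window can be taken as the configured event length capped at the connection interval, and an earliest request as blocked while its timeout ends before that window does. Both function names are hypothetical, and the model ignores the case where the connection event really ends early and the timeslot is granted right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window assumed to be reserved for a connection event: the configured
 * event length (1.25 ms units), capped at the connection interval. */
uint32_t effective_reserved_us(uint16_t event_length_units,
                               uint32_t conn_interval_us)
{
    assert(conn_interval_us > 0u);
    uint32_t configured_us = (uint32_t)event_length_units * 1250u;
    return (configured_us < conn_interval_us) ? configured_us
                                              : conn_interval_us;
}

/* True if a request issued elapsed_us after the start of the connection
 * event cannot start before the reserved window is over. */
bool request_would_block(uint32_t elapsed_us, uint32_t timeout_us,
                         uint32_t reserved_us)
{
    if (elapsed_us >= reserved_us) {
        return false; /* the window has already passed */
    }
    return timeout_us < (reserved_us - elapsed_us);
}
```

Under these assumptions, an event length of 320 with a 50ms interval reserves the full 50ms, so a request with a 15ms timeout made at the start of the connection event is blocked, while the same request with the default event length of 6 (7.5ms) is not.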
