This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Softdevice event handler continuously called with NRF_EVT_RADIO_BLOCKED

I've got another question concerning a timeslot application.

In our timeslot application which is based on the nRF Mesh SDK timeslot.c implementation I noticed that if a BLE connection is active with a connection interval of 7.5ms, the softdevice did not grant any timeslots. I noticed that the cause of this was that the requested timeslot length requested at the end of a timeslot was 14000us and therefore too long to fit into the 7.5ms connection interval. This resultd in the timeslot being canceled via invocation of the softdevice event handler with NRF_EVT_RADIO_CANCELED.

The handling of this event was a call to sd_radio_request() but using the same parameters that were canceled. This resulted in the softdevice event handler being called repeatedly with NRF_EVT_RADIO_CANCELED until either the BLE connection parameters were changed or the central disconnected.

To avert this behaviour I changed the request parameters to the values that were used in case of a NRF_EVT_RADIO_BLOCKED:

m_radio_request_earliest.request_type               = NRF_RADIO_REQ_TYPE_EARLIEST;
m_radio_request_earliest.params.earliest.hfclk      = NRF_RADIO_HFCLK_CFG_NO_GUARANTEE;
m_radio_request_earliest.params.earliest.priority   = NRF_RADIO_PRIORITY_NORMAL;
m_radio_request_earliest.params.earliest.length_us  = 3800;
m_radio_request_earliest.params.earliest.timeout_us = 15000;

This resulted in the timeslot being granted again after NRF_EVT_RADIO_CANCELED occurred.

I then realised that after changing the connection interval to 50ms with a slave latency of 2, timeslots get repeatedly blocked for a duration of 5ms (during the connection event) and only then are they being granted again. See the following chart:

I thought this behaviour looks odd and inefficient. Ideally, I expect the softdevice to not invoke the BLOCKED event, since the timeout - from my understanding - should only occur after 15000us? Invoking sd_radio_request() when handling the BLOCKED event with a length_us of 3800 and a timeout_us of 14000, I definitely don't expect the event handler to be invoked every ~50us!

So I tried to play around with the timeout_us parameter of the nrf_radio_request_t structure, but did not find a viable solution, in fact I only made it worse. When increasing the timeout_us e.g. to 50000, I noticed that the described behaviour occurs for a bit and then disappears, but the timeslot is only started after a total downtime of about 50ms! This drastically reduces the availability of our proprietary radio protocol.

  • Is there a clean way to avert these continuous BLOCKED events while still maximising timeslot duration?
  • Can you elaborate on the difference between the BLOCKED and CANCELED events?
  • What implications does/should changing the timeout of a radio request have?

By the way, this is all on an nRF52832 running S132 7.2.0.

Parents
  • Hi Michael, 
    If you have a look at the documentation of the softdevice (SDS) you can find this scheduling properties: 

    According to this if you request the timeslot at NORMAL priority it will have lower priority than the connection activity, meaning that if the timeslot doesn't fit entirely inside a period when there is no connection event (calculated by the event length reserved) it will get cancelled. 

    Please try to test again with the priority set to HIGH. 

    However, I couldn't explain why you get the event every 50us. We may need to look into your code to see why. 

    You can have a look at my example code here

    I attached the logic trace in my test here when the device first advertised, then entered a connection with interval = 7.5ms after 5 seconds the interval changed to 750ms. You can find the timeslot got cancelled multiple times, but still it get some slots in between. 

    1 MHz, 20 M Samples [9].logicdata

  • In the meantime, I was able to reproduce the issue with the ble_app_proximity example from the nRF SDK for Mesh 4.2.0.

    I've attached the diff to the nRF SDK for Mesh 4.2.0 folder. The modifications I made were:

    • Modified nRF5 SDK related paths in project file to point to folder nRF5_SDK at same level as the mesh SDK folder is.
    • Enabled and configured timeslot debug pins (default configuration for timeslot debugging in mesh/core):
      • P0.03: high while in timeslot
      • P0.04: high while in signal handler
      • P0.28: high while trying to extend timeslot
      • P0.29: high when serving TIMER0 IRQ
      • P0.30: high while in softdevice event handler (the culprit here!)
      • P0.31: High when timeslot end is reached
      • P0.24: High if high priority timeslot was requested
      • P0.25: High if extension succeeded.
    • Set APP_TIMER_CONFIG_RTC_FREQUENCY to 0 (only way the example runs)
    • Set NRF_SDH_BLE_GAP_EVENT_LENGTH to 320 --> this triggers the issue!

    timeslot-sd-event-loop.patch

    So I've touched on two issues here:

    1. The weird behaviour (described in the initial post) that occurs during a BLE connection where the SD event handler is invoked repeatedly. This seems to happen in the suppleid exemple when settting NRF_SDH_BLE_GAP_EVENT_LENGTH to 320. We set this to a high value to be able to achieve higher BLE throughput when needed.

      Do you have any explanation as to why this behaviour occurs and how it can be avoided?

    2. When setting the hfclk parameter to NRF_RADIO_HFCLK_CFG_NO_GUARANTEE, no timeslots can be served if the connection interval is only 7.5ms. The idea was to configure the hfclk that way because we are able to have it continuously running, as we do not have any battery constraints. Additionally, there seems to be a connection issue that seems to be avoidable if NRF_RADIO_HFCLK_CFG_NO_GUARANTEE is chosen.

      So why can no timeslots be served with a 7.5ms connection interval and NRF_RADIO_HFCLK_CFG_NO_GUARANTEE even though the HFCLK is running?

    Looking forward to your responses!

    Thank you & best regards,

    -mike

  • Hi Michael, 

    I tried this configuration: 

    #define TS_LEN_US                   (5000UL)                /**< Length of timeslot to be requested. */
    #define TX_LEN_EXTENSION_US         (5000UL)                /**< Length of timeslot to be extended. */
    #define TS_SAFETY_MARGIN_US         (700UL)                 /**< The timeslot activity should be finished with this much to spare. */
    #define TS_EXTEND_MARGIN_US         (2000UL)                /**< Margin reserved for extension processing. */
    
    uint32_t request_next_event_earliest(void)
    {
        m_timeslot_request.request_type                = NRF_RADIO_REQ_TYPE_EARLIEST;
        m_timeslot_request.params.earliest.hfclk       = NRF_RADIO_HFCLK_CFG_NO_GUARANTEE;
        m_timeslot_request.params.earliest.priority    = NRF_RADIO_PRIORITY_HIGH;
        m_timeslot_request.params.earliest.length_us   = TS_LEN_US;
        m_timeslot_request.params.earliest.timeout_us  = 500000;
        return sd_radio_request(&m_timeslot_request);
    }
    
    
    /**@brief Configure next timeslot event in earliest configuration.
     */
    void configure_next_event_earliest(void)
    {
        m_timeslot_request.request_type                = NRF_RADIO_REQ_TYPE_EARLIEST;
        m_timeslot_request.params.earliest.hfclk       = NRF_RADIO_HFCLK_CFG_NO_GUARANTEE;
        m_timeslot_request.params.earliest.priority    = NRF_RADIO_PRIORITY_HIGH;
        m_timeslot_request.params.earliest.length_us   = TS_LEN_US;
        m_timeslot_request.params.earliest.timeout_us  = 500000;
    }
    

    And it doesn't seem to get any problem with 7.5ms. The timeslot is granted every other connection event. 

    You can try my code here. please update with the above configuration. 

    When you use NRF_RADIO_HFCLK_CFG_NO_GUARANTEE, you have the responsibility to disable the crystal before the timeslot end. And the softdevice would need to start it again when it get back to BLE activity. So it may take longer time for the softdevice to ramp up the radio before it can handle the BLE activity. But in my case this anyway doesn't seem to affect the timeslot request. You can find attached trace which i used the NRF_RADIO_HFCLK_CFG_NO_GUARANTEE with 7.5ms interval. 
    10Feb.zip


    Regarding issue #1 if you can find away to reproduce with my example code on github above, I can try to test here. 

  • Hi Hung Bui,

    When you use NRF_RADIO_HFCLK_CFG_NO_GUARANTEE, you have the responsibility to disable the crystal before the timeslot end.

    I'm not sure it would be of much use to try our application with your example, since we use the entire timeslot extension algorithm from Nordic's Mesh stack - which apart from these few open issues I have - works really well and allows us to absolutely maximise timeslot use...

    When you use NRF_RADIO_HFCLK_CFG_NO_GUARANTEE, you have the responsibility to disable the crystal before the timeslot end.

    Why would I need to disable it? Can't I leave it running for the entire run time of my application?


    Regarding issue #1 if you can find away to reproduce with my example code on github above, I can try to test here. 

    Would be great if you could take a look at the mesh example I provided above. It's almost out of the box, the only modifications I did were the required path adaptations and setting NRF_SDH_BLE_GAP_EVENT_LENGTH to 320.

    I'll see if I manage to reproduce it with your example, though.

    Thanks & regards,

    -mike

Reply
  • Hi Hung Bui,

    When you use NRF_RADIO_HFCLK_CFG_NO_GUARANTEE, you have the responsibility to disable the crystal before the timeslot end.

    I'm not sure it would be of much use to try our application with your example, since we use the entire timeslot extension algorithm from Nordic's Mesh stack - which apart from these few open issues I have - works really well and allows us to absolutely maximise timeslot use...

    When you use NRF_RADIO_HFCLK_CFG_NO_GUARANTEE, you have the responsibility to disable the crystal before the timeslot end.

    Why would I need to disable it? Can't I leave it running for the entire run time of my application?


    Regarding issue #1 if you can find away to reproduce with my example code on github above, I can try to test here. 

    Would be great if you could take a look at the mesh example I provided above. It's almost out of the box, the only modifications I did were the required path adaptations and setting NRF_SDH_BLE_GAP_EVENT_LENGTH to 320.

    I'll see if I manage to reproduce it with your example, though.

    Thanks & regards,

    -mike

Children
  • Hi Michael, 

    I quote here what inside the Softdevice spec : 

    I think the softdevice always assume the crystal is disabled when it's returned from the application timeslot. So it's the application responsibility to disable it again before timeslot ends.


    I understood that you have a working example that  only have some corner cases need to be fixed. But if you can reproduce on much simpler application then it's easier to spot the root cause. 

  • Hi Hung Bui,

    I think the softdevice always assume the crystal is disabled when it's returned from the application timeslot. So it's the application responsibility to disable it again before timeslot ends.

    Okay, I think you helped me understand it better. I was thrown off by the Nordic engineers response to the issue I linked above. The suggested workaround there was:

    Based on initial testing, it's seems like using NRF_RADIO_HFCLK_CFG_XTAL_GUARANTEED is causing the first packet sent by the master after CONNECT_IND packet to be sent a bit too early. Other workarounds could be to set NRF_SDH_CLOCK_LF_ACCURACY to 1, or use NRF_RADIO_HFCLK_CFG_NO_GUARANTEE.

    The user then asked whether there are any negative impacts:

    In the time_sync example HFCLK is alway on due to calling sd_clock_hfclk_request(). So if I change the timeslot requests from NRF_RADIO_HFCLK_CFG_XTAL_GUARANTEED to NRF_RADIO_HFCLK_CFG_NO_GUARANTEE are there any drawbacks?

    And the engineers response was:

    Only that the HFCLK is not turned off between BLE radio and timeslot usage, so the current consumption will potentially be a little bit higher.

    So I assumed that it is not actually required that the HFCLK must be disabled before ending the timeslot.

    I could imagine, though, that the root of my issue with the softdevice not granting any timeslots with a 7.5ms connection interval, is that the softdevice's scheduling needs to account for the time it takes for the hfclk to start - no matter if it's already running or not - and therefore does not grant any timeslots in this scenario. Can you confirm this?

    Does the softdevice issue above actually have any impact on a device that only implements a BLE peripheral?

    But if you can reproduce on much simpler application then it's easier to spot the root cause. 

    Of course. I thought it may already be helpful since it's a slight modification to one of the Mesh SDK example.

    I'm still working on reproducing it with your example, but so far without success. It looks like I managed to trigger the behaviour I observed in my other open issue pertaining to timeslots, where it seems the softdevice does not fire the signal handler for TIMER0 interrupts and then fires the event handler with NRF_EVT_RADIO_IDLE even though the timeslot was not ended via signal handler. Will post these findings to the other issue when I'm ready.

  • Hi Michael, 

    I talked to Sigurd, the bug that the central couldn't send the the first connection event is fixed from Softdevice v8. However we haven't released the S132/S140 v8.0 yet. 

    This bug only affect central devices in some corner cases. 

    I believe the draw back is actually the timing needed to start the crystal, not really about that the HFCLK is kept between timeslot and BLE activity (actually it's the opposite, when you use NRF_RADIO_HFCLK_CFG_XTAL_GUARANTEED  the clock is kept)

    What I'm not really certain about is why in your case the softdevice doesn't grant the timeslot when the interval is 7.5ms when in my example it works fine. In my example, the connection event and the timeslot take turn to preempt each other resulting a timeslot in every 15ms, this match with what described in the timeslot priority that when set to HIGH it has the same priority as the connection event.

    I would suggest you to double check if you have selected the timeslot priority to HIGH in the request after  it's blocked. 

  • Hi Hung Bui

    This bug only affect central devices in some corner cases. 
    I believe the draw back is actually the timing needed to start the crystal, not really about that the HFCLK is kept between timeslot and BLE activity (actually it's the opposite, when you use NRF_RADIO_HFCLK_CFG_XTAL_GUARANTEED  the clock is kept)

    Okay, that's helpful. In conclusion, I know now that I'm safe and better off to use NRF_RADIO_HFCLK_CFG_XTAL_GUARANTEED in my application.

    What I'm not really certain about is why in your case the softdevice doesn't grant the timeslot when the interval is 7.5ms when in my example it works fine.

    I think we were talking about different things here. When I originally wrote that setting the priority to HIGH does not change the oserved behaviour, I was referring to the continuous indications of the BLOCKED event. This still occurs when setting the priority to HIGH, but I will manage to get a timeslot within reasonable time.

    But if you can reproduce on much simpler application then it's easier to spot the root cause. 

    I managed to reproduce the behaviour where the signal handler is continuously invoked with the BLOCKED event with your example. It happens as soon as I reduce the timeout_us parameter to e.g. 15000 while also configuring the softdevice with any value for NRF_SDH_BLE_GAP_EVENT_LENGTH greater than 6. The higher the value, the more subsequent BLOCKED events seem to occur.

    Steps to reproduce with your example:

    1. Get the linked example to run as per the instructions.
    2. Attach the patch attached to this comment
    3. Run the example
    4. Connect a BLE central to the peripheral (advertising as "Nordic_UART"). I used the Desktop version of nRF Connect.

    This will result in the BLOCKED event being continuously invoked as soon as a timeslot ends.

    The patch contains the following changes:

    • Added and enabled GPIOs for debugging:
      • P0.03 - Signals whether device is in timeslot
      • P0.04 - High while in radio callback (signal ID is indicated by repeated toggles)
      • P0.28 - Set when callback_action set to NRF_RADIO_SIGNAL_CALLBACK_ACTION_EXTEND, cleared on EXTEND_SUCCEEDED or EXTEND_FAILED
      • P0.29 - High while serving SIGNAL_TYPE_TIMER0
      • P0.30 - High while in softdevice signal handler (signal ID is indicated by repeated toggles)
      • P0.31 - High while serving TIMER0 interrupt where timeslot is ended.
      • P0.25 - Indicates EXTEND_SUCCEEDED.
    • Reduced the timeout_us parameter in the radio request to 15000.
    • Increased NRF_SDH_BLE_GAP_EVENT_LENGTH to 7.
    • Increased RAM available for Softdevice (so NRF_SDH_BLE_GAP_EVENT_LENGTH can be increased up to 320)

    4314.example-uart-esb-timeslot.patch

    It would be great if you can look into this and find a solution to this erroneous behaviour.

    Thanks & best regards,

    -mike

  • Hi Mike,

    I think I can explain the reason you get continuous blocking event with such configuration. 


    When you change the timeout to 15ms and when you just enter a connection, the first connection event has more priority than timeslot at HIGH. The first connection event(s) is important because if you miss 6 of them the connection will be terminated. 

    So timeslot will be blocked at least for the period of the first connection event (7.5ms + t_dist ) period, it will hit the 15ms timeout and the request for earliest timeslot will be blocked. 

    Then the code will receive NRF_EVT_RADIO_BLOCKED event, it will then immediately request a new request_next_event_earliest() which will also result in NRF_EVT_RADIO_BLOCKED because the timeout it still inside the period occupied by the first connection event. And this loop continues until the timeout 15ms it out of the period occupied by the first connection event (in my case with connection interval of 7.5ms it took 2ms of continously trying to get the timeslot. 

    To avoid this loop what we usually do is to gradually increase the timeout_us when we receive the repeated blocked events (in crease it by 1ms each for example). This way we can still get the earliest possible timeslot when the higher priority activity passed.

Related