Scheduler gets filled up by NRF_EVT_RADIO_BLOCKED events during connect.

Using:

  • nRF52832
  • SDK 15.3.0
  • SD S132 6.1.1
  • nRF5 SDK for Mesh 3.2.0
  • NRF_SDH_DISPATCH_MODEL == NRF_SDH_DISPATCH_MODEL_APPSH

Hi, I'm often getting a crash when connecting. The backtrace hinted that it happens in the mesh code: timeslot.c, while handling the NRF_EVT_RADIO_BLOCKED event.

But later I found out it actually happens because the app scheduler is full when nrf_sdh.c tries to add an SD event to the scheduler: ret_code_t ret_code = app_sched_event_put(NULL, 0, appsh_events_poll);

When logging SOC events, I noticed many NRF_EVT_RADIO_BLOCKED events coming in when a connection is made. And when inspecting what happens, I saw that timeslot.c simply keeps on requesting a new timeslot until it succeeds.

Am I right to assume this timeslot request loop can lead to the scheduler getting filled up (especially when other processes also add to the scheduler)? And if so, is there a solution for this?
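To illustrate the failure mode being asked about, here is a toy model of a fixed-size event queue, not the SDK's app_scheduler. All `toy_` names and the queue size are made up for the sketch. It only shows that once events are put faster than the main loop drains them, further puts fail; in the real SDK the failed app_sched_event_put() call is what nrf_sdh.c has to deal with.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue depth; the real one is chosen by the application. */
#define TOY_QUEUE_SIZE 4

typedef void (*toy_handler_t)(void);

typedef struct {
    toy_handler_t slots[TOY_QUEUE_SIZE];
    size_t head;   /* index of the next event to execute */
    size_t count;  /* number of pending events */
} toy_sched_t;

/* Queue an event. Returns false when the queue is full. */
static bool toy_sched_put(toy_sched_t *s, toy_handler_t h)
{
    if (s->count == TOY_QUEUE_SIZE) {
        return false;
    }
    s->slots[(s->head + s->count) % TOY_QUEUE_SIZE] = h;
    s->count++;
    return true;
}

/* Drain the queue, as the main loop would do. */
static void toy_sched_execute(toy_sched_t *s)
{
    while (s->count > 0) {
        toy_handler_t h = s->slots[s->head];
        s->head = (s->head + 1) % TOY_QUEUE_SIZE;
        s->count--;
        h();
    }
}

static void toy_noop(void) {}

/* Put `events` events back to back without draining in between,
 * then drain once. Returns how many puts were rejected. */
static size_t toy_count_dropped(size_t events)
{
    toy_sched_t s = {0};
    size_t dropped = 0;
    for (size_t i = 0; i < events; i++) {
        if (!toy_sched_put(&s, toy_noop)) {
            dropped++;
        }
    }
    toy_sched_execute(&s);
    return dropped;
}
```

With a depth of 4, a burst of 10 events without a drain in between rejects 6 of them. A larger queue only moves the threshold; it does not remove the retry loop itself.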

  • Hi,

    Am I right to assume this timeslot request loop can lead to the scheduler getting filled up (especially when other processes also add to the scheduler)?

    Yes, I think that can be the case.

    When logging SOC events, I noticed many NRF_EVT_RADIO_BLOCKED events coming in when a connection is made.

    I'm not sure what you mean by connect. Assuming your application is only running Bluetooth mesh, there is no concept of connections in a mesh network.

    Could you provide more information regarding your use-case and what you are trying to do?

    Could it be related to interrupt priority levels?

    Also, is there a reason you are using an old SDK version? If possible, I would suggest moving to our latest SDK (nRF5 SDK for Mesh v5.0.0), as many improvements have been made between v3.2.0 and v5.0.0.

  • Hi,

    This has been sent to our Mesh team to have a look at. I will update you once I have a response from them.

  • Thanks, looking forward to it.

    For now I'm using a workaround to at least not crash: patch

    With just that workaround, however, we get even more calls to sd_radio_request(), which block the main thread until the connection succeeds.

    To avoid that, I now also wait for NRF_MESH_EVT_DISABLED before calling sd_ble_gap_connect(). This prevents most of the NRF_EVT_RADIO_BLOCKED loops.
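The ordering described above can be sketched as a small state machine. This is only an illustration with made-up `sketch_` names, not the code from the patch. The two callbacks are stand-ins: in a real application they would presumably wrap the call that stops the mesh stack and the sd_ble_gap_connect() call, and sketch_on_mesh_disabled() would be called from the mesh event handler when NRF_MESH_EVT_DISABLED arrives.

```c
#include <stdbool.h>

typedef enum {
    SKETCH_CONN_IDLE,
    SKETCH_CONN_WAIT_MESH_DISABLED,
    SKETCH_CONN_CONNECTING
} sketch_conn_state_t;

typedef struct {
    sketch_conn_state_t state;
    void (*disable_mesh)(void);   /* stand-in: ask the mesh stack to stop */
    void (*start_connect)(void);  /* stand-in: would call sd_ble_gap_connect() */
} sketch_connector_t;

/* Stubs so the sketch runs on a host; they only count calls. */
static int sketch_disable_calls;
static int sketch_connect_calls;
static void sketch_stub_disable(void) { sketch_disable_calls++; }
static void sketch_stub_connect(void) { sketch_connect_calls++; }

/* Application wants to connect: stop the mesh first, connect later. */
static bool sketch_connect_request(sketch_connector_t *c)
{
    if (c->state != SKETCH_CONN_IDLE) {
        return false;  /* a connect attempt is already in progress */
    }
    c->state = SKETCH_CONN_WAIT_MESH_DISABLED;
    c->disable_mesh();
    return true;
}

/* Call this when the mesh stack reports that it is disabled. */
static void sketch_on_mesh_disabled(sketch_connector_t *c)
{
    if (c->state != SKETCH_CONN_WAIT_MESH_DISABLED) {
        return;  /* not waiting for this event, ignore it */
    }
    c->state = SKETCH_CONN_CONNECTING;
    c->start_connect();
}
```

The point of the extra state is that the connect call is never issued while the mesh stack still owns or requests timeslots, which is what seemed to trigger most of the BLOCKED loops.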

    Handling NRF_EVT_RADIO_SESSION_CLOSED still takes 4 ms while connecting. There are also still some short NRF_EVT_RADIO_BLOCKED loops, mostly while connected. They no longer block the main thread for seconds, but they can still take up to 15 ms to handle.

  • Hi,

    One of our developers thinks there are two potential solutions to this:

    1. Reduce TIMESLOT_BASE_LENGTH_SHORT_US so it can fit between two connection events of the shortest type. This is the intent of the current functionality, but some condition on the device (configuration, clock drift setting?) is preventing the device from getting this short timeslot. It is currently 3800 us; try changing it to 3000 us, for instance.

    2. Add some backoff mechanism in the BLOCKED event handler. In this handler, we're currently setting the timeslot request length to TIMESLOT_BASE_LENGTH_SHORT_US before retrying. If we end up here, and the request timeslot length is TIMESLOT_BASE_LENGTH_SHORT_US already, try starting a backoff timer that comes back to request a timeslot later. This could also be implemented as an nrf_mesh_evt_t, but that might still be too tight of a loop.
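The second option could look roughly like the following decision function. This is a sketch under assumptions, not the mesh stack's timeslot.c: every `sketch_`/`SKETCH_` name is invented, the backoff bounds are arbitrary, and the doubling policy is just one possible choice. The caller would either re-request the timeslot immediately or start a timer for `backoff_us` and request again when it fires.

```c
#include <stdint.h>

/* Stand-in for TIMESLOT_BASE_LENGTH_SHORT_US; value is illustrative. */
#define SKETCH_SHORT_US        3000u
/* Arbitrary backoff bounds chosen for the sketch. */
#define SKETCH_BACKOFF_MIN_US  1000u
#define SKETCH_BACKOFF_MAX_US 16000u

typedef enum {
    SKETCH_RETRY_NOW,            /* re-request the timeslot immediately */
    SKETCH_RETRY_AFTER_BACKOFF   /* start a timer, request when it fires */
} sketch_action_t;

typedef struct {
    uint32_t request_length_us;  /* length of the next timeslot request */
    uint32_t backoff_us;         /* 0 means no backoff is in progress */
} sketch_timeslot_state_t;

/* Decide what to do when a timeslot request was blocked. */
static sketch_action_t sketch_on_blocked(sketch_timeslot_state_t *st)
{
    if (st->request_length_us != SKETCH_SHORT_US) {
        /* First fall back to the short length and retry right away. */
        st->request_length_us = SKETCH_SHORT_US;
        st->backoff_us = 0;
        return SKETCH_RETRY_NOW;
    }
    /* Already at the short length: back off instead of looping. */
    if (st->backoff_us == 0) {
        st->backoff_us = SKETCH_BACKOFF_MIN_US;
    } else {
        st->backoff_us *= 2;
        if (st->backoff_us > SKETCH_BACKOFF_MAX_US) {
            st->backoff_us = SKETCH_BACKOFF_MAX_US;
        }
    }
    return SKETCH_RETRY_AFTER_BACKOFF;
}

/* Reset the backoff once a timeslot has been granted. */
static void sketch_on_granted(sketch_timeslot_state_t *st)
{
    st->backoff_us = 0;
}
```

Keeping the decision separate from the timer and the SoftDevice calls makes the policy easy to test on a host before touching the mesh stack.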

  • Thanks, I will try the first solution and get back with the results. How is this 3800 us calculated, if I may ask?

    Solution 2 is indeed what I also thought to be the better solution, but I don't dare to implement it, as it seems quite a big change with possible side effects.

  • bart said:
    Thanks, I will try the first solution and get back with the results. How is this 3800 us calculated, if I may ask?

    Great! This was based on a specification from our SoftDevice team, which stated that the mesh stack could always get at least 4000 us, even in a connection. That seems to have changed, though.

  • So I tested with 3000 us, and it seems that this no longer results in the loop (I still see plenty of BLOCKED events, but always with 100 ms in between, so that's probably the next advertisement with a full timeslot).
