Scheduler gets filled up by NRF_EVT_RADIO_BLOCKED events during connect.

Using:

  • nRF52832
  • SDK 15.3.0
  • SD S132 6.1.1
  • nRF5 SDK for Mesh 3.2.0
  • NRF_SDH_DISPATCH_MODEL == NRF_SDH_DISPATCH_MODEL_APPSH

Hi, I'm often getting a crash when connecting. The backtrace suggested it happens in the mesh code, in timeslot.c, while handling the NRF_EVT_RADIO_BLOCKED event.

But later I found out it actually happens because the app scheduler is full when nrf_sdh.c tries to put an SD event on the scheduler: ret_code_t ret_code = app_sched_event_put(NULL, 0, appsh_events_poll);

When logging SOC events, I noticed many NRF_EVT_RADIO_BLOCKED events coming in when a connection is made. And when inspecting what happens, I saw that timeslot.c simply keeps requesting a new timeslot until it succeeds.

Am I right to assume this timeslot request loop can lead to the scheduler getting filled up (especially when other processes also add to the scheduler)? And if so, is there a solution for this?
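To make the suspected failure mode concrete, here is a minimal, self-contained model of a fixed-size scheduler queue. It is only a sketch: QUEUE_SIZE, sched_put(), sched_execute(), poll_events() and interrupts_until_full() are hypothetical stand-ins, not SDK code. It assumes that every SoftDevice interrupt puts one poll entry on the queue and that the put fails once the queue is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8 /* hypothetical capacity of the scheduler queue */

typedef void (*handler_t)(void);

typedef struct {
    handler_t slots[QUEUE_SIZE];
    size_t    head;
    size_t    count;
} sched_t;

/* Models putting an event on the scheduler; fails when the queue is full. */
bool sched_put(sched_t *q, handler_t h)
{
    if (q->count == QUEUE_SIZE) {
        return false;
    }
    q->slots[(q->head + q->count) % QUEUE_SIZE] = h;
    q->count++;
    return true;
}

/* Models the main loop draining the queue in FIFO order. */
void sched_execute(sched_t *q)
{
    while (q->count > 0) {
        handler_t h = q->slots[q->head];
        q->head = (q->head + 1) % QUEUE_SIZE;
        q->count--;
        assert(h != NULL);
        h();
    }
}

/* Stand-in for the poll function that fetches all pending SD events. */
void poll_events(void)
{
}

/* Fires n interrupts without draining in between; returns how many
 * puts were accepted before the queue overflowed. */
size_t interrupts_until_full(sched_t *q, size_t n)
{
    size_t accepted = 0;
    for (size_t i = 0; i < n; i++) {
        if (!sched_put(q, poll_events)) {
            break;
        }
        accepted++;
    }
    return accepted;
}
```

In this model the queue only overflows if the main loop does not get to drain it between interrupts, so a burst of events on its own is not enough: something must also keep the main thread busy.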

  • Hi,

    Am I right to assume this timeslot request loop can lead to the scheduler getting filled up (especially when other processes also add to the scheduler)?

    Yes, I think that can be the case.

    When logging SOC events, I noticed many NRF_EVT_RADIO_BLOCKED events coming in when a connection is made.

    Not sure what you mean by connect? Assuming your application is only running BLE mesh, there is no concept of connections in a mesh network.

    Could you provide more information regarding your use-case and what you are trying to do?

    Could it be related to interrupt priority levels?

    Also, is there a reason you are using an old SDK version? If possible, I would suggest you move to our latest SDK (nRF5 SDK for Mesh v5.0.0), as many improvements have been made from v3.2.0 to v5.0.0.

  • Not sure what you mean by connect? Assuming your application is only running BLE mesh, there is no concept of connections in a mesh network.

    Could you provide more information regarding your use-case and what you are trying to do?

    A normal BLE connection, with a phone for example, either as peripheral or as central. The mesh is used to communicate between nodes.

    Could it be related to interrupt priority levels?

    I followed that guide and run all code at thread level: I use NRF_SDH_DISPATCH_MODEL_APPSH and init the mesh with NRF_MESH_IRQ_PRIORITY_THREAD.

    Also, is there a reason you are using an old SDK version? If possible, I would suggest you move to our latest SDK (nRF5 SDK for Mesh v5.0.0), as many improvements have been made from v3.2.0 to v5.0.0.

    I checked timeslot.c in the 5.0.0 mesh SDK, and it handles the NRF_EVT_RADIO_BLOCKED event in exactly the same way. I haven't had a reason to update the SDK version yet, and a new SDK usually requires more resources.

    I'm not so sure anymore that NRF_EVT_RADIO_BLOCKED is the source of the problem: I noticed that app_sched_execute() does finish in between the events.

    When inspecting the app scheduler queue on crash, I noticed it's full of calls to appsh_events_poll(), a function that gets and dispatches all pending events from the SD. Wouldn't it be a good idea to check whether this call is already queued in the scheduler before putting it in there?
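The "check whether it is already queued" idea could look roughly like the sketch below. This is not how nrf_sdh.c works; it is a self-contained model with a mock queue, and all names (mock_sched_put(), mock_sched_execute(), on_sd_interrupt(), poll_handler(), m_poll_pending) are hypothetical. It assumes the interrupt can preempt the main thread but not the other way around, so the check-and-set in the interrupt does not need extra locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 4 /* hypothetical, deliberately small */

typedef void (*handler_t)(void);

static handler_t     m_queue[QUEUE_SIZE];
static size_t        m_head;
static size_t        m_count;
static volatile bool m_poll_pending; /* true while a poll is already queued */
static unsigned      m_polls_run;

bool mock_sched_put(handler_t h)
{
    if (m_count == QUEUE_SIZE) {
        return false;
    }
    m_queue[(m_head + m_count) % QUEUE_SIZE] = h;
    m_count++;
    return true;
}

/* Runs in the main thread. The flag is cleared BEFORE draining, so an
 * interrupt that arrives while draining queues a fresh poll and no
 * event is left behind. */
void poll_handler(void)
{
    m_poll_pending = false;
    m_polls_run++; /* a real handler would fetch all pending events here */
}

/* Runs in interrupt context: queue a poll only if none is pending. */
bool on_sd_interrupt(void)
{
    if (m_poll_pending) {
        return true; /* already queued, one poll will pick everything up */
    }
    m_poll_pending = true;
    if (!mock_sched_put(poll_handler)) {
        m_poll_pending = false;
        return false;
    }
    return true;
}

void mock_sched_execute(void)
{
    while (m_count > 0) {
        handler_t h = m_queue[m_head];
        m_head = (m_head + 1) % QUEUE_SIZE;
        m_count--;
        assert(h != NULL);
        h();
    }
}
```

With this guard, any number of interrupts between two drains occupies a single queue slot, because one poll already handles every pending event.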

  • I did some more research, and it's best reproducible with a connection as central.

    Before making a connection, I stop the mesh with nrf_mesh_disable(). If I don't stop the mesh, connecting takes very long.

    Most times, the mesh actually stops, and I get the events NRF_EVT_RADIO_SESSION_IDLE and NRF_EVT_RADIO_SESSION_CLOSED.

    Sometimes, however, the mesh doesn't seem to stop, and that's when I get multiple NRF_EVT_RADIO_BLOCKED events. What's worse, the calls to nrf_mesh_on_sd_evt() with the event NRF_EVT_RADIO_BLOCKED suddenly take 15 ms.

    Does the SoftDevice take up all this CPU time? Or can the call to sd_radio_request() take so much time because something in the SD is blocking?

    Is there a better way to stop the mesh? It shouldn't be making new timeslot requests after I have called nrf_mesh_disable().

  • After some more testing, I found that sd_radio_request() sometimes seems to be a blocking call, which blocks for p_request->params.earliest.timeout_us microseconds when the SD is connecting as central. Or maybe it's always blocking and normally just returns quickly. After that time, the NRF_EVT_RADIO_BLOCKED event is triggered, and since the main thread is still in the nrf_sdh_soc_evts_poll() loop, it immediately processes this event, leading to another sd_radio_request().

    I do not know how to properly fix this; the CPU shouldn't be blocked for that long.
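The loop described in this reply can be modelled in a few lines. The sketch below is not SDK code and makes no claim about what the SoftDevice really does internally: model_t, model_radio_request() and model_soc_evts_poll() are hypothetical, and the blocking behaviour is simply assumed as reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t timeout_us;        /* how long one modelled request blocks */
    uint32_t blocked_remaining; /* requests that will still end in BLOCKED */
    bool     event_pending;     /* a BLOCKED event is waiting to be fetched */
    uint32_t requests_made;
    uint64_t busy_us;           /* time spent without leaving the poll loop */
} model_t;

/* Assumed behaviour: the request blocks for timeout_us and then raises
 * a BLOCKED event, until the radio finally becomes available. */
void model_radio_request(model_t *m)
{
    assert(m != NULL);
    m->requests_made++;
    if (m->blocked_remaining > 0) {
        m->blocked_remaining--;
        m->busy_us += m->timeout_us;
        m->event_pending = true;
    }
}

/* Models the poll loop: keep fetching events until none are pending.
 * The handler re-requests immediately, so a new event can appear
 * before the loop gets a chance to exit. */
void model_soc_evts_poll(model_t *m)
{
    assert(m != NULL);
    while (m->event_pending) {
        m->event_pending = false; /* fetch the BLOCKED event */
        model_radio_request(m);   /* handler asks for a new timeslot */
    }
}
```

In this model, a single call to the poll loop with five blocked requests of 15000 us each (the 15 ms figure is taken from the measurement earlier in the thread, used here only as an illustrative value) keeps the main thread busy for 75000 us before it returns, during which nothing else in the scheduler queue can run.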

  • Hi,

    This has been sent to our Mesh team to have a look at. I will update you once I have a response from them.
