
nRF Mesh SDK Race Conditions

Hello,

During development with the Mesh SDK, we realized that a large number of functions, including those related to message publishing, cannot be used without the potential for race conditions unless they are synchronized with mesh processing. The same consideration rules out heap usage unless it is synchronized in the same way.

We have opted to perform such synchronization by fully disabling mesh processing before calling most mesh-related functions.

The SDK documentation discourages calling the API outside the priority level used by mesh processing; however, we would like to know whether the approach above has any disadvantages.

I would also recommend highlighting this issue more prominently in the documentation, and noting that it may also affect the standard library's dynamic allocation functions.

Thanks in advance,
Alberto

  • Hi,

    Can you elaborate a bit on what potential race conditions you are thinking of?

    Also, what you mean by "fully disabling mesh processing" and before what set of functions?

    I want to better understand the problem before discussing with our Mesh team what can be done about it.

    Regards,
    Terje

  • I will gladly elaborate,

    By “fully disabling mesh processing” I mean disabling the mesh processing interrupt (the QDEC interrupt or SWI0, depending on the configuration) when running the stack in interrupt mode. I have also been using FreeRTOS, running nrf_mesh_process() in a task; in that case I disable mesh processing using a mutex.
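
    A minimal host-side model of the task-based variant, under stated assumptions: POSIX threads stand in for FreeRTOS tasks, and a pthread mutex stands in for the FreeRTOS mutex (on target, presumably xSemaphoreTake(mutex, portMAX_DELAY) and xSemaphoreGive(mutex)). The functions mesh_process_stub() and app_publish_stub() are hypothetical stand-ins for nrf_mesh_process() and a publish call. It only illustrates the locking pattern, not the SDK's internals.

    ```c
    #include <assert.h>
    #include <pthread.h>

    /* Host stand-in for the FreeRTOS mutex guarding all mesh activity. */
    static pthread_mutex_t mesh_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Hypothetical stand-in for the static state the mesh stack keeps. */
    static unsigned long mesh_state_ops = 0;

    /* Hypothetical stand-in for nrf_mesh_process(). */
    static void mesh_process_stub(void) { mesh_state_ops++; }

    /* Hypothetical stand-in for a publish call touching the same state. */
    static void app_publish_stub(void) { mesh_state_ops++; }

    /* Mesh task body: each processing step runs with the lock held. */
    static void *mesh_task(void *arg)
    {
        unsigned long iterations = *(unsigned long *)arg;
        for (unsigned long i = 0; i < iterations; i++) {
            pthread_mutex_lock(&mesh_lock);
            mesh_process_stub();
            pthread_mutex_unlock(&mesh_lock);
        }
        return NULL;
    }

    /* Application side: take the same lock so processing cannot interleave. */
    static void app_publish_locked(void)
    {
        pthread_mutex_lock(&mesh_lock);
        app_publish_stub();
        pthread_mutex_unlock(&mesh_lock);
    }

    /* Runs both sides concurrently; returns the total number of state updates. */
    static unsigned long run_model(unsigned long mesh_iters, unsigned long publishes)
    {
        pthread_t tid;
        mesh_state_ops = 0;
        int rc = pthread_create(&tid, NULL, mesh_task, &mesh_iters);
        assert(rc == 0);
        (void)rc;
        for (unsigned long i = 0; i < publishes; i++) {
            app_publish_locked();
        }
        pthread_join(tid, NULL); /* mesh_iters stays valid until here */
        return mesh_state_ops;
    }
    ```

    Because every update happens under the same lock, the total is deterministic: run_model(n, m) always returns n + m.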

    I disable mesh processing at mesh start-up (as in the examples) and when calling message-publishing functions, which are the only Mesh SDK functions invoked from the main application. The idea is to do so before calling any SDK function.
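
    For the interrupt-driven mode, a sketch of a nesting-safe wrapper follows; all names are hypothetical. On target, mesh_irq_disable() and mesh_irq_enable() would presumably map to the CMSIS NVIC_DisableIRQ()/NVIC_EnableIRQ() on whichever IRQ the mesh stack is configured to use. Here they are host stubs that only record the state, so the logic can be exercised. The depth counter makes nested enter/exit pairs re-enable the IRQ only at the outermost exit. It assumes the wrapper is only used from a single priority level, such as the main loop; using it from several ISR priorities would need the counter update itself to be atomic.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Host stubs: on target these would mask/unmask the mesh processing IRQ. */
    static bool mesh_irq_enabled = true;
    static void mesh_irq_disable(void) { mesh_irq_enabled = false; }
    static void mesh_irq_enable(void)  { mesh_irq_enabled = true; }

    /* Nesting depth; only the outermost enter/exit touches the IRQ. */
    static uint32_t mesh_lock_depth = 0;

    void mesh_critical_enter(void)
    {
        if (mesh_lock_depth++ == 0) {
            mesh_irq_disable();
        }
    }

    void mesh_critical_exit(void)
    {
        assert(mesh_lock_depth > 0); /* unbalanced exit is a bug */
        if (--mesh_lock_depth == 0) {
            mesh_irq_enable();
        }
    }

    /* Hypothetical helper: run one mesh API call (for example a wrapper
     * around a publish function) with mesh processing masked. */
    uint32_t mesh_call_locked(uint32_t (*fn)(void *ctx), void *ctx)
    {
        mesh_critical_enter();
        uint32_t status = fn(ctx);
        mesh_critical_exit();
        return status;
    }
    ```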

    The potential race conditions are quite broad, since most of the mesh state is stored in static variables. From a brief analysis, I believe it would be possible, for example, to corrupt the packet buffer by sending a message without synchronizing access to it. A more notable case is dynamic allocation through stdlib.h, which normally requires locking for multi-threaded use (and, thus, for use within ISRs).
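
    To make the packet-buffer concern concrete, here is a simplified model rather than the SDK's actual buffer: a hypothetical static bump allocator whose only state is a static offset. The unsafe version reads the offset, checks it and writes it back as separate steps, so two contexts can be handed overlapping space. The locked version performs the same check-then-update as one step. POSIX threads stand in for the two execution contexts.

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stddef.h>

    #define POOL_SIZE 4096u /* hypothetical capacity */

    static unsigned char pool[POOL_SIZE];
    static size_t pool_used = 0;
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Unsafe: check, read and update of pool_used are separate steps. */
    static unsigned char *pool_reserve_unsafe(size_t len)
    {
        if (POOL_SIZE - pool_used < len) {
            return NULL;
        }
        unsigned char *p = &pool[pool_used];
        pool_used += len;
        return p;
    }

    /* Safe: the same check-then-update, serialized by a lock. */
    static unsigned char *pool_reserve(size_t len)
    {
        pthread_mutex_lock(&pool_lock);
        unsigned char *p = pool_reserve_unsafe(len);
        pthread_mutex_unlock(&pool_lock);
        return p;
    }

    struct worker_arg { size_t len; size_t granted; };

    /* Each context keeps reserving until the pool is exhausted. */
    static void *worker(void *arg)
    {
        struct worker_arg *w = arg;
        while (pool_reserve(w->len) != NULL) {
            w->granted++;
        }
        return NULL;
    }

    /* Two contexts drain the pool concurrently; returns total slots granted. */
    static size_t drain_pool(size_t len)
    {
        pthread_t a, b;
        struct worker_arg wa = { len, 0 }, wb = { len, 0 };
        pool_used = 0;
        int rc = pthread_create(&a, NULL, worker, &wa);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, worker, &wb);
        assert(rc == 0);
        (void)rc;
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return wa.granted + wb.granted;
    }
    ```

    With the lock, draining the 4096-byte pool in 16-byte slots from two threads always grants exactly 256 slots in total; calling pool_reserve_unsafe() from both threads instead gives no such guarantee. For the heap, if the toolchain's C library is newlib (an assumption about the build), its malloc calls the __malloc_lock()/__malloc_unlock() hooks, which is the usual place to add the equivalent locking.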

    A general warning about these issues is given in the “Interrupt priority levels” section of the Mesh SDK documentation.

    My question is whether the approach described is adequate, since having to make most mesh-related calls from a specific priority level, especially in interrupt-driven mode, makes most non-trivial applications considerably more complicated.

    Thanks again,
    Alberto

  • Hi,

    Thank you for the explanations. I have asked the Mesh team for input, and I will get back to you when they have responded. Please note, however, that the nRF5 SDK for Mesh was not written with thread safety in mind, and we cannot give any definite answers without actually making (and testing) a thread-safe release. Unfortunately, we currently have no such plans.

    Regards,
    Terje

  • Hello,

    Thank you for taking the time to look into this. I should point out that, as would be expected, these issues arise in exactly the same way when using the SDK in its intended interrupt-driven mode, if one wishes to call most SDK functions outside the priority of that interrupt.

    Best regards,
    Alberto

  • Hi,

    Yes, I agree that the issues you are facing are the same ones as if one wants to call the functions outside of the required priority level.

    As expected, the Mesh team didn't have any specific suggestions or definite recommendations, as the nRF5 SDK for Mesh was not intended to be used with FreeRTOS.

    As long as calls to the mesh API and the processing of the mesh stack happen as if they were running in the same interrupt context, you should, at least in theory, be fine. It would be interesting to know how well your suggested approach works in practice.
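
    One alternative way to meet that "same context" condition, sketched here rather than taken from the SDK, is to not call the mesh API from the application at all: the application posts requests to a queue, and the context that runs nrf_mesh_process() (for example the mesh task in the FreeRTOS setup) drains it just before each processing step, so every API call executes where mesh processing runs. All names are hypothetical and the pthread mutex is a host stand-in. In the interrupt-driven mode, the drain would have to run at the mesh IRQ priority, and posting from an ISR would need a critical section or a lock-free queue instead of a mutex.

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define CALLQ_LEN 8u /* hypothetical queue depth */

    typedef void (*mesh_call_fn)(void *ctx);
    struct mesh_call { mesh_call_fn fn; void *ctx; };

    static struct mesh_call callq[CALLQ_LEN];
    static size_t callq_head = 0, callq_count = 0;
    static pthread_mutex_t callq_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Application side: enqueue instead of calling the mesh API directly.
     * Returns false if the queue is full. */
    bool mesh_call_post(mesh_call_fn fn, void *ctx)
    {
        bool ok = false;
        assert(fn != NULL);
        pthread_mutex_lock(&callq_lock);
        if (callq_count < CALLQ_LEN) {
            callq[(callq_head + callq_count) % CALLQ_LEN] = (struct mesh_call){ fn, ctx };
            callq_count++;
            ok = true;
        }
        pthread_mutex_unlock(&callq_lock);
        return ok;
    }

    /* Mesh side: call from the mesh processing context, before each
     * processing step, so queued calls never overlap with processing. */
    void mesh_call_drain(void)
    {
        for (;;) {
            struct mesh_call c;
            pthread_mutex_lock(&callq_lock);
            if (callq_count == 0) {
                pthread_mutex_unlock(&callq_lock);
                return;
            }
            c = callq[callq_head];
            callq_head = (callq_head + 1) % CALLQ_LEN;
            callq_count--;
            pthread_mutex_unlock(&callq_lock);
            c.fn(c.ctx); /* runs in mesh context, outside the queue lock */
        }
    }

    /* Hypothetical callback standing in for a publish request. */
    static void demo_publish(void *ctx) { int *n = ctx; (*n)++; }
    ```

    The trade-off is that publishing becomes asynchronous: the application learns only whether the request was queued, not the status returned by the mesh API.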

    Regards,
    Terje
