Thread Dynamic Multiprotocol - Assertion at *RadioReceive*

While testing and debugging our current state of development, our devices sometimes go completely offline (on the Thread side this is visible via the MQTT-SN last will; on the BLE side there is no advertising).

I found out that the assert after RadioReceive in OpenThread's mac.cpp (Mac::BeginTransmit) loops forever (no watchdog is active yet, and there is no special assertion handling).

Digging a bit deeper still, the call to critical_section_enter in nrf_802154_core_receive seems to be the source of this behaviour.

As far as I can tell, this means that Thread isn't allowed to switch into receive mode at that moment (due to BLE's priority), and this causes the assertion.

So, just for testing, I changed the advertising interval from 100 ms to 20 ms and the Thread poll period from 1000 ms to 150 ms to provoke this, and it happens far more often (about every 10 to 15 minutes; before the changes it happened about once a day).

So (finally) my question:

What's the proper way to solve this?

Change the assertion behaviour to a Thread soft reset or a system reset? (basically implementing the otPlatAssertFail function)

Make deep changes in OpenThread?

Any other ways / suggestions?
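For illustration, the first option could look roughly like the sketch below. It assumes OpenThread is built so that failed asserts call the platform hook otPlatAssertFail (declared in openthread/platform/misc.h). The reset function here is a hypothetical stub so the example runs on a host; on the nRF52840 it would typically be replaced by NVIC_SystemReset().

```c
#include <stdio.h>

/* Hypothetical stand-in for the real reset call (e.g. NVIC_SystemReset()
 * on the target). It only records that a reset was requested. */
static int s_reset_requested = 0;

static void platform_reset_stub(void)
{
    s_reset_requested = 1;
}

/* Platform assert hook. Instead of looping forever, log where the
 * assertion fired and request a reset. */
void otPlatAssertFail(const char *aFilename, int aLineNumber)
{
    printf("OpenThread assert failed at %s:%d\n", aFilename, aLineNumber);
    platform_reset_stub();
}
```

This only turns a permanent hang into a reboot; it does not address why the radio could not enter receive mode.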

OpenThread was compiled from the master branch today (commit ID: c4f44ae0cae10fb09990435a7e74024b9717dd4b). The program is based on the dynamic multiprotocol proximity example, but with the BLE service(s) swapped out and the MQTT-SN implementation added.

Thanks for your help

Marco

  • Something that just came to my mind: I'm using the internal 32.768 kHz RC oscillator instead of an external crystal. Maybe this increases the probability of this behaviour due to the higher tolerances.

  • Hey Marco, sorry for the late reply.

    Using the internal RC oscillator can cause the BLE and/or Thread stack to miss their TX/RX windows because the schedulers become inaccurate. The SoftDevice can mitigate this issue by accounting for the inaccuracy, but you need to set the clock accuracy to 500 ppm when configuring the SoftDevice Handler (NRF_SDH_CLOCK_LF_ACCURACY) and set the LFRC calibration interval to ~15 seconds (NRF_SDH_CLOCK_LF_RC_CTIV).

    I suggest you try setting the LFRC accuracy to 500 ppm and/or using an external crystal. If that does not work, we will dig a bit deeper into the radio stacks.

    Cheers,

    Håkon.
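As a rough sketch, the settings mentioned in this reply live in the project's sdk_config.h. The numeric values below are assumptions to verify against the SDK documentation: in the nRF5 SDK, an accuracy value of 1 selects 500 ppm, and the calibration interval is, as far as I know, given in units of 0.25 s rather than in seconds.

```c
/* sdk_config.h fragment (values are assumptions, check your SDK version) */

/* Low-frequency clock source: 0 selects the internal RC oscillator. */
#define NRF_SDH_CLOCK_LF_SRC 0

/* Clock accuracy: 1 corresponds to NRF_CLOCK_LF_ACCURACY_500_PPM. */
#define NRF_SDH_CLOCK_LF_ACCURACY 1

/* RC calibration interval, assumed to be in 0.25 s units (16 -> 4 s).
 * Adjust to match the interval you actually want. */
#define NRF_SDH_CLOCK_LF_RC_CTIV 16
```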

  • Hello Lukasz,

    sorry, the commit mentioned above was from our internal fork (we changed the MAX_CHILD property to 32 for our tests). The 'real' OpenThread commit ID is c4f44ae0cae10fb09990435a7e74024b9717dd4b

    Sorry for the confusion and thank you for your efforts.

  • Hello Marco, congratulations, it seems you have found a bug!

    Unfortunately, our long stress tests did not cover this. We are going to work on a final fix, which will be contributed to the nRF IEEE 802.15.4 radio driver soon and will most likely be released in a minor version of the SDK.

    In the meantime, I would like to ask you to test a temporary workaround. Could you please modify this line:

    https://github.com/openthread/openthread/blob/master/examples/platforms/nrf52840/radio.c#L329

    to something like this:

    The idea of the workaround is to keep calling nrf_802154_receive() until it succeeds. The downside of this solution is that once or twice per day the MCU will hang for up to 2-5 ms.
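The retry idea described in this reply can be sketched as follows. This is not the original snippet, only a minimal host-runnable illustration: stub_receive is a hypothetical stand-in for nrf_802154_receive() that fails a few times, as if the radio were still held by BLE, and receive_until_success is a made-up helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stub for nrf_802154_receive(): reports failure for the
 * first three calls, then succeeds. */
static int s_busy_calls_left = 3;

static bool stub_receive(void)
{
    if (s_busy_calls_left > 0) {
        s_busy_calls_left--;
        return false;
    }
    return true;
}

/* Busy-wait until the receive request is accepted.
 * Returns the number of attempts that were needed. */
static uint32_t receive_until_success(bool (*try_receive)(void))
{
    uint32_t attempts = 1;

    while (!try_receive()) {
        attempts++;
    }
    return attempts;
}
```

On the target, the loop would call nrf_802154_receive() directly. Because the loop is unbounded, it blocks the caller for as long as the radio is unavailable, which matches the short hang described above.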

  • Hello Lukasz,

    thank you very much for the workaround - I will test it asap (the time hanging in there won't be a problem for now).

    And glad to hear that I found a *real* bug ;-)

    Should I keep this open until the final fix is available?

  • Hello Marco,

    Can you please try the current master branch? The fix has just been merged there.

  • Hello Lukasz,

    sorry for the very late answer, but I wasn't able to test this for a while (and the workaround did really well).

    But finally I can say: this solves it for us as well!

    Thank you very much for your efforts!

    Cheers

    Marco
