Thread Dynamic Multiprotocol - Assertion at *RadioReceive*

While testing and debugging our current state of development, our devices sometimes go completely offline (on the Thread side this is visible via the MQTT-SN last will; on the BLE side there is no advertising).

I found out that the assert after RadioReceive in OpenThread's mac.cpp (Mac::BeginTransmit) loops forever (no watchdog active yet, no special assertion handling).

Digging a bit deeper still, the call to critical_section_enter in nrf_802154_core_receive seems to be the source of this behaviour.

As far as I can tell, this means that Thread isn't allowed to switch into receive mode at that time (due to BLE's priority), and this causes the assertion.

So, just for testing, I changed the advertising interval from 100 ms to 20 ms and the Thread poll period from 1000 ms to 150 ms to provoke this, and it happens far more often (about every 10 to 15 minutes; before the changes it happened about once a day or so).

So (finally) my question:

What's the proper way to solve this?

Change the assertion behaviour to a Thread soft reset or a system reset? (basically implementing the otPlatAssertFail function)

Make deep changes in OpenThread?

Any other ways / suggestions?
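
To illustrate the first option, here is a minimal sketch of what implementing otPlatAssertFail could look like, so that a failed assertion records where it happened and then requests a reset instead of spinning in an endless loop. The signature is my understanding of the OpenThread platform hook and may differ between versions; assert_record_t, g_assert_record, g_reset_requested and app_request_reset are hypothetical names made up for this example. On a real nRF52 target the reset hook would presumably call something like NVIC_SystemReset(), and the record would need to live in RAM that survives the reset. OpenThread only calls this hook if platform assert handling is enabled in its build configuration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker showing that the record below holds valid data. */
#define ASSERT_RECORD_MAGIC 0xA55E27EDu

/* Hypothetical record of the last failed assertion. On a real target this
 * would be placed in a no-init RAM section so it can be read after reset. */
typedef struct
{
    uint32_t magic;
    int      line;
    char     file[32];
} assert_record_t;

static assert_record_t g_assert_record;

/* Hypothetical reset hook. Here it only sets a flag so the sketch can run
 * on a host; on the device it would trigger the actual system reset. */
static int g_reset_requested;

static void app_request_reset(void)
{
    g_reset_requested = 1;
}

/* Assumed signature of the OpenThread platform assert hook. */
void otPlatAssertFail(const char *aFilename, int aLineNumber)
{
    const char *name = (aFilename != NULL) ? aFilename : "?";

    /* Copy the file name with guaranteed termination. */
    strncpy(g_assert_record.file, name, sizeof(g_assert_record.file) - 1);
    g_assert_record.file[sizeof(g_assert_record.file) - 1] = '\0';

    g_assert_record.line  = aLineNumber;
    g_assert_record.magic = ASSERT_RECORD_MAGIC;

    /* Recover by resetting instead of looping forever. */
    app_request_reset();
}
```

After the reset, the application could check the magic value at startup and report the stored file and line (for example over MQTT-SN) before clearing the record. This only recovers from the assertion; it does not remove the underlying radio arbitration conflict.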

OpenThread was compiled from the master branch today (commit ID: c4f44ae0cae10fb09990435a7e74024b9717dd4b). The program is based on the dynamic multiprotocol proximity example, but with the BLE service(s) swapped out and the MQTT-SN implementation added.

Thanks for your help

Marco

  • Actually there was no connection active in my test - the device was only advertising, with a 16-byte full name and the flags in the advertising data and additionally a 128-bit service UUID in the scan response. The advertising interval was 100 ms; for testing I decreased it to 20 ms, which led to worse behaviour (the error occurred sooner).

    I currently don't have any BLE trace of the application.

  • Hello Marco,

    First of all, yesterday we released the new version of nRF5 SDK for Thread and Zigbee, 2.0.0. I encourage you to try it out. It contains fixes for the RC oscillator in the multiprotocol solution.

    Can you elaborate more on the problem that you see? I'm confused whether you see an assert or have identified an infinite loop. Can you point me to the file and line of the code where it asserts?

    An advertising interval of 20 ms seems to be too aggressive; I suggest you change it back to 100 ms.

  • Hello Lukasz,

    thanks for the hint about the new SDK - I will definitely try it asap (hopefully next week or so).

    To the problem at hand (openthread commit I'm looking at is 71e873e2e43ee08a01547839698c602abd3973e6):

    The assertion happens in mac.cpp of OpenThread, in the function BeginTransmit at line 1229, after the call to RadioReceive.

    Digging deeper and deeper into RadioReceive, I ended up in nrf_802154_core.c in the function nrf_802154_core_receive, which returned right at the check of the result from critical_section_enter.

    Basically this whole handling seems fine to me, but my problem is the handling of assert - otPlatAssertFail isn't implemented as far as I can tell (so it falls back to the endless while-loop in debug.hpp) and I'm absolutely not sure how to implement it there in a way that will work fine for this event without interfering with all the other stuff.

    Hope this narrows it down a bit for you.

    As for the advertising interval - we won't use such a small one in the final application - this was just for testing and trying to reproduce this issue faster - and it does its job ;-)
    The final application will advertise at closer to 1000 ms or something like that - so the probability of this happening will get smaller, but I'd like it to be eliminated ;-)

  • Hello Marco,

    Thank you for the detailed answer. We are currently looking at this problem. Unfortunately I cannot find the OpenThread commit which you listed above.

  • Hello Lukasz,

    sorry, the commit mentioned above was from our internal fork (we changed the MAX_CHILD property to 32 for our tests). The 'real' OpenThread commit ID is c4f44ae0cae10fb09990435a7e74024b9717dd4b

    Sorry for the confusion and thank you for your efforts.
