Group address message crashes all nodes?!

Hi there!

Me again... I'm having issues with networks of more than two nodes. When my network has three nodes and one node sends a message to another, the node that wasn't addressed crashes. The network consists of one nRF52840 DK, one nRF52840 Feather and one nRF52832 DK. They all run FreeRTOS and are built with ARMGCC on Windows 10.

The same happens when a message is sent to all nodes (0xFFFF); both other nodes crash. I see a pattern, but find it hard to point to a cause. It looks like a node crashes when it receives a message that isn't addressed to it alone. Below is a snippet of the server's terminal output; the logging just stops and FreeRTOS freezes.

<info> app: RX: [aop: 0x00C1]

<info> app: RX: Msg
<info> app: Led is now turned off.
<info> app: TX: [aop: 0x00C4]

<info> app: TX: Msg

Below is a snippet of the terminal output from the client side.

<info> app: TX: [aop: 0x00C1] 

<info> app: TX: Msg
<info> app: Set "off" has been called.
<info> app: RX: [aop: 0x00C1]

<info> app: RX: Msg
<info> app: Led is now turned off.
<info> app: TX: [aop: 0x00C4] 

<info> app: TX: Msg
<info> app: RX: [aop: 0x00C4]

<info> app: RX: Msg
<info> app: Server acknowledged.
<info> app: TX: [aop: 0x00C1] 

<info> app: TX: Msg
<info> app: RX: [aop: 0x00C1]

<info> app: RX: Msg
<info> app: Led is now turned off.
<info> app: TX: [aop: 0x00C4]

<info> app: TX: Msg
<info> app: RX: [aop: 0x00C4]

<info> app: RX: Msg
<info> app: Server acknowledged.

When the client sends a message to a specific unicast address, the server receives and handles it as shown below.

<info> app: RX: [aop: 0x00C1]

<info> app: RX: Msg
<info> app: Led is now turned off.
<info> app: TX: [aop: 0x00C4]

<info> app: TX: Msg
<info> app: RX: [aop: 0x00C1]

<info> app: RX: Msg
<info> app: Led is now turned off.
<info> app: TX: [aop: 0x00C4]

<info> app: TX: Msg

Furthermore, this problem also occurs with messages from proxy nodes; getting the TTL value or composition data results in a crash on all other nodes. I have absolutely no idea what might cause this behaviour. Does anyone have an idea? The weird thing is that it worked for a good while until I reset all nodes and provisioned them again.

Thanks in advance and kind regards,

Jochem

  • Hi,

    Could you give some more details on what you mean by your node "crashing"? Is it a hardfault? app_error_handler? or other resets?

    Also, can you provide which SDK versions you are using?

  • Hi Mttrinh,

    I went back to my first commit and can now say with 100% certainty that it has always been this way; I just didn't notice it. Hence, I've come to the conclusion that the FreeRTOS example on GitHub doesn't work with my SDK versions.

    I'll try to revert to V4.1.0 and V16.0.0 to see if the issue is resolved. It's going to take some time though... I have to redo all changes to the SDKs and resolve the name conflicts between FreeRTOS and the SDK for Mesh.

    I'll let you know how it goes and if it turns out to be functional.

    Kind regards,

    Jochem

  • It isn't clear yet what is causing the issue, but it might be something with the FreeRTOS and SDK versions, like you said. Keep me updated :)

  • Hi Mttrinh,

    Okay, I'm glad to let you know that it's now working as it should. None of the behaviour described above occurs anymore. I had to do a rewrite from the ground up to determine the cause.

    As for the origin of the problem, I've got some suspicions. In the end, the SDKs turned out not to be the problem; everything works with the most recent SDKs (according to the modified example). I'm very happy with that.

    I've found the following things to cause said behaviour:

    1. The mesh_stack_init_params being defined in a local scope, so that it is destroyed when the function returns. This is the case when using an object-oriented approach and initializing the mesh stack in the constructor.
    2. Using the Idle Handler method instead of the Bearer Event Handler as described in the example. I should note, however, that there are a lot of tasks running in our program. It's very plausible that the Idle Handler works fine with less demanding tasks.
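
    To illustrate point 1, here is a minimal sketch of the lifetime problem. It assumes the stack keeps a pointer to the init params instead of copying them, which is only my suspicion. `InitParams`, `MockStack`, `Node` and `count_rx` are hypothetical stand-ins made up for this example; they are not types or functions from the SDK for Mesh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the init params; NOT the real SDK for Mesh type.
struct InitParams {
    void (*on_rx)(std::uint16_t dst_addr);  // called for every received message
};

// Hypothetical mock stack. Assumption: it stores the caller's pointer
// rather than copying the params.
class MockStack {
public:
    void init(const InitParams* p_params) { m_params = p_params; }

    // Simulates an incoming message for the given destination address.
    void deliver(std::uint16_t dst_addr) const {
        if (m_params != nullptr && m_params->on_rx != nullptr) {
            m_params->on_rx(dst_addr);
        }
    }

private:
    const InitParams* m_params = nullptr;
};

static int g_rx_count = 0;
static void count_rx(std::uint16_t /*dst_addr*/) { ++g_rx_count; }

class Node {
public:
    explicit Node(MockStack& stack) {
        // Unsafe variant (do not do this): declaring `InitParams params{count_rx};`
        // here and passing `&params` would leave the stack with a dangling
        // pointer as soon as this constructor returns.
        m_params.on_rx = count_rx;
        stack.init(&m_params);  // m_params lives as long as the Node itself
    }

private:
    InitParams m_params{};  // member storage outlives the constructor call
};
```

    With the params held as a member (or as a static), a later `stack.deliver(0xFFFF)` still finds a valid callback, provided the `Node` object itself stays alive for as long as the stack is in use.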

    To be honest, I think writing the application in C++ on top of the SDK and the SDK for Mesh running on FreeRTOS didn't help either. Since that isn't really supported, a mistake or incompatibility can easily slip in...

    All in all, I'm glad the problem is solved; it was quite a large one, as you can tell. Neither of the ways out (abandoning C++ or FreeRTOS) was an option for me, so it really had to work in the end.

    Anyway, thanks for your support and have a great day!

    Kind regards,

    Jochem
