Mesh back and forth seems to break connection

Hi,

One of our customers has two CoAP hosts and several CoAP clients in the form of wireless sensors. Each sensor is paired to a single host. The pairing is done at the application level: the sensor discovers the network IP address of the host during pairing. All devices share the same PAN ID and network key.

Recently we saw a scenario where some sensors seemingly stopped communicating with their paired host. Looking at the RSSI graphs, we suspect this is caused by a sensor constantly swinging back and forth between the two hosts (one host acting as a router). We don't have access to the CLI of the hosts, as this is a remote site. We do see the sensor RSSI reported back; this is the RSSI towards whichever router/leader the sensor is attached to at the time.

Any ideas?

Cheers,

Kaushalya

  • Hello,

    Can you please try to capture a sniffer trace using the nRF Sniffer for 802.15.4?

    What do you mean by "swinging back and forth between two hosts"? Do you mean that it (an End Device, I assume) keeps changing between two routers?

    Does it ever re-enter the network, or does it disconnect completely?

    Do the nodes move around (physically)? Or are the nodes more or less stationary?

    Best regards,

    Edvin

  • Hi Edvin,

    Thanks for your help. Unfortunately, as this is a remote site, we can't easily capture a sniffer trace. The issue is not easy to reproduce either.

    By 'swinging back and forth' I meant changing between the router and the leader.

    The only thing we see is that the data sent by the sensors (end devices) is not received by the hosts. We don't know whether the network connection is intact or not. We are planning to upgrade the hosts so that we can retrieve the child and router tables remotely.

    The nodes are most likely stationary. 

    One thing I forgot to mention is that we use SDK 2.3.0.

    Cheers,

    Kaushalya

  • kaushalyasat said:

    By 'swinging back and forth' I meant changing between the router and the leader.

    Does that mean that your devices change roles between router and leader? If that is the case, it sounds like the radio connection is poor. Are the nodes far from one another? Could it be that the device changing between router and leader can only hear the messages from the rest of the network some of the time? Can you/they try to move the nodes closer together, to see if it helps? Alternatively, could you add more routers between the leader and the node that is popping in and out?

    A sniffer located near the node that is popping in and out would be able to tell us something about the received signal strength (RSS).

    Best regards,
    Edvin

  • Hi Edvin,

    Sorry for my late reply.

    The hosts can change role from leader to router and back. We have not done anything to prevent this, as to my knowledge it is standard Thread behavior. As far as I know, though, this role change only happens when a leader goes offline, and I don't think that is the case in this installation. The clients are just SEDs.

    Assuming that the two hosts swap the leader and router roles from time to time:

    1. What could initiate such behavior?

    2. Would such behavior prevent them from receiving the messages sent by the SEDs? Note that this is a very rare case. We haven't seen it at any of our test sites.

    It has not happened again since the last occurrence, but I guess that is no guarantee that the issue is solved.

    Since this is a remote client site, we have no access to the environment to run a sniffer, unfortunately.

    We are planning a firmware upgrade so that we can see the child table and router table from our side. Which parameters would you suggest we monitor to get to the bottom of this? (We have an AWS-based dashboard, so we can monitor remote parameters, but as this is a limited resource, I want to monitor only the bare minimum.)

    Cheers,

    Kaushalya 
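
For the question about which parameters to monitor, one possible bare-minimum set is the host's device role, the Thread partition ID, and the leader's router ID, reported as change counters rather than raw logs. The following is only a sketch under assumptions: the type and function names (`net_sample_t`, `net_monitor_t`, `net_monitor_update`) are hypothetical, and the snapshot fields are assumed to be filled on the host from OpenThread calls such as `otThreadGetDeviceRole()`, `otThreadGetPartitionId()` and `otThreadGetLeaderRouterId()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* One periodic snapshot of a host's Thread state (hypothetical type).
 * Assumption: on the firmware these fields are read from OpenThread,
 * e.g. otThreadGetDeviceRole(), otThreadGetPartitionId(),
 * otThreadGetLeaderRouterId(). */
typedef struct {
    uint8_t  role;             /* device role as a small integer */
    uint32_t partition_id;     /* changes when the network splits or merges */
    uint8_t  leader_router_id; /* changes when a different node becomes leader */
} net_sample_t;

/* Running change counters, small enough to push to a dashboard. */
typedef struct {
    bool         have_prev;
    net_sample_t prev;
    uint32_t     role_changes;
    uint32_t     partition_changes;
    uint32_t     leader_changes;
} net_monitor_t;

/* Reset all counters and forget the previous snapshot. */
void net_monitor_init(net_monitor_t *m)
{
    *m = (net_monitor_t){0};
}

/* Compare the new snapshot with the previous one and count each change. */
void net_monitor_update(net_monitor_t *m, const net_sample_t *s)
{
    if (m->have_prev) {
        if (s->role != m->prev.role) {
            m->role_changes++;
        }
        if (s->partition_id != m->prev.partition_id) {
            m->partition_changes++;
        }
        if (s->leader_router_id != m->prev.leader_router_id) {
            m->leader_changes++;
        }
    }
    m->prev = *s;
    m->have_prev = true;
}
```

Calling `net_monitor_update()` at a fixed interval on each host and uploading only the three counters would show whether the hosts really swap roles or the network partitions around the time the sensor data goes missing, without streaming full child or router tables.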
