Mesh back and forth seems to break connection

Hi,

We have one customer with two CoAP hosts and some CoAP clients in the form of wireless sensors. Each sensor is paired to a single host. The pairing is done at the application level: the sensor discovers the network IP address of the host during pairing. All the devices have the same PAN ID and network key.

Recently we have seen a scenario where some sensors seemingly stopped communicating with their paired host. Looking at the RSSI graphs, we suspect this is caused by a sensor constantly swinging back and forth between the two hosts (one host acting as a router). We don't have access to the CLI of the hosts, as this is a remote site. We do see the sensor RSSI reported back; this is its RSSI toward the router/leader it is directly attached to at the time.
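To illustrate what "swinging" could look like in the reported RSSI values, here is a minimal sketch that flags abrupt jumps between consecutive RSSI samples. It assumes that a parent switch shows up as a sudden change in RSSI, which is only a heuristic. The function name `find_rssi_jumps`, the 15 dB threshold, and the sample values are all made up for illustration and are not taken from the real installation.

```python
from typing import List, Sequence


def find_rssi_jumps(rssi_dbm: Sequence[float], jump_db: float = 15.0) -> List[int]:
    """Return the indices of samples whose RSSI differs from the
    previous sample by at least jump_db (a hypothetical threshold)."""
    jumps = []
    for i in range(1, len(rssi_dbm)):
        if abs(rssi_dbm[i] - rssi_dbm[i - 1]) >= jump_db:
            jumps.append(i)
    return jumps


if __name__ == "__main__":
    # Made-up samples: the sensor alternates between a near and a far parent.
    samples = [-60, -61, -85, -59, -84, -60, -61]
    print(find_rssi_jumps(samples))  # [2, 3, 4, 5]
```

Many jumps in a short window would be consistent with the sensor alternating between two parents, but they could equally come from a changing radio environment, so this alone does not confirm the cause.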

Any ideas?

Cheers,

Kaushalya

  • Hi Edvin,

    Thanks for your help. Unfortunately, as this is a remote site, we can't easily do a network sniff. The issue is not easy to reproduce either.

    By 'swinging back and forth' I meant changing between the router and the leader.

    The only thing we see is that the data sent by the sensors (end devices) is not received by the hosts. We don't know whether the network connection is intact or not. We are planning to upgrade the hosts so that we can retrieve the child and router tables remotely.

    The nodes are most likely stationary. 

    One thing I forgot to mention is the SDK we use is 2.3.0. 

    Cheers,

    Kaushalya

  • kaushalyasat said:

    By 'swinging back and forth' I meant changing between the router and the leader.

    Does that mean that your devices change role between router and leader? If that is the case, it sounds like the radio connection is poor. Are the nodes far from one another? Could it be that the device changing between router and leader can only hear the messages from the rest of the network sometimes? Can you/they try to move the nodes closer, to see if it helps? Alternatively, can you add more routers between the leader and the node that is popping in and out?

    A sniffer located near the device that is popping in and out would be able to say something about the received signal strength (RSS).

    Best regards,
    Edvin

  • Hi Edvin,

    Sorry for my late reply.

    The hosts can change role from leader to router and back. We have not done anything to prevent this, as it is standard Thread behavior to my knowledge. But as far as I know, this role change only happens when a leader goes offline. I don't think that is the case in this installation. The clients are just SEDs.

    Assuming that the two hosts change role between leader and router from time to time:

    1. What could initiate such a behavior?

    2. Would such a behavior prevent them from receiving the messages sent by the SEDs? One more thing: this is a very rare case. We haven't seen it at any of our test sites.

    After the last time, it has not happened again. But I guess that is not a guarantee we have solved the issue. 

    Since this is a remote client site, we have no access to the environment to run a sniffer, unfortunately.

    We are planning to do a F/W upgrade to see the client table and router table from our side. What parameters would you suggest we monitor to get to the bottom of this? (We have an AWS-based dashboard, so we can monitor remote parameters, but as this is a limited resource, I want to monitor the bare minimum.)

    Cheers,

    Kaushalya 

  • kaushalyasat said:
    1. What could initiate such a behavior?

    As you say, a device may become leader if there is no leader present in the network, and this is intentional. However, if you see this behavior while the leader should still be present, it may suggest that your node is out of reach of the rest of the network. Could that be the case?

    2: That is correct. If the hosts are no longer able to communicate with the rest of the network, they will not receive the messages sent by the SEDs.

    kaushalyasat said:
    We are planning to do a F/W upgrade to see the client table and router table from our side

    As you suggest, a router or client table would give some insight into which nodes are included in the network when you see this behavior. If the node that is switching between leader and router is the only node in the network when it is the leader, it suggests that it can no longer communicate with any of the other nodes. If all nodes are still present, apart from the original leader, it would suggest that the original leader is no longer able to communicate with the network, and that your node has been promoted to leader.
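    The two cases above could be checked on the dashboard side with something like the sketch below. It is only an illustration under assumptions: the `Snapshot` structure, the node IDs, and the `classify` function are hypothetical and not part of any SDK, and `visible_nodes` is assumed to be the union of the router and child tables reported by the host.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Snapshot:
    """One report from a host (hypothetical telemetry format)."""
    role: str                      # e.g. "leader", "router", "child", "detached"
    visible_nodes: FrozenSet[str]  # IDs from the host's router and child tables


def classify(snapshot: Snapshot,
             expected_nodes: FrozenSet[str],
             original_leader: str) -> str:
    """Classify a snapshot taken from the node that switches roles.

    expected_nodes holds every OTHER node that should be in the network,
    including the original leader but not the reporting node itself.
    """
    if snapshot.role != "leader":
        return "not leader"
    if not snapshot.visible_nodes:
        # Leader of a network with nobody else in it: the node is cut off.
        return "isolated"
    missing = expected_nodes - snapshot.visible_nodes
    if missing == {original_leader}:
        # Everyone is present except the old leader: the old leader dropped out.
        return "original leader lost"
    return "inconclusive"


if __name__ == "__main__":
    expected = frozenset({"host-a", "sensor-1", "sensor-2"})  # made-up IDs
    print(classify(Snapshot("leader", frozenset()), expected, "host-a"))
    print(classify(Snapshot("leader", frozenset({"sensor-1", "sensor-2"})),
                   expected, "host-a"))
```

    With this in mind, the role plus the router and child table contents per report is probably close to the minimum needed to tell the two cases apart.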

    Christina Holman said:
    Are nRF Sniffer for 802154 ok?

    Yes. The nRF Sniffer for 802.15.4 is a sniffer that you can use to keep track of the messages in your network.

    Best regards,

    Edvin
