Mesh back and forth seems to break connection

Hi,

We have one customer with two CoAP hosts and some CoAP clients in the form of wireless sensors. Each sensor is paired to a single host. The pairing is done at the application level, where the sensor discovers the network IP of the host during pairing. All the devices have the same PAN ID and network key.

Recently we have seen a scenario where some sensors seemingly stopped communicating with their paired host. Looking at the RSSI graphs, we suspect this is caused by a sensor constantly swinging back and forth between the two hosts (one host acting as a router). We don't have access to the CLI of the hosts, as this is a remote site. We do see the RSSI reported back by the sensor, which is its RSSI with the router/leader it is directly connected to at the time.

Any ideas?

Cheers,

Kaushalya

  • kaushalyasat said:
    1. What could initiate such a behavior?

    As you say, a device may become leader if there is no leader present in the network, and this is intentional. However, if you see this behavior while the leader should still be present, it may suggest that your node is out of reach of the rest of the network. Could that be the case?

    2: If they are no longer able to communicate with the rest of the network, then you will not be able to receive the messages sent by the other SEDs. That is correct.

    kaushalyasat said:
    We are planning to do a F/W upgrade to see the client table and router table from our side

    As you suggest, a router or client table would give some insight into which nodes are included in the network when you see this behavior. If the node that is switching between leader and router is the only node in the network when it is the leader, it suggests that it can no longer communicate with any of the other nodes. If all nodes are still present, apart from the original leader, it would suggest that the leader is no longer able to communicate with the network, and that your node has been promoted to leader.

    Christina Holman said:
    Is the nRF Sniffer for 802.15.4 OK?

    Yes. The nRF Sniffer for 802.15.4 is a sniffer that you can use to keep track of messages in your network.

    Best regards,

    Edvin
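
The table-based reasoning in the reply above can be written down as a small decision rule. This is only a sketch in Python: the node IDs, the `diagnose` function and the way the tables are read out are all hypothetical, and a real snapshot would come from whatever router/child table dump the upgraded firmware exposes.

```python
from enum import Enum


class Diagnosis(Enum):
    NODE_ISOLATED = "node lost contact with the rest of the network"
    LEADER_LOST = "original leader lost contact; this node was promoted"
    INCONCLUSIVE = "tables do not match either pattern"


def diagnose(expected_nodes, visible_nodes, original_leader):
    """Classify a snapshot taken while the node reports itself as leader.

    expected_nodes:  IDs of all nodes that should be in the network,
                     excluding the node taking the snapshot (assumed known).
    visible_nodes:   IDs found in that node's router and child tables.
    original_leader: ID of the node that normally acts as leader.
    """
    expected = set(expected_nodes)
    visible = set(visible_nodes) & expected

    # Nothing else is visible: the node itself is cut off.
    if not visible:
        return Diagnosis.NODE_ISOLATED

    # Everyone except the old leader is visible: the leader dropped out.
    if visible == expected - {original_leader}:
        return Diagnosis.LEADER_LOST

    return Diagnosis.INCONCLUSIVE
```

Any snapshot that falls between the two clear-cut cases is reported as inconclusive, which is where a sniffer trace would be needed.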

  • Hi,

    We have managed to reproduce this issue in a test system. Fortunately, we can now run a sniffer. We are losing customers over this issue, so it is paramount that we find the root cause.

    What we have seen so far is SEDs disconnecting from the Leader. When we check the child table, we can see that some SEDs have dropped off it. We don't yet know what caused this. It is certainly not a straightforward bug, as these sensors have been working for 1-2 months.

    On our SEDs the poll period timeout is set to 100. I think this means a poll every 10 sec. The RSSI measurements seem OK as well. Why a SED disconnects from the Leader for 240 sec is the question.

    Another interesting observation is that when this disconnection happens, we lose the data from all sensors at pretty much the same time. Also, when we power cycle the host, all the SEDs start sending data again. This is why we think this is a host-side issue.

    We are organizing a Wireshark session for this. I can provide a more detailed picture, hopefully over the weekend.

    This is a critical issue for us now, as we have lost some clients because of it. We need any help you can provide ASAP.

    Thanks,

    Kaushalya

  • Under what conditions does a Leader/Router drop a child from its child table? I thought this only happens when there is no poll from a child within the polling interval, or when a child connects via a router. But in this instance the router table is empty as well.
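
To put the numbers mentioned in this thread side by side: if the SEDs really poll every 10 sec, and if the 240 sec is the timeout after which the parent gives up on a silent child, the parent would have to miss a long run of consecutive polls before dropping a child. A minimal Python sketch of that arithmetic follows. Both values are taken from the posts above as assumptions, not read from the devices, and the function names are made up for illustration.

```python
def polls_before_timeout(poll_interval_s, child_timeout_s):
    """Number of whole poll intervals that fit inside one timeout window."""
    return int(child_timeout_s // poll_interval_s)


def is_timed_out(last_heard_s, now_s, child_timeout_s):
    """True if nothing was heard from the child for a full timeout window."""
    return (now_s - last_heard_s) >= child_timeout_s


# Assumed values from the thread: 10 sec poll interval, 240 sec timeout.
missed = polls_before_timeout(10, 240)
print(missed)  # 24
```

Under these assumptions a single lost poll would not explain a drop. Roughly 24 polls in a row would have to go unheard, which is consistent with the suspicion that the host stopped receiving altogether.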

  • Can you see in the sniffer trace when this happens? (I can't.) Also, do the logs say anything, either from the devices dropping out or from their parents?

    BR,
    Edvin
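
One way to look for the moment of the drop in a capture is to search for long silences per device. The Python sketch below assumes the trace has already been exported to a list of `(timestamp_s, source)` tuples. That export format and the `find_silences` helper are hypothetical and not part of any sniffer tool.

```python
from collections import defaultdict


def find_silences(frames, min_gap_s):
    """Return {source: [(gap_start, gap_end), ...]} for gaps >= min_gap_s.

    frames: iterable of (timestamp_s, source) tuples, in any order.
    """
    times = defaultdict(list)
    for timestamp, source in frames:
        times[source].append(timestamp)

    silences = {}
    for source, stamps in times.items():
        stamps.sort()
        # Compare each frame with the next one from the same source.
        gaps = [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a >= min_gap_s]
        if gaps:
            silences[source] = gaps
    return silences
```

Note that this only finds gaps between two frames from the same device. A device that goes quiet and never returns before the capture ends would have to be checked against the capture end time separately.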
