Mesh back and forth seems to break connection

Hi,

We have a customer with two CoAP hosts and several CoAP clients in the form of wireless sensors. Each sensor is paired to a single host. The pairing is done at the application level: the sensor discovers the network IP of the host during pairing. All the devices have the same PAN ID and network key.

Recently we have seen a scenario where some sensors seemingly stopped communicating with their paired host. Looking at the RSSI graphs, we suspect this is caused by a sensor constantly swinging back and forth between the two hosts (one host acting as a router). We don't have access to the CLI of the hosts, as this is a remote site. We do see the RSSI that the sensor reports back, which is its RSSI to the router/leader it is directly attached to at the time.

Any ideas?

Cheers,

Kaushalya

  • Hi Edvin,

    Unfortunately this log was captured after we saw that the sensors were no longer sending data. The sensors had worked fine for about 3-4 months, so this is an extremely random and rare case. We have now connected the debug console to the host radio and left the Wireshark sniffer permanently in the same place, to see if we can capture it when it happens.

    Under what conditions does the Leader/Router drop a child from its child table?

    Thanks,

    Kaushalya

  • kaushalyasat said:

    Under what conditions does the Leader/Router drop a child from its child table?

    It should only happen if the child stops responding, or if either the child or the parent issues a leave command for the child. I guess you are not intentionally requesting the child to leave, so my guess is that the child becomes unresponsive. Perhaps the application crashes. The logs will probably say something when it happens. You can also test this by simulating the scenario, e.g. by powering off the parent and monitoring the child node's log.

    BR,
    Edvin

  • Hi Edvin,

    I am trying to implement child supervision in the hope that it will help. Unfortunately the documentation is not very clear to me. It says to enable 'OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE', which is defined in child_supervision.h in the SDK. I don't think changing SDK content is a good idea, because if the SDK is updated we will lose all of that.

    1. Is there a way to implement child supervision from application side?

    2. What is to be done from FTD side?

    3. What is to be done from SED side?

    Thanks,

    Kaushalya

  • Hi Edvin,

    We have a WDT implemented in the client, so it is unlikely that the sensor crashes. As the sensors are battery-powered devices, they have the console disabled. I think we can make a special build with the console enabled just for this.

    Thanks,

    Kaushalya

  • kaushalyasat said:
    I don't think changing SDK content is a good idea, because if the SDK is updated we will lose all of that.

    Most of these are only defined if they are not defined elsewhere:

    /**
     * @def OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE
     *
     * Define to 1 to enable Child Supervision support.
     *
     */
    #ifndef OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE
    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE 0
    #endif
    

    So if you define it somewhere in your project, outside the SDK, that should take precedence. 
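
    As a minimal sketch of how that guard behaves, the snippet below puts a project-side define ahead of a copy of the SDK default quoted above, all in one file. The helper `child_supervision_enabled()` is hypothetical and only there for illustration. Note the assumption: the override has to be visible to the preprocessor before the SDK header is processed in the same translation unit (for example through a project config header or a `-D` compiler flag), otherwise the SDK default is used.

    ```c
    #include <assert.h>

    /* Project-side override. Assumption: in a real build this line would sit in a
     * project config header or come from a -D compiler flag, so that it is seen
     * before the SDK header. */
    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE 1

    /* SDK default, as quoted above. Skipped because the macro already exists. */
    #ifndef OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE
    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE 0
    #endif

    /* Compile-time check that the project value won. */
    static_assert(OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE == 1,
                  "project-level define should take precedence");

    /* Hypothetical helper so the value can also be inspected at run time. */
    int child_supervision_enabled(void)
    {
        return OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE;
    }
    ```

    If the two defines were in the opposite order, the compiler would report a macro redefinition instead, which is one way to tell which definition is seen first.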

    kaushalyasat said:

    2. What is to be done from FTD side?

    3. What is to be done from SED side?

    Do you have logging enabled on any of them?

    kaushalyasat said:
    I think we can make a special build with the console enabled just for this.

    I think so. We need to investigate what's going on. The logs could reveal something. At this point, all we know is that the devices "disappear". Perhaps they are stuck in some weird state / state machine?

    Best regards,

    Edvin

  • Hi Edvin,

    OK, I got your point on Q1. I was not sure of the order in which the #defines are evaluated.

    I have logging available in both SED and FTD builds.

    I had to enable child supervision in Kconfig first.

    I have done this in the SED:

    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE 1
    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_CHECK_TIMEOUT 60
    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_MSG_NO_ACK_REQUEST 0

    and this in the FTD:

    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_ENABLE 1
    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_INTERVAL 30
    #define OPENTHREAD_CONFIG_CHILD_SUPERVISION_MSG_NO_ACK_REQUEST 0

    My idea is that the FTD will send a supervision message every 30 s and the SED will initiate an MLE attach if it doesn't receive one within 60 s. Am I correct?

    Is there a way to drop a child from the FTD's child table using the CLI? I want to use this to check whether the MLE reattach works from the SED side.

    Results:

    1. When I check "childsupervision checktimeout" from the SED, I still get 190, which is the default in child_supervision.h. It seems the redefinition in my app has no effect.
    2. I can see the SED in the child table of the FTD.
    3. I don't see "Sending supervision message to child" on the FTD console.
    4. I don't see any supervision messages on the SED.

    Cheers,
    Kaushalya
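
    The intended timing (a 30 s supervision interval on the FTD and a 60 s check timeout on the SED) can be sketched as a small stand-alone model. This is not OpenThread code: the function `child_timeout_fires_at()` and its one-second tick loop are hypothetical, and it assumes that every supervision message the parent sends actually reaches the child. What the child does once the timeout expires is left out of the model.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Values taken from the settings in this post. */
    #define SUPERVISION_INTERVAL_S 30u /* parent (FTD) side */
    #define CHECK_TIMEOUT_S        60u /* child (SED) side  */

    /* The child timeout must be longer than the parent interval, or it would
     * expire even while the parent is healthy. */
    static_assert(CHECK_TIMEOUT_S > SUPERVISION_INTERVAL_S,
                  "check timeout should exceed the supervision interval");

    /* Hypothetical model: returns the second at which the child's check timeout
     * expires when the parent goes silent at parent_silent_from_s, or 0 if it
     * never expires within horizon_s seconds. */
    uint32_t child_timeout_fires_at(uint32_t parent_silent_from_s, uint32_t horizon_s)
    {
        uint32_t since_last_tx = 0; /* parent: seconds since last frame to child   */
        uint32_t since_last_rx = 0; /* child: seconds since last frame from parent */

        for (uint32_t t = 1; t <= horizon_s; t++) {
            since_last_tx++;
            since_last_rx++;

            bool parent_alive = t < parent_silent_from_s;
            if (parent_alive && since_last_tx >= SUPERVISION_INTERVAL_S) {
                /* Parent sends a supervision message; child hears it. */
                since_last_tx = 0;
                since_last_rx = 0;
            }
            if (since_last_rx >= CHECK_TIMEOUT_S) {
                return t; /* child would start its recovery here */
            }
        }
        return 0;
    }
    ```

    In this model, a parent that goes silent at t = 100 s sent its last message at t = 90 s, so the child's timeout expires at t = 150 s. With a parent that keeps sending, the timeout never expires.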