Mesh back and forth seems to break connection

Hi,

We have a customer with two CoAP hosts and some CoAP clients in the form of wireless sensors. Each sensor is paired to a single host. The pairing is done at the app level, where the sensor discovers the network IP of the host that is in pairing mode. All the devices have the same PAN ID and network key.

Recently we have seen a scenario where some sensors seemingly stopped communicating with the paired host. Looking at the RSSI graphs, we wondered whether this is caused by a sensor constantly swinging back and forth between the two hosts (one host acting as a router). We don't have access to the CLI of the hosts as this is a remote site. We do see the sensor RSSI reported back; this is its RSSI with the router/leader it was directly connected to at the time.

Any ideas?

Cheers,

Kaushalya

  • Hello,

    Can you please try to capture a sniffer trace using the nRF Sniffer for 802.15.4?

    What do you mean by "swinging back and forth between two hosts"? Do you mean that it (an End Device, I assume) keeps changing between two routers?

    Does it ever re-enter the network, or does it disconnect completely?

    Do the nodes move around (physically)? Or are the nodes more or less stationary?

    Best regards,

    Edvin

  • Hi, we continue to see these SEDs failing over time. I now have four SEDs in the lab which seemingly 'fell off' the network. I don't have any Wireshark captures from before this happened, but I do have captures from afterwards.

    1. To filter the Wireshark captures, I want to filter on the extended MAC address of the SEDs. But I can't find any field in the captured packets which contains the extended MAC. Is there a way to target the extended MAC of the SEDs in Wireshark?

    2. For one of these fallen-off SEDs, I can watch the RTT viewer (the console is disabled in SEDs). There I can see that the SED apparently sends data out, but I can't see these packets in Wireshark. I have filtered on the RLOC16 of the destination server, which should be receiving these packets. As I don't know a way to filter on the extended MAC of the SED, I can't target the SED directly, and I don't know its RLOC16. At the moment we have many CoAP hosts in the lab, so I don't know the path this SED has taken either. Can you give me a way to find the packets sent by this SED in Wireshark? I only know the extended MAC of the SED.

    Any help is much appreciated.

    Cheers,

    Kaushalya  

  • You can apply any field that you see in a packet as a filter. Just right click it, select "Apply as Filter" -> "Selected":

    This will paste it as a filter at the top of Wireshark. You can also use logical expressions, like || and &&:

    However, if you can't find that field in the packet, it is not part of the packet itself, and in that case, the filter will not be able to pick it up (it will be filtered out). 

    I would assume the information is present in the trace. Can you upload it? Does it contain the data all the way from the start? There must be some packets where the RLOC address is assigned to the node.

    BR,
    Edvin
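
A side note on the RLOC addresses mentioned above: in Thread, an RLOC16 carries the router ID in its top 6 bits and the child ID in its low 9 bits, so once the RLOC16 of an SED shows up in the trace, the RLOC16 of its parent router follows by masking. The helper names below are hypothetical and only illustrate that bit layout; they are not OpenThread APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Router ID: top 6 bits of the RLOC16. */
static uint8_t rloc16_router_id(uint16_t rloc16)
{
	return (uint8_t)(rloc16 >> 10);
}

/* Child ID: low 9 bits of the RLOC16 (0 for the router itself). */
static uint16_t rloc16_child_id(uint16_t rloc16)
{
	return rloc16 & 0x01ff;
}

/* RLOC16 of the parent router: same router ID, child ID cleared. */
static uint16_t rloc16_parent(uint16_t rloc16)
{
	return rloc16 & 0xfc00;
}

/* True if the address belongs to a child rather than a router. */
static bool rloc16_is_child(uint16_t rloc16)
{
	return rloc16_child_id(rloc16) != 0;
}
```

For example, an SED with RLOC16 0x4c03 would be child 3 of the router with RLOC16 0x4c00, which gives a second address to filter on when looking for its traffic.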

  • Hi Edvin,

    Apr-16-1.pcapng

    This is one of the logs. 

    1. How can I filter RLOC address assignment packets?

    2. In SED to FTD transmissions, I can't seem to decrypt the 802.15.4 packets, but FTD to FTD packets are fully decryptable.

    How can I fix this? I have only one network key, which is 0x00112233445566778899aabbccdd0001

    Cheers,

    Kaushalya

  • Also I came across this thread from an old devzone ticket.

     nRF Sniffer integration for 802.15.4 in a python scipt (Pcap file problems) 

    Here a Nordic engineer mentions that an extended address is required in a packet in order to decrypt, and that this can be 'fixed' by moving packets with the extended address to the top. I tried doing this with the attached log, but I am not 100% sure how to do it.

    Can you shed some light?

    Thanks,

    Kaushalya

  • Also, from the log I think I can see that the sensors (SEDs) which have disappeared from the network are actually still connected to it from the sensor's point of view. I can see my log message just before calling coap_send_request(). The code section follows.

    ...
    LOG_INF ("ZS %d, RSSI %d, LQI %d, LQO %d, FW %04x", the_sensor_device->zoneState, RSSI, linkQalIn, linkQualOut, FWRevNum);
    ...
    coap_send_request(COAP_METHOD_PUT, (const struct sockaddr *)&unique_local_addr, sensor_option, payload, sizeof(payload), NULL);
    ...
    

    Here I have not handled the return value from coap_send_request(), which is my bad. But I don't get any error logs from it, and I can't see the packet being transmitted in my Wireshark logs. So I have the feeling that this is related to either the CoAP stack or the OpenThread stack.

    It is possible that I don't see the actual packet in the Wireshark log because I don't have sufficient data to filter on. As I said, I don't know the RLOC of these sensors, I only know their MAC. But I can't easily filter on the MAC as I can't see it in any frames.

    How can we further debug this?

    Cheers,

    Kaushalya

  • kaushalyasat said:
    Can you shed some light?

    I believe the takeaway from this thread is that if the sniffer didn't pick up the packets where a node joined the network for the first time (where it uses its extended address and is assigned a short address), the sniffer doesn't know how to map the short RLOC16 addresses to the extended addresses, and hence it can't decrypt the packets to/from these devices.

    kaushalyasat said:
    Here I have not handled the return value from coap_send_request (),

    So is it possible to check these return values? If you don't see the packets, it may mean that they are never sent. And if that is the case, then a clue is probably found in the return value from this function. 

    kaushalyasat said:
    How can we further debug this?

    Check the return value for coap_send_request() when the packets aren't sent correctly. 

    Then, try to reset the entire network, or at least factory reset the sensor devices, so that they do a new provisioning sequence when you turn them on. You need to enable the sniffer before you provision the devices, so that the sniffer can pick up the extended addresses being used before they are assigned an RLOC16 (short) address. You can experiment with this on a small scale in your office. Set up a small network with two devices, try starting the sniffer before the provisioning process and after it, and compare the results.

    Best regards,

    Edvin
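
For context on why the short-to-extended mapping matters: IEEE 802.15.4 frame security builds its CCM* nonce from the sender's 8-byte extended address, the 4-byte frame counter and the security level, so a decoder that only sees a short source address has to look the extended address up first. The sketch below is a minimal illustration under that assumption. `addr_map`, `addr_map_learn` and `build_nonce` are hypothetical names, not Wireshark or OpenThread APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_ADDR_LEN 8
#define NONCE_LEN    13
#define MAP_CAPACITY 16

/* One learned short-address to extended-address pairing (hypothetical). */
struct addr_entry {
	uint16_t short_addr;
	uint8_t ext_addr[EXT_ADDR_LEN];
};

struct addr_map {
	struct addr_entry entries[MAP_CAPACITY];
	size_t count;
};

/* Look up the extended address for a short address, or NULL if unknown. */
static const uint8_t *addr_map_lookup(const struct addr_map *map,
				      uint16_t short_addr)
{
	for (size_t i = 0; i < map->count; i++) {
		if (map->entries[i].short_addr == short_addr) {
			return map->entries[i].ext_addr;
		}
	}
	return NULL;
}

/* Record a pairing seen while the node attached; false if the map is full. */
static bool addr_map_learn(struct addr_map *map, uint16_t short_addr,
			   const uint8_t ext_addr[EXT_ADDR_LEN])
{
	if (addr_map_lookup(map, short_addr) != NULL) {
		return true; /* already known, keep the first pairing */
	}
	if (map->count >= MAP_CAPACITY) {
		return false;
	}
	map->entries[map->count].short_addr = short_addr;
	memcpy(map->entries[map->count].ext_addr, ext_addr, EXT_ADDR_LEN);
	map->count++;
	return true;
}

/* Nonce layout: extended address (8) | frame counter, big-endian (4) | level (1).
 * Fails when the sender's extended address was never learned.
 */
static bool build_nonce(const struct addr_map *map, uint16_t src_short,
			uint32_t frame_counter, uint8_t sec_level,
			uint8_t nonce[NONCE_LEN])
{
	const uint8_t *ext = addr_map_lookup(map, src_short);

	if (ext == NULL) {
		return false; /* no mapping, so the frame cannot be decrypted */
	}
	memcpy(nonce, ext, EXT_ADDR_LEN);
	nonce[8] = (uint8_t)(frame_counter >> 24);
	nonce[9] = (uint8_t)(frame_counter >> 16);
	nonce[10] = (uint8_t)(frame_counter >> 8);
	nonce[11] = (uint8_t)frame_counter;
	nonce[12] = sec_level;
	return true;
}
```

With an empty map, `build_nonce()` fails for every short address, which mirrors a capture that started after the devices had already joined.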

  • Hi Edvin,

    I am building another FW that logs the return values. Unfortunately we have to wait until this problem creeps up again, since we don't know how to recreate it.

    Is there a way to rearrange Wireshark packets so that we can push an MLE packet up? Then the subsequent packets could get the extended address from that.

    Also, as I mentioned earlier, we don't see any other error messages in my RTT viewer. When I look into 'coap_send_request (...)', it calls 'coap_init_request (...)' and 'coap_send_message(...)'. Both of these print error logs if something goes wrong. So can we assume that no errors were reported during packet transmission?

    How do we reset the network? Which APIs should I use?

    Cheers,

    Kaushalya

  • Hello Kaushalya,

    I don't think it is possible to rearrange the packets like this, no. If anything, I think you would have to edit the raw .pcapng file.

    kaushalyasat said:

    Also, as I mentioned earlier, we don't see any other error messages in my RTT viewer. When I look into 'coap_send_request (...)', it calls 'coap_init_request (...)' and 'coap_send_message(...)'. Both of these print error logs if something goes wrong. So can we assume that no errors were reported during packet transmission?

    That depends on the implementation. What error messages did you see? Are they printed from within the implementation of coap_send_message()? Or from the function that checked their return value?

    You also need to consider the possibility that for some reason these functions aren't called at all, due to some other error.

    Best regards,

    Edvin

  • Hi Edvin,

    Edvin said:
    I don't think it is possible to rearrange the packets like this, no.

    As I mentioned, in this thread one of the Nordic engineers mentioned something like that: nRF Sniffer integration for 802.15.4 in a python scipt (Pcap file problems)

    Also, I can see that in Wireshark you can time-shift packets around like this.

    I tried doing this, but couldn't see any effect. I am not sure my operation was correct, so I leave this for a Wireshark guru to comment.

    Edvin said:
    What error messages did you see?

    Do you mean error messages I saw in the RTT viewer? I didn't see any.

    Edvin said:
    Are they printed from within the implementation of coap_send_message()?

    Yes. If you look at the coap_send_request (...) function in coap_utils.c, you can see that coap_init_request (...) logs errors internally and that the return value of coap_send_message (...) is logged. I didn't go any deeper as I got stuck in z_impl_zsock_sendto (...) in sockets.c.

    Edvin said:
    You also need to consider the possibility that for some reason these functions aren't called at all, due to some other error.

    static void send_sensor_update (struct k_work *item) {
        .
        .
        .
        
    	LOG_INF ("ZS %d, RSSI %d, LQI %d, LQO %d, FW %04x", the_sensor_device->zoneState, RSSI, linkQalIn, linkQualOut, FWRevNum);
    
    	memcpy (&payload[1], myExtAddr.m8, 8);
    	memcpy (&payload[9], myEUI64.m8, 8);
    
    	payload[17] = ((the_sensor_device->temp)>>8) & 0xff;
    	payload[18] = (the_sensor_device->temp) & 0x00ff;
    
    	payload[19] = ((the_sensor_device->vbat)>>8) & 0xff;
    	payload[20] = (the_sensor_device->vbat) & 0x00ff;
    	payload[21] = RSSI;
    	payload[22] = linkQalIn;
    	payload[23] = linkQualOut;
    	payload[24] = the_sensor_device->zoneState;
    	payload[25] = (uint8_t)(FWRevNum >> 8);
    	payload[26] = (uint8_t)(FWRevNum & 0x00ff);
    
    	ARG_UNUSED(item);
    
    	if (net_ipv6_is_addr_unspecified (&unique_local_addr.sin6_addr)) {
    		LOG_WRN("Peer address not set. Activate 'provisioning' option on the server side");
    		return;
    	}
    
    	coap_send_request(COAP_METHOD_PUT, (const struct sockaddr *)&unique_local_addr, sensor_option, payload, sizeof(payload), NULL);

    This is the code section from the log print to coap_send_request (...) in my code. Can you think of any way that coap_send_request (...) may not have been called? If the IPv6 address is missing, I would get a LOG_WRN, which I don't get.
    Cheers,
    Kaushalya

  • kaushalyasat said:
    When I look into 'coap_send_request (...)', it calls 'coap_init_request (...)' and 'coap_send_message(...)'. Both of these print error logs if something goes wrong

    I was thinking about these. What error logs do you refer to?

    kaushalyasat said:
    Can you think of any ways that coap_send_request (..) may not have been called?

    That would be if send_sensor_update() is not called. Do you have something indicating whether or not these are called at the time when the devices become unavailable?

    I am sorry, but we are several months into this, and I am not quite sure what we are discussing anymore. You have some devices in a remote area that you do not have physical access to where you see some devices drop out from time to time, right?

    Perhaps you can try to replicate this in a local area where you have access to your devices? Reset the entire network, start sniffing before you start your devices so that the sniffer can capture everything from the beginning. Then it should be able to pick up and resolve all the short addresses.

    When you detect the issue, look into the log from that particular device. Does it say anything when trying to call coap_send_request()? Any error messages? coap_send_request() also returns a value based on how it did. It returns 0 on success, and a negative number on failure. Try printing something in the log in the cases where this returns < 0. What does it return when it fails?

    Best regards,

    Edvin
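
The return-value check suggested above could be sketched roughly as follows. This is not the actual firmware: `send_and_log`, `send_stats` and the `send_fn_t` callback are hypothetical names, and the callback stands in for the real coap_send_request() call, assuming only what is stated above (0 on success, a negative number on failure). In the Zephyr firmware the `printf` would be a log macro such as `LOG_ERR`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bookkeeping, not part of any SDK. */
struct send_stats {
	uint32_t attempts; /* times the send path was reached */
	uint32_t failures; /* times the send call returned < 0 */
	int last_error;    /* most recent negative return value */
};

/* Stand-in for the real send call: 0 on success, negative on failure. */
typedef int (*send_fn_t)(const uint8_t *payload, size_t len);

/* Call the sender, count the attempt, and log any negative return value. */
static int send_and_log(send_fn_t send, const uint8_t *payload, size_t len,
			struct send_stats *stats)
{
	int ret;

	stats->attempts++;
	ret = send(payload, len);
	if (ret < 0) {
		stats->failures++;
		stats->last_error = ret;
		printf("send failed: ret %d (%lu of %lu attempts failed)\n", ret,
		       (unsigned long)stats->failures,
		       (unsigned long)stats->attempts);
	}
	return ret;
}

/* Dummy senders used only to exercise the wrapper. */
static int send_ok(const uint8_t *payload, size_t len)
{
	(void)payload;
	(void)len;
	return 0;
}

static int send_fail(const uint8_t *payload, size_t len)
{
	(void)payload;
	(void)len;
	return -5; /* arbitrary negative value for illustration */
}
```

Counting attempts as well as failures also shows whether the send path is reached at all at the time a device drops out, which covers the case where send_sensor_update() is never called.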
