Zigbee network join fails

I have an issue that I don't know how to debug. I take the latest version of the Zigbee Light Bulb example code and the only change I make is to add CONFIG_ZIGBEE_CHANNEL_SELECTION_MODE_MULTI=y. Then I run this code on the nRF52840-DK and try to join a Zigbee network. This works if I use a Conbee 2 Zigbee coordinator. If I use a Conbee 3 coordinator the join fails. Looking at the output from the device I see that it chooses to leave the network. It doesn't give a reason and just says "I: Network steering was not successful (status: -1)". On the coordinator side I see that some information has been changed; the details are a little different each time, but the join always fails. The same coordinators can be used with other Zigbee devices without issues.

I have set up Wireshark and I can see the packets going back and forth on channel 25. The join always seems to fail some time after the Transport Key exchange, even though the key itself seems to be ACKed by the Light Bulb. Just a little after that the Light Bulb decides to leave the network and give up.

How should I proceed from here? I need my device to work with all (or most) standard Zigbee networks.
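
While debugging, it can help to pin the device to the channel being sniffed (channel 25 above) so that every join attempt shows up in the capture. The sketch below only does the bit arithmetic for a channel mask, where bit N stands for channel N. It assumes that the multi-channel mode reads its mask from a Kconfig option named CONFIG_ZIGBEE_CHANNEL_MASK; check that name against the Kconfig reference of your NCS version. This narrows the test setup and is not a fix for the join failure.

```python
def channel_mask(channels):
    """Return a bitmask with bit N set for every Zigbee channel N in `channels`."""
    mask = 0
    for ch in channels:
        # 2.4 GHz Zigbee uses channels 11 through 26.
        if not 11 <= ch <= 26:
            raise ValueError(f"2.4 GHz Zigbee channels are 11..26, got {ch}")
        mask |= 1 << ch
    return mask


# Only channel 25, the channel seen in the sniffer capture.
only_25 = channel_mask([25])

# Every 2.4 GHz channel, for comparison.
all_channels = channel_mask(range(11, 27))

# Assumption: CONFIG_ZIGBEE_CHANNEL_MASK is the option that takes this value.
print(f"CONFIG_ZIGBEE_CHANNEL_MASK={only_25:#x}")
print(f"all channels: {all_channels:#x}")
```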

  • Hello,

    Comparing the sniffer logs for the two coordinators can help to find out what differs between the ConBee 2 and the ConBee 3. You can also share the logs with us in .pcap format so we can examine them. If you are using a different network key than the one in the nRF Connect SDK samples or the Zigbee Alliance default key, please share your network key so we can decrypt the packets.

    Best regards,

    Maria

  • Hello Maria,

    I have attached two pcap files to show the two different conditions. I didn't change any keys anywhere so I hope these are default. Let me know if there is anything else I can try to help get to the bottom of this. I did these captures with the Light Switch example as I saw that the Light Bulb example complained about unhandled signals (Permit Join signal I think, because it is a router role probably).

    conbee_3_join_fail.pcap

    conbee_2_join_ok.pcap

  • Hello,

    Thank you for the logs. They seem to indicate that your Zigbee device is acting as a legacy device, which is odd considering that you are using the latest version of Zigbee: Light Switch. NCS v2.6.0 uses ZBOSS 3.11.3.0, which implements the stack based on the Zigbee 3.0 specification set.

    When researching the ConBee III I saw that it supports Zigbee 3.0. Some pieces do not fit together here: the sniffer log is missing a Request Key packet, even though the ConBee III specifically supports Zigbee 3.0.

    I have asked a colleague for a second opinion about the logs.

    I am taking some time off work next week, so this ticket could be transferred to another engineer. Please also note that next week we have Easter Holiday here in Norway which includes lower staffing and some days with no staffing. Thank you for your patience.

    Best regards,

    Maria

  • Hi Maria,

    We have Easter Holidays in Estonia as well. But since that is only at the end of the week I hope we can move forward at least a little bit before that.

    Is there a way to force the device to not use legacy mode? I really need to find a way to make this work, as the issue seems to affect not only the Conbee 3 specifically but many of the newer EFR32MG21 based Zigbee coordinators. I'm not familiar with which firmware these all use, but for my use case Zigbee needs to work with these quite popular devices.

    I have tried both the light bulb and the light switch samples from the latest version and also from our current production version (v2.5.0) and all have the same issue. I don't know how to debug the Zigbee code and at the moment there's really nothing that I can think of to fix this myself.

    Tiit

  • Hi Maria,

    Has there been any progress on this? Is there anything I can do on my end to investigate this issue further? At the moment this seems to affect not just the Conbee 3 but other Zigbee hubs as well. I'm buying as many different commonly used Zigbee coordinators as I can find and it looks like half of them have a similar issue. From the ones I have looked at in more detail, it seems that the failing ones all use the same Silicon Labs SoC family. But as far as I understand, Zigbee should be compatible between different manufacturers.

    Tiit

  • Hi Tiit,

    First I want to correct my previous statement. The Nordic device is not the device which is acting as a legacy device, the ConBees are. I misinterpreted the sniffer on my first go.

    I have gotten the second opinion on your sniffer logs and it is as follows:

    There are issues with the packet flow for the Trust Center link key exchange for both ConBee II and ConBee III. I will give some details for both of the versions.

    For the ConBee II, the communication flow does not follow the one for Zigbee Pro, which the Beacon from the ConBee II lists as its Stack Profile. See the screenshot for packet 8:

    The next step in the communication is for the joining device to broadcast Device Announce, which is done twice, in packets 17 and 23. The Node Descriptor Request is sent as packet 62 and the Node Descriptor Response is received as packet 70. Looking at the packet details for the response we can see that the Stack Compliance revision is 0, which indicates that it is not Zigbee Pro.

    The connection succeeds in this case because a Zigbee Pro device is not required. But the Trust Center link key exchange is stopped because of differing versions.

    For the ConBee III it is more difficult to give many details because the communication stops before the Node Descriptor Request is sent. From the sniffer log it looks like the ConBee III sends a Transport Key to the Nordic device before the device requests it, and this causes the device to leave the network.

    Based on the information I found about ConBee III it should support Zigbee Pro, and so should the joining device with a Zigbee sample from NCS v2.6.0. I will study this some more, and ask for some more assistance from my colleague to get somewhere quick with this.

    Best regards,

    Maria
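
The Stack Compliance revision mentioned above can also be read from the raw Node Descriptor rather than from Wireshark's decoded view. The following is a minimal sketch, assuming the Zigbee specification layout in which bits 9 to 15 of the 16-bit server mask field carry the revision. The function name and the sample values are made up for illustration and are not taken from the attached captures.

```python
def stack_compliance_revision(server_mask):
    """Extract the stack compliance revision from a Node Descriptor server mask.

    Assumption: the revision sits in bits 9..15 of the 16-bit server mask.
    """
    return (server_mask >> 9) & 0x7F


# Illustrative values only (not from the pcap files in this thread).
print(stack_compliance_revision(0x0000))  # 0: revision field is empty
print(stack_compliance_revision(0x2A00))  # 21
print(stack_compliance_revision(0x2C41))  # 22, lower server-role bits are ignored
```

A revision of 0 in this field is the value that was read as "not Zigbee Pro" for the ConBee II in the analysis above.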

  • Hi Maria,

    I will order a few more common versions and I will try to create similar pcap files for those as well. At the moment I can confirm that the samples work with the Conbee II and a Smartthings hub but seem to fail with Conbee III, Home Assistant SkyConnect and a Nortek HUSBZB-1. I'm going to get hubs that seem to be popular with Home Assistant users as they seem to represent a good portion of our customers and that is the only statistic available to me at the moment.

    Tiit

  • I have done another round of packet captures and added the serial log for each of those. This time I used the Lightbulb sample and also used a separate USB cable to power the SOC (SW9 in USB position and a cable connected to nRF_USB). I'm still seeing very weird behaviour with the Conbee controllers but others seemed to work much better this time. I also brought the controller sticks further away from the PC with a USB extension cable. The way the errors are so consistent with the Conbee controllers leads me to believe this is not a power or signal integrity issue. For some reason the others that failed before did start working now. 

    4300.logs.zip

    I'm not sure why, but powering the dev board like I did this time caused the LEDs on the DUT to glow only very dimly. I'm attaching a photo of that, but in reality they are even dimmer.

    Tiit

  • Hi Tiit,

    Thank you for your significant patience here.

    These are my findings from your logs:

    The devices which are Zigbee PRO compliant are Conbee 3 (r21), SONOFF P/E (r22), and SkyConnect (r22). HUSBZB-1 and Conbee 2 are not Zigbee 3.0 compliant. For HUSBZB-1 this is because the Zigbee component is EmberZNet 5.4, which is r20 compliant. Sources: HUSBZB datasheet and Mapping EmberZNet table. Conbee 2 lists Zigbee 3.0 as supported, but from the sniffer log this seems wrong. Sources: https://www.phoscon.de/en/conbee2/techspec and the lack of key exchange steps in your sniffer log for Conbee 2.

    Of the Zigbee 3.0 compliant coordinators, SONOFF P/E and SkyConnect follow the expected commissioning steps. Conbee 3 does not. Specifically, during the key exchange sequence there is no key confirmation after key transport, which results in the light bulb leaving the network.

    From these findings I advise you to use the SONOFF or SkyConnect coordinators or an NCS coordinator. I understand that you want your device to work with as many commercial coordinators as possible, but if the commercial coordinator deviates from the specification there are limits to what we can do to circumvent that.

    Regarding the powering of your DK/SoC:

    See the following part of the nRF52840 DK documentation for information on how the direct SoC power supply should be used: nRF52840 SoC direct supply.

    Best regards,

    Maria
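
The key exchange check described above can be repeated for each new capture with a small script. This is a sketch under stated assumptions: the expected order (network key transport, Request Key, link key transport, Verify Key, Confirm Key) is my reading of the Zigbee 3.0 Trust Center link key update, and the step labels, the function name and the sample sequence are hypothetical. The observed list would be typed in by hand from the APS commands seen in Wireshark.

```python
# Assumed order of APS commands during a Zigbee 3.0 Trust Center
# link key update (labels are illustrative, not Wireshark field names).
EXPECTED_STEPS = [
    "Transport Key (network key)",
    "Request Key",
    "Transport Key (link key)",
    "Verify Key",
    "Confirm Key",
]


def first_deviation(observed, expected=EXPECTED_STEPS):
    """Return (index, expected_step, observed_step) for the first mismatch.

    observed_step is None when the capture ends early.
    Returns None when every expected step appears in order.
    """
    for i, want in enumerate(expected):
        got = observed[i] if i < len(observed) else None
        if got != want:
            return i, want, got
    return None


# Hypothetical capture in which the exchange stops before key confirmation.
capture = [
    "Transport Key (network key)",
    "Request Key",
    "Transport Key (link key)",
    "Verify Key",
]
print(first_deviation(capture))
```

Running the same check on a capture from a coordinator that joins correctly should give None, which makes it easy to compare coordinators side by side.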
