Coordinator Assert during commission of 24 devices

nRF Connect SDK v1.6.0

Using samples/zigbee/network_coordinator on an nRF52840, I am trying to commission a larger number of devices onto the network. I am currently testing with 24 devices, but the problem also occurs with as few as 11.

I enabled the CLI and ran "bdb start" to begin commissioning:

uart:~$ I: Network formed successfully, start network steering (Extended PAN ID: f4ce36628124d7a9, PAN ID: 0x4d03)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device update received (short: 0x7dce, long: f4ce36ddabae0f04, status: 1)
I: Device update received (short: 0x7b90, long: f4ce361d7112dd0a, status: 1)
I: Device update received (short: 0x7b90, long: f4ce361d7112dd0a, status: 1)
I: Device update received (short: 0x66df, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0xb619, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0xb619, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0x07aa, long: f4ce361054cb3284, status: 1)
I: Device update received (short: 0x07aa, long: f4ce361054cb3284, status: 1)
I: Device authorization event received (short: 0xb83a, long: f4ce363045dbb779, authorization type: 1, authorization status: 1)
I: Child left the network (long: f4ce363045dbb779, rejoin flag: 0)
I: Device update received (short: 0x7729, long: f4ce36fde1f130fe, status: 1)
I: Device update received (short: 0x7729, long: f4ce36fde1f130fe, status: 1)
I: Device authorization event received (short: 0x7dce, long: f4ce36ddabae0f04, authorization type: 1, authorization status: 1)
I: Child left the network (long: f4ce36ddabae0f04, rejoin flag: 0)
I: Child left the network (long: f4ce36ddabae0f04, rejoin flag: 0)
I: Device authorization event received (short: 0x7b90, long: f4ce361d7112dd0a, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device authorization event received (short: 0xb619, long: f4ce36ec24ebe6a8, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device authorization event received (short: 0x07aa, long: f4ce361054cb3284, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device authorization event received (short: 0x7729, long: f4ce36fde1f130fe, authorization type: 1, authorization status: 1)
I: Device update received (short: 0x67b0, long: f4ce36ddabae0f04, status: 1)
I: Device update received (short: 0x67b0, long: f4ce36ddabae0f04, status: 1)
I: Device authorization event received (short: 0xb83a, long: f4ce363045dbb779, authorization type: 1, authorization status: 1)
I: Device authorization event received (short: 0x67b0, long: f4ce36ddabae0f04, authorization type: 1, authorization status: 1)

Some devices begin to join, then an assert occurs:

uart:~$ E: Fatal error occurred
ASSERTION FAIL @ WEST_TOPDIR/nrfxlib/nrf_802154/driver/src/nrf_802154_notification_swi.c:154

Call stack is this:

We are trying to prepare this product for production and this is a major issue at the moment. Please advise on the best way to allow the maximum number of devices to join the network.

  • Hi,

    I am so sorry for the late reply. I have forwarded your questions about the maximum number of devices that can be added to a network to our Zigbee team but haven't gotten an answer yet. In the old nRF5 SDK for Thread and Zigbee we conducted tests with up to 24 devices, and there is no architectural limitation for networks larger than that. The practical limiting factor will be the Trust Center memory (in the coordinator device), as the TC must hold link keys for all connecting devices.

    Do you have a sniffer trace that can help us debug what happens when the devices are joining your coordinator's network? I would like to see what is happening on air. In the log output it looks like some devices are being added multiple times.

    Best regards,

    Marjeris
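
The observation that some devices appear to be added multiple times can be checked directly against the coordinator log. In the log above, for example, f4ce36ec24ebe6a8 shows up first with short address 0x66df and then with 0xb619. The following is a minimal sketch, not part of the sample, that counts "Device update received" lines per long address and how often the short address changed. The names `join_stats`, `device_seen` and `join_stats_feed` are hypothetical, and the line format is assumed to match the log output shown in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVICES 64 /* hypothetical upper bound for this sketch */

struct device_seen {
    char long_addr[17];   /* 16 hex digits plus terminating NUL */
    unsigned last_short;  /* most recent short address seen */
    int updates;          /* number of "Device update received" lines */
    int short_changes;    /* how many times the short address changed */
};

struct join_stats {
    struct device_seen dev[MAX_DEVICES];
    int count;
};

/* Feed one log line. Returns 1 if it was a parsable
 * "Device update received" line, 0 otherwise. */
static int join_stats_feed(struct join_stats *st, const char *line)
{
    unsigned short_addr;
    char long_addr[17];
    const char *p;

    assert(st != NULL && line != NULL);

    p = strstr(line, "Device update received");
    if (p == NULL) {
        return 0;
    }
    if (sscanf(p,
               "Device update received (short: 0x%x, long: %16[0123456789abcdef]",
               &short_addr, long_addr) != 2) {
        return 0;
    }

    /* Update the entry if this long address was seen before. */
    for (int i = 0; i < st->count; i++) {
        struct device_seen *d = &st->dev[i];

        if (strcmp(d->long_addr, long_addr) == 0) {
            d->updates++;
            if (d->last_short != short_addr) {
                d->short_changes++;
                d->last_short = short_addr;
            }
            return 1;
        }
    }

    /* Otherwise record a new device, if there is room left. */
    if (st->count < MAX_DEVICES) {
        struct device_seen *d = &st->dev[st->count++];

        strcpy(d->long_addr, long_addr);
        d->last_short = short_addr;
        d->updates = 1;
        d->short_changes = 0;
    }
    return 1;
}
```

Feeding every captured log line through `join_stats_feed()` and then listing the entries with `short_changes > 0` gives a quick list of devices that changed short address during commissioning, which may help decide which devices to follow in a sniffer trace.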

  • Hi Marjeris,

    To be clear, this shows up immediately if you try to join all devices at the same time. If we stagger the devices as they come in, the problem rarely, if ever, occurs.

    Best regards,

    TKR
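
Since staggering the joins avoids the problem, one option is to have each joining device wait a device-specific delay before it starts commissioning. The sketch below is only an illustration under assumptions: `join_delay_ms` is a hypothetical helper, the slot length and slot count are values you would have to choose, and hashing the IEEE address with FNV-1a is just one simple way to spread devices across slots without coordination.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive a start-up delay from the device's
 * 64-bit IEEE address so that devices powered on together do not
 * all begin network steering at the same moment.
 *
 * slot_ms   - length of one delay slot in milliseconds
 * num_slots - number of slots to spread the devices over
 *
 * Returns a multiple of slot_ms in the range
 * [0, (num_slots - 1) * slot_ms]. */
static uint32_t join_delay_ms(const uint8_t ieee_addr[8],
                              uint32_t slot_ms, uint32_t num_slots)
{
    uint32_t hash = 2166136261u; /* FNV-1a 32-bit offset basis */

    assert(slot_ms > 0 && num_slots > 0);

    for (int i = 0; i < 8; i++) {
        hash ^= ieee_addr[i];
        hash *= 16777619u;       /* FNV-1a 32-bit prime */
    }
    return (hash % num_slots) * slot_ms;
}
```

On the joining device, the returned value could be passed to something like `k_sleep(K_MSEC(delay))` before commissioning is started. Two devices can still land in the same slot, so this reduces simultaneous joins rather than eliminating them.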

  • Hi TKR,

    We have two theories:

    From the problem description it sounds very likely to be a problem caused by packet collisions due to high traffic, but there should be ways to handle this according to our Zigbee team.

    It could also be an issue with a lack of RX buffers. You may try to increase the number of buffers via Kconfig first and see if that helps:

    # Increase the number of RX buffers
    CONFIG_NRF_802154_RX_BUFFERS=32

    Try this and let me know how it goes. I will update you when I have more information as we continue to investigate this internally.

    Best regards,

    Marjeris

  • Hello Marjeris,

    Increasing the buffers to 64 made it a lot more stable, so that was a good suggestion. We are getting ZBOSS zdo asserts when we ask all devices to leave, though. If we cannot find a workaround, I'll post another ticket with details.

    TKR 
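
Because stability here depended on the RX buffer count, it may be worth guarding that setting at build time so a later configuration change does not silently lower it. The following is a sketch under assumptions: `APP_MIN_RX_BUFFERS` and `rx_buffers_configured` are hypothetical names, the threshold of 32 is simply the first value suggested in this thread, and the fallback definition exists only so the snippet compiles outside a Zephyr build, where Kconfig normally provides the macro.

```c
#include <assert.h>

/* In a Zephyr build this macro is generated from Kconfig. The fallback
 * below only lets the sketch compile on a host; 64 is the value reported
 * as more stable in this thread. */
#ifndef CONFIG_NRF_802154_RX_BUFFERS
#define CONFIG_NRF_802154_RX_BUFFERS 64
#endif

/* Hypothetical project threshold, taken from the first suggestion above. */
#define APP_MIN_RX_BUFFERS 32

/* Fail the build if the radio driver is configured with fewer buffers
 * than the application was tested with. */
_Static_assert(CONFIG_NRF_802154_RX_BUFFERS >= APP_MIN_RX_BUFFERS,
               "CONFIG_NRF_802154_RX_BUFFERS is below the tested minimum");

/* Returns the buffer count the image was built with, for example to
 * print it once at boot. */
static int rx_buffers_configured(void)
{
    return CONFIG_NRF_802154_RX_BUFFERS;
}
```

In application code built with Zephyr, the `BUILD_ASSERT()` macro could be used in place of `_Static_assert` for the same check.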

  • FYI, here are the assertions we've seen after increasing the RX buffers:

    zdo assert(122,1892)

    zdo assert(112,1892)  <-- different file, 112 vs 122
    zdo assert(124,589)
    zdo assert(124,635)

    Assertions seem to occur during high traffic bursts.
