
Coordinator Assert during commission of 24 devices

nRF Connect SDK v1.6.0

Using samples/zigbee/network_coordinator on an nRF52840, I am trying to commission a larger number of devices onto the network. Currently there are 24 devices, but the problem also occurs with as few as 11.

I enabled the CLI and ran "bdb start" to begin commissioning:

uart:~$ I: Network formed successfully, start network steering (Extended PAN ID: f4ce36628124d7a9, PAN ID: 0x4d03)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device update received (short: 0x7dce, long: f4ce36ddabae0f04, status: 1)
I: Device update received (short: 0x7b90, long: f4ce361d7112dd0a, status: 1)
I: Device update received (short: 0x7b90, long: f4ce361d7112dd0a, status: 1)
I: Device update received (short: 0x66df, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0xb619, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0xb619, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0x07aa, long: f4ce361054cb3284, status: 1)
I: Device update received (short: 0x07aa, long: f4ce361054cb3284, status: 1)
I: Device authorization event received (short: 0xb83a, long: f4ce363045dbb779, authorization type: 1, authorization status: 1)
I: Child left the network (long: f4ce363045dbb779, rejoin flag: 0)
I: Device update received (short: 0x7729, long: f4ce36fde1f130fe, status: 1)
I: Device update received (short: 0x7729, long: f4ce36fde1f130fe, status: 1)
I: Device authorization event received (short: 0x7dce, long: f4ce36ddabae0f04, authorization type: 1, authorization status: 1)
I: Child left the network (long: f4ce36ddabae0f04, rejoin flag: 0)
I: Child left the network (long: f4ce36ddabae0f04, rejoin flag: 0)
I: Device authorization event received (short: 0x7b90, long: f4ce361d7112dd0a, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device authorization event received (short: 0xb619, long: f4ce36ec24ebe6a8, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device authorization event received (short: 0x07aa, long: f4ce361054cb3284, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device authorization event received (short: 0x7729, long: f4ce36fde1f130fe, authorization type: 1, authorization status: 1)
I: Device update received (short: 0x67b0, long: f4ce36ddabae0f04, status: 1)
I: Device update received (short: 0x67b0, long: f4ce36ddabae0f04, status: 1)
I: Device authorization event received (short: 0xb83a, long: f4ce363045dbb779, authorization type: 1, authorization status: 1)
I: Device authorization event received (short: 0x67b0, long: f4ce36ddabae0f04, authorization type: 1, authorization status: 1)

Some devices begin to join, then an assert occurs:

uart:~$ E: Fatal error occurred
ASSERTION FAIL @ WEST_TOPDIR/nrfxlib/nrf_802154/driver/src/nrf_802154_notification_swi.c:154

Call stack is this:

We are trying to prepare this product for production, and this is a major issue at the moment. Please advise on the best way to allow the maximum number of devices to join the network.

  • Hi,

    I am so sorry for the late reply. I have forwarded your question about the maximum number of devices that can be added to a network to our Zigbee team but haven't received an answer yet. In the old nRF5 SDK for Thread and Zigbee we conducted tests with up to 24 devices, and there is no architectural limitation for networks larger than that. The practical limiting factor will be the Trust Center memory (in the coordinator device), as the TC must hold link keys for all connecting devices.

    Do you have a sniffer trace that can help us debug what happens when the devices join your coordinator's network? I would like to see what is happening on air. In the log output it looks like some devices are being added multiple times.

    Best regards,

    Marjeris

  • Hi Marjeris,

    To be clear, this is immediately evident if you try to join all devices at one time. If we stagger the endpoints as they come in, the problem rarely, if ever, occurs.

    Best regards,

    TKR
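
    A minimal sketch of one way to stagger joins, assuming each joining device can delay the start of its own commissioning. The helper name join_delay_ms and its slot parameters are hypothetical and not part of the Zigbee stack; the idea is only that each device derives a different start delay from its IEEE address.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a device's 64-bit IEEE address to a start
 * delay so that devices powered on together do not all begin joining
 * at the same moment. Returns a multiple of slot_ms that is smaller
 * than num_slots * slot_ms.
 */
static uint32_t join_delay_ms(uint64_t ieee_addr, uint32_t slot_ms,
                              uint32_t num_slots)
{
    if (num_slots == 0) {
        return 0; /* staggering disabled */
    }

    /* Fold the address so both halves contribute to the slot choice. */
    uint32_t folded = (uint32_t)(ieee_addr ^ (ieee_addr >> 32));

    return (folded % num_slots) * slot_ms;
}
```

    With 24 slots of 500 ms, for example, the devices would start within a window of about 12 seconds. Two devices can still land in the same slot, so this reduces simultaneous joins rather than eliminating them.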

  • Hi TKR,

    We have two theories:

    From the problem description it sounds very likely to be a problem caused by packet collisions due to high traffic, but there should be ways to handle this according to our Zigbee team.

    It could also be an issue with a lack of RX buffers. You may try increasing the number of buffers via Kconfig first and see if that helps:

    # Increase the number of RX buffers
    CONFIG_NRF_802154_RX_BUFFERS=32

    Try this and let me know how it goes. I will update you when I have more information as we continue to investigate this internally.

    Best regards,

    Marjeris
  • Hello Marjeris,

    Increasing the buffers to 64 made it a lot more stable, so that was a good suggestion. We are getting zboss zdo asserts when we ask all devices to leave, though. If we cannot find a workaround I'll post another ticket with details.

    TKR 

  • FYI, here are the assertions we've seen after increasing the RX buffers:

    zdo assert(122,1892)

    zdo assert(112,1892)  <-- different file, 112 vs 122
    zdo assert(124,589)
    zdo assert(124,635)

    Assertions seem to occur during high traffic bursts.

  • Hi,

    Are you trying to send 24 ZDO leave frames at once? There is a hard limit on the maximum number of concurrent ZDO transactions (currently set to 16).

    The assert is happening on the coordinator side, right?

  • Yes, these assertions are in the coordinator.

    We restrict our code to a maximum of 3 pending transactions at any one time.  We originally added this limit because we discovered that zboss was running out of buffers (zb_buf_memory_low() returning true) during commissioning of multiple devices.  Enforcing a limit of 3 pending transactions helped reduce the number of zboss assertions.  We eventually figured out how to increase the number of zboss buffers using a custom zb_mem_config header file setting ZB_CONFIG_IOBUF_POOL_SIZE to 127U, and since then we haven't seen the low memory condition.  We continue to see the assertions during commissioning and leave operations.

    We use zdo_mgmt_leave_req() to issue the leave request, and consider the transaction pending until the callback.  As noted above, we allow a maximum of 3 pending transactions of any type, including leave.

    I am curious about the limit on the number of ZDO transactions.  What is the source of this limit?  Is it configurable?
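
    The pending-transaction limit described above can be sketched roughly as follows. This is an illustration of the idea only, not our actual code: the tx_gate type and its functions are hypothetical names, and mem_low stands in for the result of zb_buf_memory_low().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-side gate; none of these names come from ZBOSS. */
struct tx_gate {
    uint8_t pending; /* requests issued whose callback has not run yet */
    uint8_t max;     /* application-chosen ceiling, e.g. 3 */
};

/* True when another request may be issued right now. */
static bool tx_gate_can_send(const struct tx_gate *g, bool mem_low)
{
    return !mem_low && g->pending < g->max;
}

/*
 * Reserve a slot before issuing a request such as a leave.
 * Returns false when the caller should queue the request instead.
 */
static bool tx_gate_acquire(struct tx_gate *g, bool mem_low)
{
    if (!tx_gate_can_send(g, mem_low)) {
        return false;
    }
    g->pending++;
    return true;
}

/* Release the slot from the request's completion callback. */
static void tx_gate_release(struct tx_gate *g)
{
    if (g->pending > 0) {
        g->pending--;
    }
}
```

    In this sketch a request is only issued after tx_gate_acquire() succeeds, and tx_gate_release() is called from the callback, so the pending count tracks requests that are still in flight.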

    And thanks for your help!  

    Update:  I tried reducing the number of pending transactions to 1 at a time and still got the zdo assert(112,1892).  As we ask devices to leave I see that we're getting a large number of ZB_ZDO_SIGNAL_DEVICE_UPDATE signals (status=device left).  I also see that we get multiple ZB_ZDO_SIGNAL_DEVICE_UPDATE signals for each device as it leaves, maybe one per neighbor?  (Each node in our network is configured as a router+endpoint.)  Perhaps all this traffic is what's triggering the assertion? 

    Here's an extract from our log showing the multiple device_update signals after a leave:

    {cmd:leave,addr:0x494E,device_addr:0x0000000000000000,remove_children:0,rejoin:0} <-- This calls zdo_mgmt_leave_req(...)
    {rsp:"leave",tsn:203}
    {nfy:"throttle",can_send:0,pending:1,max:1,mem_low:0}
    {nfy:"leave",tsn:203,status:0x0}   <-- sent from the zdo_mgmt_leave_req(...) callback
    {nfy:"throttle",can_send:1,pending:0,max:1,mem_low:0}
    {cmd:leave,addr:0x23A2,device_addr:0x0000000000000000,remove_children:0,rejoin:0} <--another call to zdo_mgmt_leave_req(...)
    {rsp:"leave",tsn:204}
    {nfy:"throttle",can_send:0,pending:1,max:1,mem_low:0}
    {nfy:"device_leave",addr:0xf4ce3683c5e670ce,rejoin:0}   <-- ZB_ZDO_SIGNAL_LEAVE for 0x494E above
    {nfy:"device_update",addr:0x494e,status:2}   <-- ZB_ZDO_SIGNAL_DEVICE_UPDATE, status = device left
    {nfy:"device_update",addr:0x494e,status:2}   <-- multiple signals received, possibly from multiple routes???
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"leave",tsn:204,status:0x0}
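
    The repeated device_update notifications in the extract above could be collapsed on the application side before they trigger further work. A minimal sketch, assuming a small cache of recently reported short addresses; the left_cache type, its size, and the function name are hypothetical. Whether this has any effect on the zdo asserts is not established here; it only reduces application-side handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEFT_CACHE_SIZE 8 /* hypothetical size; adjust to the network */

/* Ring of short addresses recently reported as "device left". */
struct left_cache {
    uint16_t addr[LEFT_CACHE_SIZE];
    size_t count; /* number of valid entries, at most LEFT_CACHE_SIZE */
    size_t next;  /* slot that the next new address overwrites */
};

/*
 * Returns true the first time a short address is reported and false
 * for repeats that are still in the cache. The oldest entry is
 * overwritten once the cache is full.
 */
static bool left_cache_first_report(struct left_cache *c, uint16_t short_addr)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->addr[i] == short_addr) {
            return false; /* duplicate notification */
        }
    }

    c->addr[c->next] = short_addr;
    c->next = (c->next + 1) % LEFT_CACHE_SIZE;
    if (c->count < LEFT_CACHE_SIZE) {
        c->count++;
    }
    return true;
}
```

    One limitation of this sketch: a device that rejoins with the same short address and leaves again would be treated as a duplicate until its entry is overwritten, so a real implementation would also need to clear the entry on rejoin.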
