Coordinator Assert during commission of 24 devices

nRF Connect V 1.6.0

Using samples/zigbee/network_coordinator on an nRF52840, I am trying to commission a large number of devices onto the network. I currently have 24 devices, but the problem also occurs with as few as 11.

I enabled the CLI and ran "bdb start" to begin commissioning:

uart:~$ I: Network formed successfully, start network steering (Extended PAN ID: f4ce36628124d7a9, PAN ID: 0x 4d03)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device update received (short: 0x7dce, long: f4ce36ddabae0f04, status: 1)
I: Device update received (short: 0x7b90, long: f4ce361d7112dd0a, status: 1)
I: Device update received (short: 0x7b90, long: f4ce361d7112dd0a, status: 1)
I: Device update received (short: 0x66df, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0xb619, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0xb619, long: f4ce36ec24ebe6a8, status: 1)
I: Device update received (short: 0x07aa, long: f4ce361054cb3284, status: 1)
I: Device update received (short: 0x07aa, long: f4ce361054cb3284, status: 1)
I: Device authorization event received (short: 0xb83a, long: f4ce363045dbb779, authorization type: 1, authorization status: 1)
I: Child left the network (long: f4ce363045dbb779, rejoin flag: 0)
I: Device update received (short: 0x7729, long: f4ce36fde1f130fe, status: 1)
I: Device update received (short: 0x7729, long: f4ce36fde1f130fe, status: 1)
I: Device authorization event received (short: 0x7dce, long: f4ce36ddabae0f04, authorization type: 1, authorization status: 1)
I: Child left the network (long: f4ce36ddabae0f04, rejoin flag: 0)
I: Child left the network (long: f4ce36ddabae0f04, rejoin flag: 0)
I: Device authorization event received (short: 0x7b90, long: f4ce361d7112dd0a, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device authorization event received (short: 0xb619, long: f4ce36ec24ebe6a8, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device authorization event received (short: 0x07aa, long: f4ce361054cb3284, authorization type: 1, authorization status: 1)
I: Unimplemented signal (signal: 50, status: 0)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device update received (short: 0xb83a, long: f4ce363045dbb779, status: 1)
I: Device authorization event received (short: 0x7729, long: f4ce36fde1f130fe, authorization type: 1, authorization status: 1)
I: Device update received (short: 0x67b0, long: f4ce36ddabae0f04, status: 1)
I: Device update received (short: 0x67b0, long: f4ce36ddabae0f04, status: 1)
I: Device authorization event received (short: 0xb83a, long: f4ce363045dbb779, authorization type: 1, authorization status: 1)
I: Device authorization event received (short: 0x67b0, long: f4ce36ddabae0f04, authorization type: 1, authorization status: 1)

Some devices begin to join, then an assert occurs:

uart:~$ E: Fatal error occurred
ASSERTION FAIL @ WEST_TOPDIR/nrfxlib/nrf_802154/driver/src/nrf_802154_notification_swi.c:154

Call stack is this:

We are trying to prepare this product for production and this is a major issue at the moment. Please advise on the best way to let the maximum number of devices join the network.

  • Hi,

    I am so sorry for the late reply. I have forwarded your question about the maximum number of devices that can be added to a network to our Zigbee team but haven't gotten an answer yet. In the old nRF5 SDK for Thread and Zigbee we conducted tests with up to 24 devices, and there is no architectural limitation for networks larger than that. The practical limiting factor will be the Trust Center memory (in the coordinator device), as the TC must hold link keys for all connecting devices.

    Do you have a sniffer trace of the devices joining your coordinator's network that could help us debug? I would like to see what is happening on air. In the log output it looks like some devices are being added multiple times.

    Best regards,

    Marjeris

  • Hi Marjeris,

    To be clear, this is immediately evident if you try to join all devices at one time. If we stagger the endpoints as they come in, the problem rarely, if ever, occurs.
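One way to stagger joins, sketched here under assumptions: each joining device delays its first join attempt by a slot derived from its own IEEE address, so devices powered up together spread out over time. The helper name `join_delay_ms` and the slot parameters are hypothetical and are not part of the sample or of ZBOSS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a ZBOSS API): map a device's 64-bit IEEE address
 * to one of num_slots start delays, each slot_ms apart. Devices with
 * different low address bits start joining at different times. */
uint32_t join_delay_ms(uint64_t ieee_addr, uint32_t slot_ms, uint32_t num_slots)
{
    assert(num_slots > 0);
    return (uint32_t)(ieee_addr % num_slots) * slot_ms;
}
```

With 8 slots of 500 ms, the device f4ce363045dbb779 from the log above would wait 500 ms and f4ce36ddabae0f04 would wait 2000 ms before starting to join. Addresses that share low bits still collide, so this only reduces the burst and does not remove it.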

    Best regards,

    TKR

  • Yes, these assertions are in the coordinator.

    We restrict our code to a maximum of 3 pending transactions at any one time.  We originally added this limit because we discovered that zboss was running out of buffers (zb_buf_memory_low() returning true) during commissioning of multiple devices.  Enforcing a limit of 3 pending transactions helped reduce the number of zboss assertions.  We eventually figured out how to increase the number of zboss buffers using a custom zb_mem_config header file setting ZB_CONFIG_IOBUF_POOL_SIZE to 127U, and since then we haven't seen the low memory condition.  We continue to see the assertions during commissioning and leave operations.

    We use zdo_mgmt_leave_req() to issue the leave request, and consider the transaction pending until the callback.  As noted above, we allow a maximum of 3 pending transactions of any type, including leave.

    I am curious about the limit on the number of ZDO transactions.  What is the source of this limit?  Is it configurable?
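The pending-transaction limit described above can be sketched roughly as follows. This is not the poster's actual code: the struct and function names are assumptions, and the `mem_low` flag stands in for whatever zb_buf_memory_low() reports at the time of the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-side throttle; not part of ZBOSS. */
struct txn_throttle {
    uint8_t pending; /* requests issued whose callback has not fired yet */
    uint8_t max;     /* application-chosen ceiling, e.g. 3 */
};

/* A new request may be issued only if buffers are not low and a slot is free. */
bool throttle_can_send(const struct txn_throttle *t, bool mem_low)
{
    return !mem_low && t->pending < t->max;
}

/* Call just before issuing a request; returns false if it must be deferred. */
bool throttle_acquire(struct txn_throttle *t, bool mem_low)
{
    if (!throttle_can_send(t, mem_low)) {
        return false;
    }
    t->pending++;
    return true;
}

/* Call from the request's completion callback. */
void throttle_release(struct txn_throttle *t)
{
    assert(t->pending > 0);
    t->pending--;
}
```

The fields mirror the can_send, pending, max and mem_low values that appear in the throttle notifications in the log extract further down.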

    And thanks for your help!  

    Update:  I tried reducing the number of pending transactions to 1 at a time and still got the zdo assert(112,1892).  As we ask devices to leave I see that we're getting a large number of ZB_ZDO_SIGNAL_DEVICE_UPDATE signals (status=device left).  I also see that we get multiple ZB_ZDO_SIGNAL_DEVICE_UPDATE signals for each device as it leaves, maybe one per neighbor?  (Each node in our network is configured as a router+endpoint.)  Perhaps all this traffic is what's triggering the assertion? 

    Here's an extract from our log showing the multiple device_update signals after a leave:

    {cmd:leave,addr:0x494E,device_addr:0x0000000000000000,remove_children:0,rejoin:0} <-- This calls zdo_mgmt_leave_req(...)
    {rsp:"leave",tsn:203}
    {nfy:"throttle",can_send:0,pending:1,max:1,mem_low:0}
    {nfy:"leave",tsn:203,status:0x0}   <-- sent from the zdo_mgmt_leave_req(...) callback
    {nfy:"throttle",can_send:1,pending:0,max:1,mem_low:0}
    {cmd:leave,addr:0x23A2,device_addr:0x0000000000000000,remove_children:0,rejoin:0} <--another call to zdo_mgmt_leave_req(...)
    {rsp:"leave",tsn:204}
    {nfy:"throttle",can_send:0,pending:1,max:1,mem_low:0}
    {nfy:"device_leave",addr:0xf4ce3683c5e670ce,rejoin:0}   <-- ZB_ZDO_SIGNAL_LEAVE for 0x494E above
    {nfy:"device_update",addr:0x494e,status:2}   <-- ZB_ZDO_SIGNAL_DEVICE_UPDATE, status = device left
    {nfy:"device_update",addr:0x494e,status:2}   <-- multiple signals received, possibly from multiple routes???
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"device_update",addr:0x494e,status:2}
    {nfy:"leave",tsn:204,status:0x0}
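To make repeated signals like these easier to count, the notification path could collapse consecutive identical updates before logging them. The following is only a sketch with hypothetical names; it does not change what the stack does and says nothing about why the duplicates occur.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical logging filter; not part of ZBOSS. Zero-initialize before use. */
struct update_filter {
    bool     valid;      /* true once at least one update has been seen */
    uint16_t short_addr; /* short address of the last update */
    uint8_t  status;     /* status of the last update */
    uint32_t repeats;    /* consecutive duplicates of the last update */
};

/* Returns true if this update differs from the previous one and should be
 * reported; otherwise counts it as a repeat and returns false. */
bool update_filter_is_new(struct update_filter *f, uint16_t short_addr,
                          uint8_t status)
{
    if (f->valid && f->short_addr == short_addr && f->status == status) {
        f->repeats++;
        return false;
    }
    f->valid = true;
    f->short_addr = short_addr;
    f->status = status;
    f->repeats = 0;
    return true;
}
```

Fed the six device_update notifications for 0x494e shown above, the filter would report the first and count five repeats.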

  • Hi Larry,

    I have passed this information to our Zigbee team to get some help decoding the error messages. I am sorry for the delays, but I am waiting for their feedback on this and on how to change the limit on the number of ZDO transactions.

  • Hi Larry,

    The limit is set at ZBOSS compile time (CONFIG_ZB_ZDO_TRAN_TABLE_SIZE) and cannot be changed by the application. We cannot recompile the ZBOSS stack we deliver, because it would then no longer be a certified stack. So I am afraid this is not a configurable limit.

    Our developers are asking if you could log all values returned by zb_osif_nvram_write. It is possible that the assert(122,1892) is caused by an NVRAM write error at the OSIF layer.

    Best regards,

    Marjeris

  • We have not seen any zb_osif_nvram_write errors, either before the assertions or at any other time.

  • FYI, we just tried NCS 1.7.0 with zboss 3.8 and see similar asserts during commissioning of several nodes at once.

  • Hi Larry,

    Could you try the three configurations below and report whether any failure is observed during commissioning? (For the coordinator device.)

    1.
        CONFIG_NRF_802154_RX_BUFFERS set to 64
        ZB_CONFIG_IOBUF_POOL_SIZE defined to 127U
        ZB_CONFIG_SCHEDULER_Q_SIZE defined to 127U

    2.

        CONFIG_NRF_802154_RX_BUFFERS set to 32
        ZB_CONFIG_IOBUF_POOL_SIZE defined to 127U
        ZB_CONFIG_SCHEDULER_Q_SIZE defined to 127U

    3.

        CONFIG_NRF_802154_RX_BUFFERS set to 16
        ZB_CONFIG_IOBUF_POOL_SIZE defined to 127U
        ZB_CONFIG_SCHEDULER_Q_SIZE defined to 127U
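As a rough illustration, configuration 1 might look like the fragment below. It assumes the two ZB_CONFIG_ values are overridden in the application's custom zb_mem_config header (as mentioned earlier in the thread) after the stack defaults have been set; where exactly the overrides belong in that header is an assumption.

```c
/* Sketch of memory-configuration overrides for configuration 1.
 * Assumption: these lines sit in the application's custom zb_mem_config
 * header, after the default values have been defined. */

/* Size of the ZBOSS packet (I/O) buffer pool. */
#undef ZB_CONFIG_IOBUF_POOL_SIZE
#define ZB_CONFIG_IOBUF_POOL_SIZE 127U

/* Size of the ZBOSS scheduler queue. */
#undef ZB_CONFIG_SCHEDULER_Q_SIZE
#define ZB_CONFIG_SCHEDULER_Q_SIZE 127U

/* The radio driver RX buffer count is a Kconfig option, not a C macro.
 * In prj.conf:
 *   CONFIG_NRF_802154_RX_BUFFERS=64
 * Use 32 for configuration 2 and 16 for configuration 3. */
```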

  • Changing ZB_CONFIG_SCHEDULER_Q_SIZE from 48 to 127 changed the behavior of the problem.  I've been testing with configuration 1 above, and when trying to "gang discover" 20+ devices it now seems that zboss runs out of buffers and never recovers (zb_buf_memory_low() returns 1 until we reset).  Initial testing with configurations 2 and 3 suggests we may just be trading the likelihood of a zboss assertion against this unrecoverable low-memory condition.
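To tell a transient buffer shortage from the stuck condition described above, the application could count consecutive low-memory polls. This is a minimal sketch with hypothetical names: the `mem_low` argument would come from the application's periodic call to zb_buf_memory_low(), and the polling period and limit are up to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitor; not part of ZBOSS. */
struct lowmem_monitor {
    uint32_t consecutive_low; /* polls in a row that reported low memory */
    uint32_t limit;           /* polls after which the state counts as stuck */
};

/* Feed one poll result; returns true once low memory has been reported for
 * `limit` consecutive polls, i.e. the pool does not appear to be recovering. */
bool lowmem_monitor_poll(struct lowmem_monitor *m, bool mem_low)
{
    assert(m->limit > 0);
    if (mem_low) {
        if (m->consecutive_low < m->limit) {
            m->consecutive_low++;
        }
    } else {
        m->consecutive_low = 0;
    }
    return m->consecutive_low >= m->limit;
}
```

Logging the moment the monitor trips, together with the number of pending transactions, may help show whether the pool is exhausted by the join burst itself or by buffers that are never freed.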
