ZBOSS accepts join after leave with no rejoin

We originally saw (and reported) this problem with nRF5 SDK 4.1.0. We then switched to nRF Connect SDK 1.6.0, which did not exhibit it. We have now tested with nRF Connect SDK 1.7.0 and the problem is back.

We see this in the zboss release notes and are wondering whether the ZBOSS developers are just going back and forth with an incorrect fix:

  • [ZBS-98] Rejoin after Leave+Rejoin at ZR is broken in r23 - maybe, in r22 also

Commissioning was NOT enabled at the time of the leave request.

The devices all successfully left the network.

In 1.6.0 we would see the devices try to join again but they would be rejected with the device authorization signal showing status 2.

In 1.7.0 we see that they are allowed to rejoin with the device authorization signal showing status 0, even though commissioning was not enabled.

  • Hmm, I see that another router is still reporting its "permit join" flag as set even 15 minutes after commissioning ended. I wonder if that's how the devices are rejoining after leaving.

  • Hi,

    I do not see any cases related to the previous bug in your case history. If it was reported in a support ticket here on DevZone, do you know the case number or have a link to the case? This will make it easier for me to look this up internally. I have tried to look for relevant bugs, but the closest I could find so far is bugfix ZOI-60 in ZBOSS 07/14/2020 (ZBOSS 07/14/2020 release notes), but I do not know if this is what you are referring to:

    • [ZOI-60] Network permits new devices after device leave

    I will ask our developers internally to clarify regarding ZBS-98.

    Larry Martin said:
    I see that another router is still reporting its "permit join" flag as set even 15 minutes after commissioning ended

    This could explain why they are able to join. However, 15 minutes is a long time for the permit join flag to still be activated. Are nodes joining the network during this time? A joining node will broadcast a Mgmt_Permit_Joining_req command with a PermitDuration field set to the minimum commissioning time, bdbcMinCommissioningTime. This will cause nodes that receive this command to reset their timer, such that the permit joining flag period will be extended. If not, have you made any changes to bdbcMinCommissioningTime?
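The timer behavior described above can be illustrated with a small model. This is only a sketch, not ZBOSS code: the class `PermitJoinTimer` and its method names are hypothetical, and the 180-second default is an assumed value for bdbcMinCommissioningTime (it matches the 180 seconds mentioned elsewhere in this thread). It shows how a later Mgmt_Permit_Joining_req replaces the running timer and so keeps the permit join flag set longer than a single request would.

```python
# Assumed value for bdbcMinCommissioningTime, in seconds (not read from ZBOSS).
BDBC_MIN_COMMISSIONING_TIME_S = 180


class PermitJoinTimer:
    """Toy model of one router's permit join flag (hypothetical, not ZBOSS code)."""

    def __init__(self):
        # Time at which joining closes; None means joining is closed.
        self.deadline = None

    def on_permit_joining_req(self, now, permit_duration=BDBC_MIN_COMMISSIONING_TIME_S):
        # A new request replaces the running timer; a duration of 0 closes joining.
        self.deadline = now + permit_duration if permit_duration > 0 else None

    def is_open(self, now):
        return self.deadline is not None and now < self.deadline


timer = PermitJoinTimer()
timer.on_permit_joining_req(now=0)
timer.on_permit_joining_req(now=100)  # e.g. a joining node broadcasts the request again
print(timer.is_open(200))  # True: without the second request it would have closed at 180
print(timer.is_open(280))  # False: 100 + 180 = 280
```

If nodes keep joining (or keep trying to join) and each attempt triggers another broadcast, the deadline keeps moving, which is one way the flag could still be set long after the application believes commissioning has ended.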

    Best regards,

    Marte

  • You're right, it wasn't reported as a separate ticket.  Rather it was described by a co-worker in the second reply in this thread:  https://devzone.nordicsemi.com/f/nordic-q-a/77020/nrf-connect-sdk-zboss-and-nrf5-sdk-for-thread-and-zigbee-zboss-updates

    That [ZOI-60] notice seems to match what we're seeing, but we still see it in NCS 1.7.0.

    The problem is sporadic and rare.  Most often after a device leaves the network we see subsequent join attempts fail with the device authorization signal showing status 2. In fact, it is possible that the problem existed in NCS 1.6.0 and we just never saw it manifest during our testing.

    I have a log showing the sequence of zboss calls, callbacks, and signals prior to one of these rejoin-after-leave events and will copy a sanitized extract below showing the zboss interactions and signals.

    Note the three ZB_NLME_STATUS_INDICATION signals, two for one of the devices asked to leave and one for node 0xC432 which is another node for which we'd been getting NLME_NO_ROUTE_AVAILABLE reports for the previous 7 minutes.

    zb_mgmt_leave_req
    	dst_addr = 0x4D7F (ieee addr is 0xf4ce36e20cd732cb)
    	device_address = 0x0000000000000000
    	rejoin = 0,
    	remove_children = 0
    returned TSN = 229
    
    ZB_NLME_STATUS_INDICATION
    	network_addr = 0x4d7f
    	status = 0x09  # NLME_PARENT_LINK_FAILURE
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce36e20cd732cb
    	rejoin = 0
    *** Note we did not receive a leave_callback for this TSN.
    *** That typically happens before the ZB_ZDO_SIGNAL_LEAVE_INDICATION
    *** We eventually get the callback with status 0x85 at the end of this log.
    
    zb_mgmt_leave_req
    	dst_addr = 0x059C (ieee addr is 0xf4ce36c7c770a342)
    	device_address = 0x0000000000000000
    	rejoin = 0,
    	remove_children = 0
    returned TSN = 231
    
    leave_callback
    	tsn = 231
    	status = 0x00
    	
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce36c7c770a342
    	rejoin = 0
    
    ZB_NLME_STATUS_INDICATION
    	network_addr = 0x4d7f
    	status = 0x02  # NLME_NON_TREE_LINK_FAILURE
    
    zb_mgmt_leave_req
    	dst_addr = 0x648C (ieee addr is 0xF4CE364EF5020CC7)
    	device_address = 0x0000000000000000
    	rejoin = 0,
    	remove_children = 0
    returned TSN = 232
    
    leave_callback
    	tsn = 232
    	status = 0x00
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce364ef5020cc7
    	rejoin = 0
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce364ef5020cc7
    	rejoin = 0
    
    zb_mgmt_leave_req
    	dst_addr = 0xD7A6 (ieee addr is 0xF4CE36EA86787E57)
    	device_address = 0x0000000000000000
    	rejoin = 0,
    	remove_children = 0
    returned TSN = 233
    
    leave_callback
    	tsn = 233
    	status = 0x00
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce36ea86787e57
    	rejoin = 0
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce36ea86787e57
    	rejoin = 0
    
    zb_mgmt_leave_req
    	dst_addr = 0x5ABA (ieee addr is 0xF4CE3666E84CD5D1)
    	device_address = 0x0000000000000000
    	rejoin = 0,
    	remove_children = 0
    returned TSN = 234
    
    leave_callback
    	tsn = 234
    	status = 0x00
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce3666e84cd5d1
    	rejoin = 0
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce3666e84cd5d1
    	rejoin = 0
    
    zb_mgmt_leave_req
    	dst_addr = 0x88B0 (ieee addr is 0xF4CE367D72D53CB8)
    	device_address = 0x0000000000000000
    	rejoin = 0,
    	remove_children = 0
    returned TSN = 235
    
    leave_callback
    	tsn = 235
    	status = 0x00
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce367d72d53cb8
    	rejoin = 0
    
    ZB_ZDO_SIGNAL_LEAVE_INDICATION
    	device_addr = 0xf4ce367d72d53cb8
    	rejoin = 0
    
    ZB_NLME_STATUS_INDICATION
    	network_addr = 0xc432
    	status = 0x00  # NLME_NO_ROUTE_AVAILABLE
    
    ZB_ZDO_SIGNAL_DEVICE_UPDATE
    	short_addr = 0xca43
    	status = 0x01
    
    ZB_ZDO_SIGNAL_DEVICE_ANNCE
    	ieee_addr = 0xf4ce36c7c770a342
    	device_short_addr = 0xca43
    	capability = 0x8e
    
    ZB_ZDO_SIGNAL_DEVICE_AUTHORIZED
    	long_addr = 0xf4ce36c7c770a342
    	short_addr = 0xca43
    	authorization_type = 1
    	authorization_status = 0
    
    ZB_ZDO_SIGNAL_DEVICE_UPDATE
    	short_addr = 0x71af
    	status = 0x01
    
    ZB_ZDO_SIGNAL_DEVICE_ANNCE
    	ieee_addr = 0xf4ce364ef5020cc7
    	device_short_addr = 0x71af
    	capability = 0x8e
    
    ZB_ZDO_SIGNAL_DEVICE_AUTHORIZED
    	long_addr = 0xf4ce364ef5020cc7
    	short_addr = 0x71af
    	authorization_type = 1
    	authorization_status = 0
    	
    [other leave/join activity along these lines]
    	
    leave_callback
        tsn = 229
        status = 0x85
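
A log like the one above can be scanned mechanically for the symptom. The following is a minimal sketch, assuming the signals have already been transcribed into tuples; the tuple layout, the `EVENTS` list, and the function `joins_after_leave` are all hypothetical and are not a ZBOSS format or API. The addresses are the ones from the extract, with duplicate leave indications collapsed.

```python
# Hypothetical transcription of the log extract: (kind, ieee_addr[, auth_status]).
EVENTS = [
    ("leave_ind", 0xF4CE36E20CD732CB),
    ("leave_ind", 0xF4CE36C7C770A342),
    ("leave_ind", 0xF4CE364EF5020CC7),
    ("leave_ind", 0xF4CE36EA86787E57),
    ("leave_ind", 0xF4CE3666E84CD5D1),
    ("leave_ind", 0xF4CE367D72D53CB8),
    ("authorized", 0xF4CE36C7C770A342, 0),
    ("authorized", 0xF4CE364EF5020CC7, 0),
]


def joins_after_leave(events, commissioning_open=False):
    """Return addresses authorized with status 0 after a leave indication."""
    left = set()
    flagged = []
    for kind, ieee, *rest in events:
        if kind == "leave_ind":
            left.add(ieee)
        elif kind == "authorized":
            status = rest[0]
            # Suspicious only if the application believes commissioning is closed.
            if status == 0 and ieee in left and not commissioning_open:
                flagged.append(ieee)
            left.discard(ieee)
    return flagged


for addr in joins_after_leave(EVENTS):
    print(hex(addr))
# prints 0xf4ce36c7c770a342 and 0xf4ce364ef5020cc7
```

The `commissioning_open` flag stands in for whatever the application uses to track its own commissioning state; in this log it was not enabled, so both authorizations are flagged.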
        

  • We have not changed the commissioning time.  (In fact, we didn't think it was possible, at least in the nRF5 build.)

  • Hi,

    Thank you for linking to the other ticket!

    Could you please get a sniffer log of this behavior and upload it here as a pcap file?

    I have asked our developers about this internally, and I will let you know when I hear back from them.

    Best regards,

    Marte

  • I think it unlikely we'll be able to capture this in a sniffer log, at least in the near term.  As I mentioned above, this is a rare event.  I've only seen it once in the past two days even though I've been trying to recreate it.  It's much more likely that we get a zboss assertion as described in this ticket:  https://devzone.nordicsemi.com/f/nordic-q-a/79054/coordinator-assert-during-commission-of-24-devices

    With NCS 1.7.0 some of the file and line numbers have changed, but we still see these asserts when commissioning/leaving a large number of nodes too quickly.

  •  join-after-leave-at-time-1090-ish.pcapng

    I continued experimenting and was able to recreate the problem and capture it in Wireshark. The attached pcapng capture has a couple of commission/leave cycles. The first cycle commissioned all 22 devices, and they all left when commanded. The second cycle commissioned 19 of 22 devices. The other 3 were still powered up, but did not successfully join.

    The interesting stuff seems to be happening between times t=808 and t=1090 in the Wireshark capture. Here's what I think I see:

    While commissioning we get the last Device Announcement message (and associated zboss signal) at t=810.25.  (We think it likely that the zboss signal was sent at about 809.5 in response to the first announcement for that device, but can't correlate our logs that precisely.)

    We see subsequent Permit Join Request broadcasts from various nodes until about t=915.

    We think commissioning has ended at t=989 because it has been 180 seconds since the last ZB_ZDO_SIGNAL_DEVICE_ANNCE from zboss, at which time we start configuring the devices.

    I waited about 1.5 additional minutes and then started asking nodes to leave.  You can see the first leave request at t=1078.9 and the second at t=1084.3.

    Just after the second leave request I see a device association and key exchange sequence for addr=0xcc20 culminating in what appears to be a successful authorization at t=1090.44.

    Note that the first log sequence I posted yesterday had a delay of 5 minutes between the end of commissioning and the first leave request.

    I'm guessing that the Permit Join Requests following the last Device Announcement message kept commissioning active in the coordinator.  We had seen the recommendation to extend commissioning time in response to ZB_ZDO_SIGNAL_DEVICE_ANNCE and we are doing that, but it seems it might also be extended by the Permit Join Request messages.  If this is correct, how can we detect this condition and correctly track the commissioning state in the coordinator/network?

    Note that we generally do not see this extended commissioning behavior.  Could it be triggered by the 3 devices that had not yet joined the network?  (All our devices are endpoint+routers.)
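
The timeline above can be checked with simple arithmetic. This is a sketch under stated assumptions: each Permit Join Request is assumed to carry a PermitDuration of 180 seconds, the timestamps are the approximate ones read from the capture, and the variable and function names are hypothetical. It does not prove what the routers actually did; it only shows that the guess is consistent with the numbers.

```python
PERMIT_DURATION_S = 180.0  # assumed PermitDuration in each Permit Join Request

# Approximate timestamps read from the capture (seconds).
last_device_annce = 810.25
last_permit_join_req = 915.0
unexpected_auth = 1090.44

# What the application assumes: 180 s after the last Device Announcement.
app_view_close = last_device_annce + PERMIT_DURATION_S

# What routers may assume: 180 s after the last Permit Join Request they heard.
network_view_close = last_permit_join_req + PERMIT_DURATION_S


def joining_open(t, close_time):
    return t < close_time


print(app_view_close, network_view_close)                 # 990.25 1095.0
print(joining_open(unexpected_auth, app_view_close))      # False
print(joining_open(unexpected_auth, network_view_close))  # True
```

Under these assumptions the authorization at t=1090.44 falls about 100 seconds after the application's view of the window closing, but still about 5 seconds inside a window measured from the last Permit Join Request. An application that wants to track the network's state could apply the same rule to both kinds of event and use the later deadline.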
