Zigbee: issue similar to KRKNWK-12017

Hi,

SDK: nRF5 SDK for Thread and Zigbee v4.2.0

Chip: nRF52840

We're experiencing a very rare failure on an end device implementation.

We haven't been able to reproduce it with logs because the issue occurs so rarely, but we have some live instrumentation that lets us retrieve the history of ZBOSS signals.

When the issue happens, we get a ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE and the device no longer tries to rejoin the coordinator.

It looks very similar to the KRKNWK-12017 issue, but there is no broken rejoin procedure before the ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE event.

Here is the history of ZBOSS signals for a device that was up for several days and then stopped interacting with the coordinator:

  1. ZB_ZDO_SIGNAL_PRODUCTION_CONFIG_READY
  2. ZB_ZDO_SIGNAL_SKIP_STARTUP
  3. ZB_BDB_SIGNAL_DEVICE_REBOOT
  4. ZB_NLME_STATUS_INDICATION  => ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE

Could you recommend a procedure to fix this? Maybe trigger a rejoin when the ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE occurs?
FYI, resetting the device makes it rejoin the network immediately.

Thanks,

Sebastien

  • Hi Sebastien,

    Could you recommend a procedure to fix this? Maybe trigger a rejoin when the ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE occurs?

    Yes, since you are able to trace it to a stack signal, the best workaround would be to trigger a rejoin when the signal occurs. You can add a case for ZB_NLME_STATUS_INDICATION in zboss_signal_handler that checks if the status is ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE. To rejoin, you can call bdb_start_top_level_commissioning(ZB_BDB_NETWORK_STEERING).

    If you are able to reproduce the issue and collect sniffer logs, please let me know.

    Best regards,
    Marte
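
The workaround suggested above can be sketched roughly as follows. This is a minimal, self-contained sketch in plain C rather than actual SDK code: `app_on_nlme_status`, `app_action_t`, `app_link_stats_t` and `APP_NWK_STATUS_PARENT_LINK_FAILURE` are hypothetical names, and the constant is a local stand-in (assumed to be 0x09) for ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE so that the logic builds without the ZBOSS headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: local stand-in for ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE.
 * In firmware, compare against the SDK constant instead. */
#define APP_NWK_STATUS_PARENT_LINK_FAILURE 0x09u

/* Hypothetical result type telling the caller what to do next. */
typedef enum {
    APP_ACTION_NONE,   /* nothing to do for this NWK status */
    APP_ACTION_REJOIN  /* caller should start a rejoin */
} app_action_t;

/* Hypothetical diagnostics counter, e.g. readable over BLE. */
typedef struct {
    uint32_t forced_rejoins;
} app_link_stats_t;

/* Decide how to react to the NWK status carried by a
 * ZB_NLME_STATUS_INDICATION signal. */
static app_action_t app_on_nlme_status(uint8_t nwk_status,
                                       app_link_stats_t *stats)
{
    assert(stats != NULL);

    if (nwk_status != APP_NWK_STATUS_PARENT_LINK_FAILURE) {
        return APP_ACTION_NONE;
    }

    stats->forced_rejoins++;
    return APP_ACTION_REJOIN;
}
```

In firmware, this function would presumably be called from the ZB_NLME_STATUS_INDICATION case of zboss_signal_handler, and the caller would react to `APP_ACTION_REJOIN` by calling bdb_start_top_level_commissioning(ZB_BDB_NETWORK_STEERING) as described in the reply. Keeping the decision separate from the stack call is only a design choice that makes the logic easy to test off-target.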

  • Thanks for your input.

    How can we be sure we're not running into KRKNWK-12017, in which the ZBOSS stack requires a reset?

  • Hi,

    Some feedback from the tests.

    We added some counters that we can read via BLE, and this is what we observed:
    One device ran into the forced rejoin a few times, then finally hit the broken rejoin case and rebooted (good).
    Another device ran into the forced rejoin once and that was all. The device is no longer visible on the network.
    It seems to me that the ZBOSS stack can end up KO even when it does not go through the broken rejoin logic.

    Can you check with the Zigbee team whether it would be safe to reset the device on any parent link failure?
    KRKNWK-12017 doesn't seem to be enough.

    FYI, a few weeks ago we managed to reproduce the issue with serial output enabled. Once the parent link failure happened, there were no ZB sleep events at all anymore, which makes me think the stack was dead.

    Thx

  • Hi,

    Cheb said:
    Can you check with the Zigbee team whether it would be safe to reset the device on any parent link failure?
    KRKNWK-12017 doesn't seem to be enough.

    Do you only get ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE once or do you get this signal multiple times when it fails? The best option is to have a failure counter that checks how many times the device has received ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE, and then have a threshold for when it should reset.

    Best regards,
    Marte
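
The failure counter with a reset threshold described above could look something like this. It is a hedged sketch in plain C, not SDK code: `app_link_failure_policy_t` and the three `app_link_policy_*` functions are hypothetical names, and clearing the counter after a successful join is an assumption that the thread does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy state kept by the application. */
typedef struct {
    uint8_t failures;   /* parent link failures since the last good join */
    uint8_t threshold;  /* failure count at which a reset is requested */
} app_link_failure_policy_t;

static void app_link_policy_init(app_link_failure_policy_t *p,
                                 uint8_t threshold)
{
    assert(p != NULL && threshold > 0u);
    p->failures = 0u;
    p->threshold = threshold;
}

/* Record one ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE.
 * Returns true once the threshold is reached, i.e. when the
 * application should reset the device. */
static bool app_link_policy_on_failure(app_link_failure_policy_t *p)
{
    assert(p != NULL);
    if (p->failures < UINT8_MAX) {
        p->failures++;  /* saturate instead of wrapping around */
    }
    return p->failures >= p->threshold;
}

/* Assumption: a successful (re)join clears the counter. */
static void app_link_policy_on_joined(app_link_failure_policy_t *p)
{
    assert(p != NULL);
    p->failures = 0u;
}
```

When `app_link_policy_on_failure` returns true, the firmware would trigger its reset, for example with the CMSIS call NVIC_SystemReset(). A threshold of 1 makes the device reset on the first failure, which matches the case where the signal is only ever received once.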

  • Do you only get ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE once or do you get this signal multiple times when it fails? The best option is to have a failure counter that checks how many times the device has received ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE, and then have a threshold for when it should reset.

    When it fails, ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE occurs once, and after that the ZBOSS signal handler receives no more signals.

    Regards,

    Sebastien

  • Hi Sebastien,

    In that case, you can reset the device on the first ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE. Please be aware that this will make the device reset in situations where it might have been able to find the parent again or rejoin under a new parent, but it should be able to rejoin successfully on restart.

    Best regards,
    Marte

  • In that case, you can reset the device on the first ZB_NWK_COMMAND_STATUS_PARENT_LINK_FAILURE. Please be aware that this will make the device reset in situations where it might have been able to find the parent again or rejoin under a new parent, but it should be able to rejoin successfully on restart.

    Thanks, we have this logic under test.
