[Zigbee] ZBOSS Fatal Error at init

Setup:

nrf52840DK (Zigbee Coordinator, based on the Zigbee Coordinator sample)

nrf52840DK (ZR)

NCS v2.4.1

Hi, 

I have encountered a serious problem during development.

Device constantly reboots after calling zigbee_enable(). I was able to get the backtrace when that happens.

 Backtrace:
#0  zb_osif_abort () at ncs_v2.4.1/nrf/subsys/zigbee/osif/zb_nrf_platform.c:578
#1  0x000a6624 in zb_error_raise ()
#2  0x00096524 in zb_nvram_load ()
#3  0x00099602 in zb_zdo_dev_init ()
#4  0x00099c12 in zb_zdo_start_no_autostart ()
#5  0x0006bd42 in zboss_thread at ncs_v2.4.1/nrf/subsys/zigbee/osif/zb_nrf_platform.c:347


It seems that there was a problem reading the data from the zboss_nvram partition.

To reproduce this, turn off the DK during the joining process, exactly when the device receives its credentials, reporting configuration, etc.
The goal is to interrupt the device by powering it off while it processes data that must be stored in the zboss_nvram flash partition.
It seems that ZBOSS either has no recovery mechanism, or the one it has is not sufficient to deal with invalid data in the zboss_nvram partition.

On my end, I have implemented a ZBOSS safe startup mechanism which simply checks whether ZBOSS initialized successfully; if it did not, zboss_nvram is erased to recover the device. Do you have any ideas on how I can protect the device from such a scenario, i.e. power being cut or the device rebooting while critical data is being processed?
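
For illustration, the safe startup idea described above could be sketched roughly as follows. This is a host-runnable sketch under stated assumptions, not the actual implementation: `MAX_FAILED_BOOTS`, `boot_attempts`, `erase_zboss_nvram()`, `safe_startup_before_stack()` and `safe_startup_on_stack_ok()` are all hypothetical names. On a real device the counter would have to survive resets (for example in retained RAM or a flash record outside zboss_nvram), and the erase hook would have to erase the zboss_nvram partition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: failed boots tolerated before erasing NVRAM. */
#define MAX_FAILED_BOOTS 3u

/* Stand-in for a counter that survives resets. It is a plain variable here
 * so that the sketch can run on a host; on target it must be persistent. */
static uint32_t boot_attempts;

/* Records that the (simulated) erase was performed. */
static bool nvram_erased;

/* Hypothetical hook: on target this would erase the zboss_nvram partition. */
static void erase_zboss_nvram(void)
{
    nvram_erased = true;
}

/* Call early on every boot, before the stack is started.
 * Returns true if NVRAM was erased on this boot. */
bool safe_startup_before_stack(void)
{
    bool erased = false;

    /* Too many consecutive boots without a confirmed stack start:
     * assume the stored data is corrupted and wipe it. */
    if (boot_attempts >= MAX_FAILED_BOOTS) {
        erase_zboss_nvram();
        boot_attempts = 0;
        erased = true;
    }

    /* Count this attempt; it is cleared only when the stack starts. */
    boot_attempts++;
    return erased;
}

/* Call once the stack reports that it initialized successfully. */
void safe_startup_on_stack_ok(void)
{
    boot_attempts = 0;
}
```

The counter is incremented before the stack starts and cleared only after a confirmed start, so a reboot loop caused by a fatal error during NVRAM load shows up as a growing count that eventually triggers the erase.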

If it matters, my device uses the POF (power-fail comparator) feature.

Thanks in advance,

Pawel

  • Hi, 

    It could be that the NVRAM gets corrupted because you turn off the device exactly when it is writing important data to it.

    At the same time, I think it is expected that the device might fail if you turn it off while it is joining a network. In such a case, it should be reasonable to expect the end user to factory reset the device to recover it. However, I agree that this should be handled by the stack by default, and not be something our customers have to implement a fix for manually. I will create an internal Jira for this.

    Regards,
    Amanda H.
