[Zigbee] ZBOSS Fatal Error at init

Pawel(embeddedsolutions.pl) over 1 year ago

Setup:

nRF52840 DK (Zigbee Coordinator, based on the Zigbee Coordinator sample)

nRF52840 DK (Zigbee Router)

NCS v2.4.1

Hi,

I have encountered a serious problem during development.

The device constantly reboots after calling zigbee_enable(). I was able to capture the backtrace when that happens.

Backtrace:
#0  zb_osif_abort () at ncs_v2.4.1/nrf/subsys/zigbee/osif/zb_nrf_platform.c:578
#1  0x000a6624 in zb_error_raise ()
#2  0x00096524 in zb_nvram_load ()
#3  0x00099602 in zb_zdo_dev_init ()
#4  0x00099c12 in zb_zdo_start_no_autostart ()
#5  0x0006bd42 in zboss_thread at ncs_v2.4.1/nrf/subsys/zigbee/osif/zb_nrf_platform.c:347

It seems that there was a problem reading data from the zboss_nvram partition.

To reproduce it, turn off the DK during the joining process, exactly when the device receives its credentials, reporting configuration, etc.
The goal is to interrupt the device by powering it off while it is processing data that must be stored in the zboss_nvram flash partition.
It seems that ZBOSS has no recovery mechanism, or that the implemented one is not sufficient to deal with invalid data in the zboss_nvram partition.

On my end, I have implemented a ZBOSS safe-startup mechanism that simply checks whether ZBOSS initializes successfully; otherwise, zboss_nvram is erased, which recovers the device. Do you have any ideas how I can protect the device from such scenarios, such as the power being cut or the device rebooting while it is processing critical data?

If it matters, my device uses the POF (power-fail comparator) feature.
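
A minimal sketch of the kind of safe-startup check described above, assuming a boot-attempt counter kept in RAM that survives a warm reset (on Zephyr, for example, a __noinit variable). Because zb_osif_abort() never hands a return code back to the application, the sketch counts starts that never reached "stack up" instead of inspecting an error. The names safe_start_begin(), safe_start_success(), SAFE_START_MAX_ATTEMPTS and SAFE_START_MAGIC are hypothetical; they are not part of NCS or ZBOSS, and this is not necessarily how the mechanism in this thread was implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: consecutive starts that fail to reach "stack up"
 * before the zboss_nvram partition is considered corrupted. */
#define SAFE_START_MAX_ATTEMPTS 3u
/* Hypothetical magic value marking the retained record as valid, since RAM
 * contents are undefined after a cold power-up. */
#define SAFE_START_MAGIC 0x5AFE57A7u

/* State that must survive a warm reset. On Zephyr it would be placed in a
 * __noinit section (assumption); here it is a plain struct. */
struct safe_start_state {
	uint32_t magic;
	uint32_t attempts;
};

/* Call once at boot, before zigbee_enable(). Returns true if the caller
 * should erase zboss_nvram before starting the stack. */
bool safe_start_begin(struct safe_start_state *s)
{
	if (s->magic != SAFE_START_MAGIC) {
		/* Cold boot or garbage in RAM: start counting from zero. */
		s->magic = SAFE_START_MAGIC;
		s->attempts = 0u;
	}

	s->attempts++;

	if (s->attempts > SAFE_START_MAX_ATTEMPTS) {
		/* Too many starts without success: assume NVRAM is bad.
		 * The start after the erase counts as attempt 1. */
		s->attempts = 1u;
		return true;
	}

	return false;
}

/* Call once the stack reports that it has started, i.e. the NVRAM load
 * completed without a fatal error. Clears the failure counter. */
void safe_start_success(struct safe_start_state *s)
{
	s->attempts = 0u;
}
```

In an application, safe_start_begin() would run before zigbee_enable(); if it returns true, zboss_nvram is erased first (for example with Zephyr's flash_area_open()/flash_area_erase() on that partition, or possibly zigbee_erase_persistent_storage(ZB_TRUE), assuming it takes effect before zb_nvram_load()). safe_start_success() would be called from the signal handler once the stack reports it has started, for example on ZB_ZDO_SIGNAL_SKIP_STARTUP (an assumption about which signal marks a successful NVRAM load). The sketch relies on the fatal error ending in a warm reset, as the reboot loop described above suggests; if RAM is not retained across that reset, the counter would have to live in flash or a retained register instead.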

Thanks in advance,

Pawel

Amanda Hsieh over 1 year ago

Hi,

It could be that NVRAM gets corrupted because you turn off the device exactly when it is writing important data to NVRAM.

At the same time, I think it is expected that the device might fail if it is turned off while joining a network. In such a case, I think it is reasonable to expect the end user to factory reset the device to recover it. But I agree that this should be implemented in the stack by default, not something our customers should have to fix manually. I will create an internal Jira ticket for this.

Regards,
Amanda H.

Pawel(embeddedsolutions.pl) over 1 year ago in reply to Amanda Hsieh

Hi,

Thank you for the fast response.

End users may not be able to factory reset a device affected by this issue unless they have a J-Link programmer to erase the zboss_nvram partition manually.

The assert is raised inside the ZBOSS stack, which is a black box, so developers cannot even check a return code and recover the device in such a scenario.

I strongly agree that a recovery mechanism should be implemented in the stack. Thank you for creating a Jira ticket for this :)

Regards,

Pawel

Amanda Hsieh over 1 year ago in reply to Pawel(embeddedsolutions.pl)

It's fixed in NCS v2.7.0, which has now been released.