"zboss_osif: ZBOSS fatal error occurred" when migrating SDK from 2.3.0 to 2.7.0

tl;dr

ZBOSS has to be upgraded one release at a time. To migrate the SDK from 2.3.0 to 2.7.0 while preserving the persisted ZigBee network info, first migrate the code to 2.6.x and update the firmware with nrfjprog --sectorerase or an OTA. Then migrate the code from 2.6.x to 2.7.0 and update the firmware again. If you go from 2.3.0 to 2.7.0 directly, ZBOSS fails to migrate the persisted ZigBee network info.

Problem

I tried to migrate a project from nRF Connect SDK 2.3.0 to 2.7.0.

The migrated code runs smoothly until zigbee_enable() gets called. After the call, a fatal error from ZBOSS occurs, and the chip resets.

[00:00:01.124,511] <err> zboss_osif: ZBOSS fatal error occurred

Below is the call stack leading to zb_osif_abort(), which prints the fatal message. The fault seems to occur while ZBOSS migrates the datasets stored in NVRAM, specifically while writing the ZCL reporting dataset (frame #2).

#0  zb_osif_abort () at /opt/nordic/ncs/v2.7.0/nrf/subsys/zigbee/osif/zb_nrf_platform.c:585
#1  0x00046d0a in zb_nvram_write_data ()
#2  0x00063ef4 in zb_nvram_write_zcl_reporting_dataset (page=<optimized out>, pos=148) at /opt/nordic/ncs/v2.7.0/nrfxlib/zboss/production/src/zcl/zcl_nvram.c:509
#3  0x00047642 in write_dataset_body_by_type ()
#4  0x00047902 in migrate_curr_page_to_new_version ()
#5  0x00047d26 in zb_nvram_load ()
#6  0x0004a6ea in zb_zdo_dev_init ()
#7  0x0004ae0a in zb_zdo_start_no_autostart ()
#8  0x0001e20e in zboss_thread (arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>) at /opt/nordic/ncs/v2.7.0/nrf/subsys/zigbee/osif/zb_nrf_platform.c:349
#9  0x0005f17a in z_thread_entry (entry=0x1e209 <zboss_thread>, p1=<optimized out>, p2=<optimized out>, p3=<optimized out>) at /opt/nordic/ncs/v2.7.0/zephyr/lib/os/thread_entry.c:48
#10 0x0005f17a in z_thread_entry (entry=0x1e209 <zboss_thread>, p1=<optimized out>, p2=<optimized out>, p3=<optimized out>) at /opt/nordic/ncs/v2.7.0/zephyr/lib/os/thread_entry.c:48
#11 0x00000000 in ?? ()

Solution

So I made a branch that migrates the code from 2.3.0 to 2.6.1, which ships the next ZBOSS version after the one in 2.3.0. With this build, the device booted successfully, meaning the ZBOSS migration succeeded. The steps to reproduce are:

  1. Full-erase the nRF52840 and flash the firmware built with SDK 2.3.0
  2. Let it join an existing ZigBee PAN and check that communication works. The network info is now persisted in flash.
  3. Sector-erase the nRF52840 and flash the firmware built with SDK 2.6.1
  4. It boots up and starts working without the fatal error

Then I did the same from 2.6.1 to 2.7.0, and it also succeeded. It turns out that the SDK has to be migrated without skipping a ZBOSS version. I hope this helps anyone who runs into the same problem.
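
As a rough sketch, the two flashing stages can be scripted. The Python snippet below only builds and prints the nrfjprog command lines, it does not run them. The function name and the hex file paths are hypothetical, and --verify and --reset are my additions on top of the --sectorerase mentioned above. The device has to boot once after each stage so that ZBOSS can migrate the NVRAM before the next image is written.

```python
import shlex


def staged_flash_commands(stages, family="NRF52"):
    """Build one nrfjprog command per upgrade stage, in the given order.

    `stages` is a list of (sdk_version, hex_path) tuples. The hex paths are
    hypothetical; point them at your own build output.
    """
    commands = []
    for _version, hex_path in stages:
        commands.append([
            "nrfjprog",
            "--program", hex_path,
            "--sectorerase",  # erase only the sectors the image touches, so the NVRAM pages survive
            "--verify",
            "--reset",        # reboot so ZBOSS can migrate the NVRAM
            "-f", family,
        ])
    return commands


# Never jump from 2.3.0 straight to 2.7.0: go through 2.6.1 first.
stages = [
    ("2.6.1", "build_2.6.1/zephyr/zephyr.hex"),  # hypothetical path
    ("2.7.0", "build_2.7.0/zephyr/zephyr.hex"),  # hypothetical path
]

for cmd in staged_flash_commands(stages):
    print(shlex.join(cmd))
```

Run the printed commands one at a time and confirm that the device boots without the fatal error before moving on to the next one.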
