Zboss "crash" in zb_schedule_alarm()

Hello,

I'm using the ZBOSS stack to implement a Zigbee end device on the nRF5340.

My device can join a Zigbee network and report sensor events over Zigbee, but after a random amount of time the stack crashes.

I'm stuck on how to debug this issue.

It seems that the stack stops because of an error while executing zb_schedule_alarm(), but I have no other clues.

How do you debug this kind of issue?

Environment:

* nRF5340

* NRF_SDK_VERSION=v1.0.1
  NRF_SDK_NAME=ncs-zigbee-r22
  NRF_MAIN_SDK_VERSION=v2.9.0

Regards,

Gaël

  • Hi Gaël

    The team has decoded the traces you provided, and they show the exact line in zb_schedule_alarm where the assert occurs. The cause is that the alarm queue has filled up, reaching ZB_SCHEDULER_Q_SIZE.

    One possible reason is that the provided code calls zb_buf_get_out_delayed_ext from interrupt context. That function calls zb_schedule_alarm internally, so race conditions might occur.

    A safer approach would be to use ZB_SCHEDULE_APP_CALLBACK so that the actions in test_send_event_handler run in the ZBOSS context.

    Regards,
    Amanda H.

  • Hello Amanda,

    Thanks for your feedback.

    1. The trace I sent you was from my firmware (not the sample app), and in my firmware I already use ZB_SCHEDULE_APP_CALLBACK() to send an event over Zigbee.

    2. I made the change in the sample app, but the issue still happens. I will get back to you with a trace log from the sample.

    static void send_step_cmd_cb(zb_bufid_t cmd_id)
    {
    	zb_ret_t zb_err_code;
    
    	/* Allocate output buffer and send step command. */
    	zb_err_code = zb_buf_get_out_delayed_ext(light_switch_send_on_off,
    							cmd_id,
    							0);
    	if (!zb_err_code) {
    		LOG_WRN("Buffer is full");
    	}
    }
    
    static void test_send_event_handler(struct k_timer *timer)
    {
    	zb_uint16_t cmd_id;
    
    	/* Toggle button state. */
    	if (buttons_ctx.state == BUTTON_ON) {
    		buttons_ctx.state = BUTTON_OFF;
    		cmd_id = ZB_ZCL_CMD_ON_OFF_OFF_ID;
    	} else {
    		buttons_ctx.state = BUTTON_ON;
    		cmd_id = ZB_ZCL_CMD_ON_OFF_ON_ID;
    	}
    
    	zb_ret_t ret = ZB_SCHEDULE_APP_CALLBACK(send_step_cmd_cb, cmd_id);
    	if (ret != 0) {
    		LOG_WRN("Err scheduling send_step_cmd_cb");
    	}
    }

    3. In my test the send period is 5 s, and I get the "Buffer is full" warning about 200 times in about 20 minutes.

    4. I noticed that zb_buf_get_out_delayed_ext calls zb_schedule_alarm, as ZB_SCHEDULE_APP_CALLBACK does. Looking at the example (if I read it correctly), the original call to zb_buf_get_out_delayed_ext() is made in the system workqueue context, so it looks like the same situation as calling it from a timer (i.e. not the ZBOSS thread).

    5. I understand that an alarm queue is full when the issue happens. Can you share when the alarm queue is emptied, and what might prevent it from being emptied?

  • Hello,

    In the meantime, have you been able to decode the address of the function that floods the queue?

  • Hi,

    • The latest traces you shared (provided on Jan 9, 2026) do not reproduce the issue; they don't show the crash.
    • zboss_trace_20260108_155224.bin from Jan 8, 2026 shows the crash, with the scheduler queue full of entries that mostly point to function address 0003f661.

    • The ELF from Jan 9, 2026 shows that the culprit function is zb_nwk_ed_send_timeout_req, but we would need to double-check this against a matching set of build files and traces that show the crash. Function pointers have bit 0 set to 1 to indicate ARM Thumb mode, so 0003f661 corresponds to the symbol at 0003f660:
      arm-none-eabi-nm build_dk/light_switch/zephyr/zephyr.elf | grep "0003f660"
      0003f660 T zb_nwk_ed_send_timeout_req
      Detailed analysis of the traces backs up this theory.

    • For now, there are no plans to provide R22 add-on updates with new libraries. Our recommendation is to move to the R23 add-on and use zboss_use_r22_behavior() if needed. May I know why you are using the R22 add-on instead of R23?

    • A workaround on your side could be to increase the value of ZB_SCHEDULER_Q_SIZE from 24 to 32, or even 48.

    -Amanda H.
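
The address match described in the reply above can be checked with a small sketch. It assumes a 32-bit Arm Cortex-M target, where function pointers carry the Thumb state in bit 0. The helper name thumb_ptr_to_symbol_addr is hypothetical and is not part of ZBOSS or the nRF Connect SDK.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a ZBOSS API): clear bit 0, the Thumb state bit,
 * of a function pointer value taken from a trace, so that the result can be
 * compared with the symbol addresses printed by arm-none-eabi-nm.
 */
static uint32_t thumb_ptr_to_symbol_addr(uint32_t logged_ptr)
{
	return logged_ptr & ~UINT32_C(1);
}
```

For the value seen in the trace, thumb_ptr_to_symbol_addr(0x0003f661) yields 0x0003f660, which is the address nm reports for zb_nwk_ed_send_timeout_req.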

  • Hello Amanda,

    Thanks for the explanations.

    * I will re-send traces later. I'm surprised you don't see the issue. I'm quite sure the device crashed (there was a log message) when I recorded the trace. Maybe there is another issue.

    * I don't have access to the DSR JIRA ticket; an account is required. Can you share its content?

    * I chose R22 because R23 didn't support the nRF5340 when I moved to the latest version of the stack. I just discovered that the nRF5340 is now supported, so I might move to R23.

  • Sorry for the mistake. The JIRA is for internal use. Please move to the R23 version.

  • Hello Amanda,

    I tested the example app with R23 for 2 h without a crash, so I will not re-send traces for R22.

    I have started moving my app to R23, but I have build issues. I will get back to you once I'm able to re-test my app.

    Gaël
