Zigbee Light Switch crashes: zb_nwk_ed_timeout_req_frame_confirm

I'm struggling with the nrf sdk v2.9.2 zigbee light_switch example. I copy the example, add this to proj.conf

CONFIG_ZIGBEE_CHANNEL=25
CONFIG_ZIGBEE_CHANNEL_SELECTION_MODE_SINGLE=y

Compile it up and install it on the nrf52840dk. Connect it to my Zigbee network which is hosted by my Home Assistant, ZHA and using a ZBT-2.

It works, but at some point - between 1min, and 15min - it'll panic.  I've tried many debugging and logging options and as best I can tell it's related to this backtrace: zb_nwk_ed_timeout_req_frame_confirm → zb_schedule_alarm → zb_osif_abort. No amount of logging get me more actionable info and searching and AI queries haven't helped.

If I hook sys_arch_reboot there appears to be stack space left and zb_buf_get_out can still return buffers when it crashes.

Parents Reply Children
  • Hi,

    Are you saying that you only see this issue occurring when you have enabled these configs?

    Getting more logs from the device crashing would be a good idea. Please increase the log level and attach the logs to the case. Maybe also a sniffer trace of the communication, but lets hold off on that for now. Is there anything in particular happening when you see this?

    If you could also try lowering the memory and see if that makes the issue happen sooner. Some info on how to do this is mentioned here in KRKNWK-12115 (I see you'll unfortunately have to first select your SDK version in the drop-down menu there). 

    Regards,

    Elfving

  • To clarify, I've never tried it without that config - as by default it tries to connect to channel 15 and my hub is on 25.  I've no reason to suspect thats related.

    I'll try to get better logs tonight, but TBH I spent ages over the weekend without success.  This is the best I managed, which was with this config (the only way to get maximal logging and not overflow it):

    # stop when it all goes wrong
    CONFIG_RESET_ON_FATAL_ERROR=n
    CONFIG_THREAD_STACK_INFO=y
    CONFIG_INIT_STACKS=y
    CONFIG_THREAD_NAME=y
    CONFIG_ZBOSS_ERROR_PRINT_TO_LOG=y
    CONFIG_THREAD_MONITOR=y
    CONFIG_STACK_CANARIES=y
    CONFIG_STACK_SENTINEL=y
    CONFIG_DEBUG_INFO=y
    CONFIG_THREAD_RUNTIME_STATS=y
    CONFIG_ASSERT_VERBOSE=y

    # RTT Console
    CONFIG_USE_SEGGER_RTT=y
    CONFIG_SEGGER_RTT_BUFFER_SIZE_UP=8192
    CONFIG_RTT_CONSOLE=y
    CONFIG_LOG_BUFFER_SIZE=8192
    CONFIG_LOG_BACKEND_RTT=y
    CONFIG_LOG_BACKEND_RTT_MODE_BLOCK=n

    # CRITICAL: Disable minimal logging mode
    CONFIG_LOG=y
    CONFIG_LOG_MODE_DEFERRED=y
    CONFIG_LOG_MODE_MINIMAL=n
    CONFIG_LOG_DEFAULT_LEVEL=3

    # Disable UART console
    CONFIG_UART_CONSOLE=y

    # ZBOSS Traces
    CONFIG_ZIGBEE_ENABLE_TRACES=y
    CONFIG_ZBOSS_TRACE_BINARY_LOGGING=n
    CONFIG_ZBOSS_TRACE_MASK=0xFFFFFFFF
    CONFIG_ZBOSS_TRACE_LOG_LEVEL_DBG=y

    CONFIG_KERNEL_LOG_LEVEL_INF=y # shut up z_impl_k_mutex_lock
    CONFIG_ZBOSS_OSIF_LOG_LEVEL_INF=y # shut up zboss_osif
    CONFIG_MPU_LOG_LEVEL_INF=y # shut up mpu

    The logs look like this and try as I might to get more info, nothing seems to work

    [00:08:12.572,570] <dbg> mpu: region_init: [2] 0x2000bf80 0x150b000a
    [00:08:12.572,631] <dbg> mpu: mpu_configure_region: Configure MPU region at index 0x2
    [00:08:12.572,662] <dbg> mpu: region_allocate_and_init: Program MPU region at index 0x2
    [00:08:12.572,662] <dbg> mpu: region_init: [2] 0x2000ca80 0x150b000a
    [00:08:12.577,545] <dbg> mpu: mpu_configure_region: Configure MPU region at index 0x2
    [00:08:12.577,545] <dbg> mpu: region_allocate_and_init: Program MPU region at index 0x2
    [00:08:12.577,575] <dbg> mpu: region_init: [2] 0x20003080 0x150b000a
    [00:08:12.577,606] <dbg> ieee802154_nrf5: nrf5_rx_thread: Frame received
    [00:08:12.577,667] <dbg> ieee802154_nrf5: nrf5_rx_thread: Caught a packet (56) (LQI: 164)
    [00:08:12.577,728] <dbg> ieee802154_nrf5: nrf5_rx_thread: Waiting for frame
    [00:08:12.577,758] <dbg> mpu: mpu_configure_region: Configure MPU region at index 0x2
    [00:08:12.577,758] <dbg> mpu: region_allocate_and_init: Program MPU region at index 0x2
    [00:08:12.577,789] <dbg> mpu: region_init: [2] 0x2000bf80 0x150b000a
    [00:08:12.577,819] <dbg> mpu: mpu_configure_region: Configure MPU region at index 0x2
    [00:08:12.577,850] <dbg> mpu: region_allocate_and_init: Program MPU region at index 0x2
    [00:08:12.577,880] <dbg> mpu: region_init: [2] 0x2000ca80 0x150b000a

    [00:08:12.584,503] <err> zboss_osif: ZBOSS fatal error occurred

    [00:08:12.846,954] <dbg> mpu: mpu_configure_region: Configure MPU region at index 0x2
    [00:08:12.846,984] <dbg> mpu: region_allocate_and_init: Program MPU region at index 0x2
    [00:08:12.847,015] <dbg> mpu: region_init: [2] 0x20003080 0x150b000a
    [00:08:12.847,015] <dbg> ieee802154_nrf5: nrf5_rx_thread: Frame received
    [00:08:12.847,076] <dbg> ieee802154_nrf5: nrf5_rx_thread: Caught a packet (50) (LQI: 0)
    [00:08:12.847,137] <dbg> ieee802154_nrf5: nrf5_rx_thread: Waiting for frame
    [00:08:12.847,167] <dbg> mpu: mpu_configure_region: Configure MPU region at index 0x2
    [00:08:12.847,198] <dbg> mpu: region_allocate_and_init: Program MPU region at index 0x2
    [00:08:12.847,198] <dbg> mpu: region_init: [2] 0x2000bf80 0x150b000a
    [00:08:12.847,259] <dbg> mpu: mpu_configure_region: Configure MPU region at index 0x2
    [00:08:12.847,290] <dbg> mpu: region_allocate_and_init: Program MPU region at index 0x2
    [00:08:12.847,290] <dbg> mpu: region_init: [2] 0x2000ca80 0x150b000a
    [00:08:14.094,970] <dbg> ram_pwrdn: power_up_ram: Powering up sections 0-5 of bank 8

    *** SYSTEM REBOOT ***
    Reboot type: 1
    Got test buffer: 7 (pool not empty)

  • I see. I have some idea as to what might happen here. The alarm queue fills up.

    You are not using the R23 add-on SDK right? My first recommendation is to use that. Though if you can not, then my second recommendation is to try increasing ZB_SCHEDULER_Q_SIZE to 32 or even 48.

    Regards,

    Elfving

  • I'm using the v2.9.2 SDK with whatever Zigbee is included (R23 I'm guessing).  For my education, what in the logs lets you know the scheduler Q size is too small?

  • Coops said:
    I'm using the v2.9.2 SDK with whatever Zigbee is included (R23 I'm guessing).

    The R23 add-on SDK is the only way to get R23 Zigbee. 

    Coops said:
    For my education, what in the logs lets you know the scheduler Q size is too small?

    It just looks similar to another case where that was likely to be the issue. I wouldn't say it is obvious.

    Regards,

    Elfving

Related