Frozen zigbee network coordinator sample on nRF5340

Working on a zephyr/zigbee/802.15.4 system and my code (loosely based on the Nordic samples) on a custom device was not behaving as expected. 

So I broke out an nRF5340 DK, built and flashed the standard nrf/samples/zigbee/network_coordinator... Hmmm, nothing good there either, LED flashes forever, nothing else.

After managing to get some logging enabled, and peppering the codebase with crumbs I finally came to the attached trace (I added the 10 second delay at the start of main, just to have time to get all the logs, last line is where the hang is).

Short version: after calling zigbee_enable() execution eventually winds its way down to a call to zb_trans_transmit(), which lives in zigbee/osif/zb_nrf_transceiver.c, and then it's stuck there for all time.

In that function, it's getting caught on whatever radio_api->tx() is and just hanging indefinitely, specifically here:

switch (wait_type) {
        case ZB_MAC_TX_WAIT_CSMACA: {
                state_cache.radio_state = RADIO_802154_STATE_TRANSMIT;
                enum ieee802154_tx_mode mode;
                LOG_DBG("%s ZB_MAC_TX_WAIT_CSMACA", __func__);
                if (radio_api->get_capabilities(radio_dev)
                    & IEEE802154_HW_CSMA) {
                        mode = IEEE802154_TX_MODE_CSMA_CA;
                } else {
                        mode = IEEE802154_TX_MODE_CCA;
                }

                LOG_DBG("%s got caps mode %i. Calling radio_api->tx()...", __func__, mode);
                err = radio_api->tx(radio_dev, mode, pkt, &frag);
                LOG_DBG("%s tx called, ret %i", __func__, err);
                break;
        }
    /* ... */
    }


I am at a loss as to how to move forward at this point. 
I am building this with a pretty standard call to

west build -p auto \
    -b nrf5340dk_nrf5340_cpuapp \
    nrf/samples/zigbee/network_coordinator
    

and flashing it as per the usual (directly through the DK, or using a j-link for my boards).

I think the zboss stuff is closed source, or at least distributed as some binary blob (ugh), so I'm not sure how to get further toward resolving this. 

Any ideas how to resolve or at least debug further?

# SEGGER J-Link RTT Viewer V7.58d Terminal Log File
# Compiled: 15:28:47 on Dec  2 2021
# Logging started @ 20 Dec 2021 05:37:54
00> [00:00:00.400,146] <inf> ieee802154_nrf5: nRF5 802154 radio initialized
00> [00:00:00.400,482] <dbg> zboss_osif.zb_trans_set_auto_ack: Function: zb_trans_set_auto_ack, enabled: 1
00> [00:00:00.434,020] <dbg> zboss_osif.
00>                                     de ad 0a 02 00 00 00 00  77 00 b3 00             |........ w...    
00> [00:00:00.434,844] <dbg> zboss_osif.
00>                                     de ad 26 02 00 00 01 00  77 00 b7 00 14 0f 00 00 |..&..... w.......
00>                                     c4 00 00 00 10 00 00 00  00 03 00 00 54 02 00 00 |........ ....T...
00>                                     98 01 00 00 64 02 00 00                          |....d...         
00> [00:00:00.436,828] <dbg> zboss_osif.
00>                                     de ad 0e 02 00 00 02 00  77 00 c0 00 02 00 00 00 |........ w.......
00> [00:00:00.437,713] <dbg> zboss_osif.
00>                                     de ad 0e 02 00 00 03 00  77 00 c2 00 c0 02 00 00 |........ w.......
00> [00:00:00.438,59[00:00:10.616,180] <inf> app: Starting ZBOSS Coordinator example
00> [00:00:10.616,699] <dbg> zboss_osif.
00>                                     de ad 0e 02 04 00 0c 00  90 0c 55 01 01 00 00 00 |........ ..U.....
00> [00:00:10.617,584] <inf> zboss_osif: zigbee_enable created zboss thread
00> [00:00:10.617,950] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 0, len: 12
00> [00:00:10.618,408] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 12, len: 4
00> [00:00:10.618,865] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 16, len: 8
00> [00:00:10.619,293] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 1, pos: 0, len: 12
00> [00:00:10.619,750] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 1, pos: 12, len: 4
00> [00:00:10.620,208] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 1, pos: 16, len: 8
00> [00:00:10.620,635] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 0, len: 12
00> [00:00:10.626,190] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 12, len: 4
00> [00:00:10.626,617] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 16, len: 8
00> [00:00:10.627,075] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 24, len: 12
00> [00:00:10.632,598] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 52, len: 8
00> [00:00:10.633,056] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 60, len: 12
00> [00:00:10.633,514] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 88, len: 8
00> [00:00:10.633,941] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 96, len: 12
00> [00:00:10.634,399] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 124, len: 8
00> [00:00:10.634,857] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 132, len: 12
00> [00:00:10.635,314] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 160, len: 8
00> [00:00:10.635,772] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 168, len: 12
00> [00:00:10.636,230] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 196, len: 8
00> [00:00:10.636,688] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 204, len: 12
00> [00:00:10.642,211] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 232, len: 8
00> [00:00:10.642,669] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 240, len: 12
00> [00:00:10.643,127] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 268, len: 8
00> [00:00:10.643,585] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 276, len: 12
00> [00:00:10.644,042] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 304, len: 8
00> [00:00:10.649,566] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 312, len: 12
00> [00:00:10.650,024] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 340, len: 8
00> [00:00:10.650,482] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 348, len: 12
00> [00:00:10.650,939] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 376, len: 8
00> [00:00:10.651,397] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 384, len: 12
00> [00:00:10.651,855] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 412, len: 8
00> [00:00:10.657,379] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 420, len: 12
00> [00:00:10.657,836] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 448, len: 8
00> [00:00:10.658,294] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 456, len: 12
00> [00:00:10.663,848] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 484, len: 8
00> [00:00:10.664,306] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 492, len: 12
00> [00:00:10.664,764] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 1, pos: 0, len: 12
00> [00:00:10.665,191] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 1, pos: 12, len: 4
00> [00:00:10.665,649] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 1, pos: 16, len: 8
00> [00:00:10.666,076] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 1, pos: 24, len: 12
00> [00:00:10.666,564] <dbg> zboss_osif.zb_osif_nvram_read: Function: zb_osif_nvram_read, page: 0, pos: 468, len: 16
00> [00:00:10.667,083] <dbg> zboss_osif.
00>                                     de ad 0a 02 04 00 0d 00  0b 00 8c 00             |........ ....    
00> [00:00:10.667,907] <dbg> zboss_osif.
00>                                     de ad 0a 02 04 00 0e 00  0b 00 20 02             |........ .. .    
00> [00:00:10.668,731] <dbg> zboss_osif.zb_osif_nvram_write: Function: zb_osif_nvram_write, page: 0, pos: 492, len: 12
00> [00:00:10.674,407] <dbg> zboss_osif.zb_osif_nvram_write: Function: zb_osif_nvram_write, page: 0, pos: 504, len: 16
00> [00:00:10.675,079] <dbg> zboss_osif.zb_osif_nvram_write: Function: zb_osif_nvram_write, page: 0, pos: 520, len: 8
00> [00:00:10.675,659] <inf> zboss_osif: z
00> [00:00:10.676,147] <inf> app: sighndl 23
00> [00:00:10.676,391] <inf> app: def
00> [00:00:10.676,605] <inf> zigbee_app_utils: Production configuration is not present or invalid (status: -1)
00> [00:00:10.677,001] <inf> app: NO JOIN
00> [00:00:10.677,703] <inf> app: sighndl 1
00> [00:00:10.677,947] <inf> app: def
00> [00:00:10.678,161] <inf> zigbee_app_utils: Zigbee stack initialized
00> [00:00:10.678,466] <inf> app: NO JOIN
00> [00:00:10.678,955] <dbg> zboss_osif.
00>                                     de ad 0e 02 04 00 0f 00  2b 08 94 03 01 00 00 00 |........ +.......
00> [00:00:10.686,889] <dbg> zboss_osif.zb_trans_set_pan_id: Function: zb_trans_set_pan_id, PAN ID: 0xffff
00> [00:00:[00:00:10.688,415] <dbg> zboss_osif.zb_trans_set_long_addr: Function: zb_trans_set_long_addr, long addr: 0x77
00> 10.687,[00:00:10.690,124] <dbg> zboss_osif.
00>                                     de ad 1e 02 04 00 10 00  3e 00 53 01 00 00 00 00 |........ >.S.....
00>                                     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........ ........
00> [00:00:10.691,955] <dbg> zboss_osif.
00>                                     de ad 0a 02 05 00 11 00  3e 00 8e 01             |........ >...    
00> [00:00:10.694,244] <dbg> zboss_osif.
00>                                     de ad 12 02 05 00 12 00  3e 00 f3 01 05 00 00 00 |........ >.......
00>                                     00 00 00 00                                      |....             
00> [00:00:10.695,587] <inf> app: sighndl 5
00> [00:00:10.695,831] <inf> app: def
00> [00:00:10.696,044] <inf> zigbee_app_utils: Device started for the first time
00> [00:00:10.696,380] <inf> zigbee_app_utils: Start network formation (top lev commissionning)
00> [00:00:10.696,716] <inf> app: NO JOIN
00> [00:00:10.697,052] <dbg> zboss_o77] <inf> [00:00:10.701,202] <dbg> zboss_osif.zb_trans_set_short_addr: Function: zb_trans_set_short_addr, 0xfffe
00> app: ZBOSS Coordina[00:00:10.701,995] <dbg> zboss_osif.zb_trans_set_long_addr: Function: zb_trans_set_long_addr, long addr: 0x77
00> tor example started00:00:10.703,704] <dbg> zboss_osif.
00>                                     de ad 1e 02 05 00 14 00  3e 00 53 01 00 00 00 00 |........ >.S.....
00>                                     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........ ........
00> [00:00:10.705,505] <dbg> zboss_osif.
00>                                     de ad 0a 02 05 00 15 00  3e 00 8e 01             |........ >...    
00> [00:00:10.707,427] <dbg> zboss_osif.
00>                                     de ad 0a 02 05 00 16 00  35 00 c1 00             |........ 5...    
00> [00:00:10.708,435] <dbg> zboss_osif.zb_trans_set_pan_id: Function: zb_trans_set_pan_id, PAN ID: 0xffff
00> 0m
00> [00:00:10.709,106] <dbg> zboss_osif.zb_trans_set_tx_power: Function: zb_trans_set_tx_power, power: 20
00> [00:00:10.709,716] <dbg> zboss_osif.zb_trans_get_tx_power: Function: zb_trans_get_tx_power, power: 20
00> [00:00:10.710,113] <dbg> zboss_osif.zb_trans_set_channel: Function: zb_trans_set_channel, channel number: 16
00> [00:00:10.710,754] <dbg> zboss_osif.zb_trans_enter_receive: Function: zb_trans_enter_receive
00> [00:00:10.711,486] <dbg> zboss_osif.zb_trans_transmit: Function: zb_trans_transmit, channel: 16
00> [00:00:10.711,914] <dbg> zboss_osif.zb_trans_transmit: zb_trans_transmit ZB_MAC_TX_WAIT_CSMACA
00> [00:00:10.712,280] <dbg> zboss_osif.zb_trans_transmit: zb_trans_transmit got caps mode 2. Calling radio_api->tx()...

  • Hm, the more I look into it, the more it looks like this leads somewhere into
     nrfxlib/nrf_802154/driver/src/nrf_802154.c

    but I can't help but hope there's something more obvious I'm missing... let me know if any clues present themselves or if any other info would be helpful.

  • Hello,

    Our zigbee expert is out of office today, but can look into your question tomorrow.

    Best regards,
    Kenneth

  • Hi,

    Sorry for the delay.

    From your other case (281576) I see that you are using v1.8.0-rc1 of nRF Connect SDK. This is a release candidate of v1.8.0, so I recommend updating to v1.8.0 since that is out now. 

    What is the version of your DK, and is it the nRF5340 PDK or nRF5340 DK?

    After managing to get some logging enabled, and peppering the codebase with crumbs I finally came to the attached trace (I added the 10 second delay at the start of main, just to have time to get all the logs, last line is where the hang is).

    Does it only hang with the changes you made, or does it also hang with an unedited version of the sample? What changes did you make, beside enabling logging? I tested right now with a clean version of nRF Connect SDK v1.8.0 and nRF5340 DK, and I could see no issues with the network coordinator sample. 

    Your other case seem to be very similar to this one, but from what I could see the issue there is when using your custom board, while in this case you are seeing the issue with a nRF5340 DK as well. Am I understanding this correct?

    Best regards,

    Marte

  • I've updated to the latest sdk-nrf stuff, and flashed the unmodified network_coordinator.  It does nothing at all (no LED activity).

    The behaviour is actually quite different. My code, on my system, gets a stack overflow when the switch tries to join and reboots, but otherwise looks pretty normal.

    With the sample coordinator on the dev kit, I get nothing.  When debugging, or attempting to, I set a breakpoint on main and wind up in an unending loop of

    (gdb) c
    Continuing.
    
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0xeffffffe in ?? ()
    (gdb) c
    Continuing.
    
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0xeffffffe in ?? ()
    (gdb) c
    Continuing.
    
    

    I can flash and break on main/button_pressed when I use the zephyr/samples/basic/button/ sample, but it's really unhappy with the network_coordinator.

    Just checked and it is in fact the nRF5340-PDK... early adopter ;-) Does that make a difference?
    On my prototypes, I'm currently using Fanstel BT40 modules.

  • Hi,

    psychogenic said:
    it is in fact the nRF5340-PDK

    The nRF5340 PDK is only supported up until nRF Connect SDK v1.4.2, and you will not be able to get Zigbee to work on the PDK in later versions unfortunately.

    Best regards,

    Marte

Related