mpsl assert file "107", line 292

Hi guys,

For a client of mine I'm working on an OpenThread RCP implementation using a SolidRun N8 that contains the nRF52833.

I'm using ncs v2.9.0-nRF54H20-1.

The problem is that after only ~2 minutes of 5 ping packets per second, I get the following assert (seen here in gdb):

Breakpoint 1, m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.0-nRF54H20-1/nrf/subsys/mpsl/init/mpsl_init.c:304

The code is pretty basic and based on an older sample. It initializes a single instance and runs an endless while loop:

    while (!otSysPseudoResetWasRequested())
    {
        otTaskletsProcess(instance);
        otSysProcessDrivers(instance);
    }

Here's the full project .config file.

6886.config.txt

Any hint on what I could change to make this work?

  • Hi,

    that contains the NRF52833.
    I'm using ncs v2.9.0-nRF54H20-1.

    The release you are using is for nRF54H20 only. It's not a qualified release for nRF52833.

    Please use e.g. NCS v2.9.2 instead, and see if that solves your issue.

    Hi Sigurd,

    Thanks a lot for the reply and for pointing out the version problem. That was the latest release when I started working with it.

    I have switched to v2.9.2 and I get the same behavior.

    (gdb) bt
    #0  m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.2/nrf/subsys/mpsl/init/mpsl_init.c:304
    #1  0x000035e0 in sym_S2UAPMFVIQXDUOA6CV7GJMB33TYHEUH5D6LHO5Q ()

    8358.config.txt

    From what I understand it happens during RTC0 interrupt handling. Is there something in my clock configuration that is wrong?

    Hi Sigurd,

    Sadly, I don't have a DK.

    I'm using the ncs nrf/samples/openthread/coprocessor sample as a starting point, replacing main.c and updating the overlay and prj.conf. The overlay change is needed because the SolidRun has the Fujitsu FWM7BLZ22, which contains the nRF52833, connected without RTS/CTS, so I have no flow control.

    I'm building with:

    west build -b nrf52833dk/nrf52833 

    I'm attaching the files:

    /*
     * Copyright (c) 2023 Nordic Semiconductor ASA
     *
     * SPDX-License-Identifier: LicenseRef-Nordic-5-Clause
     */
    #include <zephyr/kernel.h>
    #include <zephyr/logging/log.h>
    #include <openthread-system.h>
    #include <openthread/ncp.h>
    #include <openthread/tasklet.h>
    #include "utils/uart.h"
    
    #if defined(CONFIG_RCP_SAMPLE_HCI)
    #include "rcp_hci.h"
    #endif
    
    LOG_MODULE_REGISTER(coprocessor_sample, CONFIG_OT_COPROCESSOR_LOG_LEVEL);
    
    #define WELCOME_TEXT                                                           \
    	"\n\r"                                                                 \
    	"\n\r"                                                                 \
    	"=========================================================\n\r"        \
    	"OpenThread Coprocessor application is now running on NCS.\n\r"        \
    	"=========================================================\n\r"
    
    // void otPlatUartReceived(const uint8_t *aBuf, uint16_t aBufLength) { otNcpHdlcReceive(aBuf, aBufLength); }
    
    // void otPlatUartSendDone(void) { otNcpHdlcSendDone(); }
    
    static int NcpSend(const uint8_t *aBuf, uint16_t aBufLength)
    {
        otPlatUartSend(aBuf, aBufLength);
        return aBufLength;
    }
    
    void otAppNcpInit(otInstance *aInstance)
    {
        otPlatUartEnable();
    
        otNcpHdlcInit(aInstance, NcpSend);
    }
    
    int main(void)
    {
    	LOG_INF(WELCOME_TEXT);
    
    	otInstance *instance;
    
    pseudo_reset:
    	otSysInit(0, NULL);   // Initializes OpenThread platform
    
    	instance = otInstanceInitSingle();
    
    	otAppNcpInit(instance);
    
    	while (1)
    	{
    		otTaskletsProcess(instance);
    		otSysProcessDrivers(instance);
    	}
    
    	otInstanceFinalize(instance);
    
    #if defined(CONFIG_RCP_SAMPLE_HCI)
    	run_hci();
    #endif
    
    	goto pseudo_reset;
    
    	return 0;
    }
    

    6330.nrf52833dk_nrf52833.overlay

    08732.prj.conf

    I tried it without MPSL yesterday evening, meaning I added the following to my prj.conf:

    CONFIG_NRF_802154_SL_OPENSOURCE=y
    CONFIG_MPSL=n
    CONFIG_NET_PKT_TXTIME=n
    CONFIG_IEEE802154_CSL_ENDPOINT=n

    The result is still a crash, this time in an svc instruction, also after ~1 minute.

    (gdb) bt
    #0  arch_system_halt (reason=4) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:30
    #1  0x00011dd8 in k_sys_fatal_error_handler (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:44
    #2  0x0000ab8c in z_fatal_error (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:119
    #3  0x00001f5e in _oops () at /home/cristic/ncs/v2.9.2/zephyr/arch/arm/core/cortex_m/swap_helper.S:318
    (gdb) info r
    r0             0x4                 4
    r1             0x20004c48          536890440
    r2             0x1                 1
    r3             0x20                32
    r4             0x20000dc8          536874440
    r5             0x0                 0
    r6             0x0                 0
    r7             0x20002c90          536882320
    r8             0x0                 0
    r9             0x0                 0
    r10            0x2                 2
    r11            0x0                 0
    r12            0x6151              24913
    sp             0x20004c28          0x20004c28 <z_interrupt_stacks+552>
    lr             0x11dd9             73177
    pc             0x11dd0             0x11dd0 <arch_system_halt+14>
    xpsr           0x2100000b          553648139
    fpscr          0x0                 0
    msp            0x20004c28          0x20004c28 <z_interrupt_stacks+552>
    psp            0x20004530          0x20004530 <ot_stack_area+1840>
    primask        0x0                 0
    basepri        0x20                32
    faultmask      0x0                 0
    control        0x0                 0
    

    The purpose of this work is to let my client decide between using this nrf52833 solution and using a SiliconLabs MGM240, which is the router in this test setup.

  • (Updated)

    Please also try with:

    CONFIG_MAIN_STACK_SIZE=4096
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

    Regarding the crash in the svc instruction: it is caused by a call to k_panic(). Unfortunately, this is a macro, so you cannot set a breakpoint on it directly.
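
    As a cross-check against the register dump posted above, here is a minimal C sketch that decodes the two values that point at k_panic(): the exception number held in the low bits of xpsr, and the reason code passed to arch_system_halt(). The helper names are hypothetical, and the reason-code table is a hand-written copy of the first few values of Zephyr's k_fatal_error_reason enum, not the header itself.

```c
#include <stdint.h>

/* On ARMv7-M (Cortex-M4, as in the nRF52833) the IPSR field is
 * bits [8:0] of xPSR and holds the active exception number. */
static inline uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* Architecturally fixed exception numbers on ARMv7-M. */
static const char *exception_name(uint32_t num)
{
    switch (num) {
    case 0:  return "Thread mode";
    case 1:  return "Reset";
    case 2:  return "NMI";
    case 3:  return "HardFault";
    case 4:  return "MemManage";
    case 5:  return "BusFault";
    case 6:  return "UsageFault";
    case 11: return "SVCall";
    case 12: return "DebugMonitor";
    case 14: return "PendSV";
    case 15: return "SysTick";
    default: return num >= 16 ? "external IRQ" : "reserved";
    }
}

/* Local copy (assumption: matches Zephyr's enum k_fatal_error_reason
 * for the generic, non-arch-specific values 0..4). */
static const char *fatal_reason_name(unsigned int reason)
{
    switch (reason) {
    case 0:  return "K_ERR_CPU_EXCEPTION";
    case 1:  return "K_ERR_SPURIOUS_IRQ";
    case 2:  return "K_ERR_STACK_CHK_FAIL";
    case 3:  return "K_ERR_KERNEL_OOPS";
    case 4:  return "K_ERR_KERNEL_PANIC";
    default: return "other/arch-specific";
    }
}
```

    With the values from the dump, xpsr_exception_number(0x2100000b) gives 11, which exception_name() maps to SVCall, and fatal_reason_name(4) gives K_ERR_KERNEL_PANIC. Both are consistent with a k_panic() call rather than a CPU fault.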

    Please set breakpoints at the following functions:

    nrf_802154_assert_handler
    z_thread_abort

    Then run the reproduction scenario to see whether either breakpoint hits. If one does, please collect the gdb backtrace.

    --

    Hi, 

    Sorry for the late reply. Sigurd is out of the office, so I am taking over this case.

    We see a similar issue in another case, and R&D is trying to reproduce it to figure out what might cause this assertion. I will update you later when I have collected enough information. Please give us more time. Thanks for your patience.

    Regards,
    Amanda H.

  • Hi Amanda,

    Due to unknown issues, I can no longer reproduce the svc exception in the open-source (non-MPSL) case. When I start the ping from the router, the nRF child detaches with logs like this:

    ot-daemon-ncs[980]: 00:37:45.420 [I] Mac-----------: Frame tx attempt 16/16 failed, error:NoAck, len:71, seqnum:227, type:Data, src:a611f14676479c9b, dst:6a4d5db8ee09725a, sec:no, ackreq:yes
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ============================[TX ERR len=016]============================
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: | 61 DC E3 22 22 5A 72 09 | EE B8 5D 4D 6A 9B 9C 47 | a..""Zr...]Mj..G |
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ------------------------------------------------------------------------
    ot-daemon-ncs[980]: 00:37:45.420 [D] SubMac--------: RadioState: Transmit -> Receive
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ==============================[TX len=071]==============================
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: | 61 DC E3 22 22 5A 72 09 | EE B8 5D 4D 6A 9B 9C 47 | a..""Zr...]Mj..G |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | 76 46 F1 11 A6 7F 33 F0 | 4D 4C 4D 4C E7 3B 00 15 | vF....3.MLML.;.. |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | CA 66 00 00 00 00 00 01 | 02 8A DE 42 13 98 B3 A6 | .f.........B.... |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | 24 8B 0B D6 2B FE B9 BF | B7 11 42 88 DB 4F 50 B8 | $...+.....B..OP. |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | F1 7D D0 4C 42 F3 EF    |                         | .}.LB..          |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: ------------------------------------------------------------------------
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: Finishing operation "TransmitDataDirect"
    ot-daemon-ncs[980]: 00:37:45.421 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:87, chksum:e73b, ecn:no, to:6a4d5db8ee09725a, sec:no, error:NoAck, prio:net
    


    The router says it drops them as duplicates:

    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-: Received IPv6 UDP msg, len:87, chksum:16eb, ecn:no, from:a611f14676479c9b, sec:no, prio:net>
    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-:     src:[fe80:0:0:0:a411:f146:7647:9c9b]:19788
    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-:     dst:[fe80:0:0:0:684d:5db8:ee09:725a]:19788
    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [W] Mle-----------: Failed to process UDP: Duplicated
    


    Can we please focus on the MPSL case? We will need this to work with MPSL in the end. With MPSL and the stack sizes you mention I can run ping at 200msec for several minutes (~5 minutes in latest tests).
    Then I'm hitting the original problem with m_assert_handler being called for file 107, line 292.

  • cristic said:
    Can we please focus on the MPSL case? We will need this to work with MPSL in the end. With MPSL and the stack sizes you mention I can run ping at 200msec for several minutes (~5 minutes in latest tests).
    Then I'm hitting the original problem with m_assert_handler being called for file 107, line 292.

    Yes, could you provide a simple project to help us reproduce the issue on the nRF52833DK?

  • Hi Amanda,

    As I've explained before I don't have a DK available. My setup is in Israel and nobody delivers there nowadays. Did you try modifying the coprocessor example with the files I attached above and running it on a DK?

    The dataset I use is:

    > dataset active
    dataset active
    Active Timestamp: 1
    Channel: 15
    Channel Mask: 0x07fff800
    Ext PAN ID: c0de1ab5c0de1ab5
    Mesh Local Prefix: fdde:ad00:beef:0::/64
    Network Key: 1234c0de1ab51234c0de1ab51234c0de
    Network Name: SleepyEFR32
    PAN ID: 0x2222
    PSKc: 992c3b39534992571a6a9045db5319e3
    Security Policy: 672 onrc 0
    Done
    

    I have routereligible disabled, and with this dataset the device attaches to an RPi running vanilla OTBR with a Silabs module as RCP. I then run from the RPi:

    ping -i 0.2 fdde:ad00:beef:0:0:ff:fe00:40f

    where 0x040f is the RLOC of the NRF child.

    Does this sound simple enough?
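
    For anyone reproducing this with a different child, the ping target above follows the Thread RLOC layout: the 64-bit mesh-local prefix followed by the interface identifier 0000:00ff:fe00:<RLOC16>. A minimal C sketch of that construction is below; the function names are hypothetical and the formatter deliberately skips "::" compression so the output matches the address as typed in the ping command.

```c
#include <stdint.h>
#include <stdio.h>

/* Build the 16-byte RLOC address: 8-byte mesh-local prefix followed by
 * the fixed IID pattern 0000:00ff:fe00:<RLOC16>. */
static void rloc_address(const uint8_t prefix[8], uint16_t rloc16,
                         uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        out[i] = prefix[i];
    }
    out[8]  = 0x00;
    out[9]  = 0x00;
    out[10] = 0x00;
    out[11] = 0xff;
    out[12] = 0xfe;
    out[13] = 0x00;
    out[14] = (uint8_t)(rloc16 >> 8);
    out[15] = (uint8_t)(rloc16 & 0xff);
}

/* Print the address as eight hex groups without leading zeros and
 * without "::" shortening. */
static void format_ipv6(const uint8_t addr[16], char *buf, size_t len)
{
    unsigned int g[8];

    for (int i = 0; i < 8; i++) {
        g[i] = ((unsigned int)addr[2 * i] << 8) | addr[2 * i + 1];
    }
    snprintf(buf, len, "%x:%x:%x:%x:%x:%x:%x:%x",
             g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7]);
}
```

    Passing the prefix bytes fd de ad 00 be ef 00 00 and RLOC16 0x040f yields fdde:ad00:beef:0:0:ff:fe00:40f, the address used in the ping command.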
