mpsl assert file "107", line 292

Hi guys,

For a client of mine, I'm working on an OpenThread RCP implementation on a SolidRun N8, which contains the nRF52833.

I'm using ncs v2.9.0-nRF54H20-1.

The problem is that after only ~2 minutes of 5 ping packets per second, I get the following assert (seen here in gdb):

Breakpoint 1, m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.0-nRF54H20-1/nrf/subsys/mpsl/init/mpsl_init.c:304

The code is pretty basic, based on an older sample. It initializes a single instance and runs an endless while loop:

    while (!otSysPseudoResetWasRequested())
    {
        otTaskletsProcess(instance);
        otSysProcessDrivers(instance);
    }

Here's the full project .config file.

6886.config.txt

Any hint on what I could change to make this work?

  • Hi,

    that contains the NRF52833.
    I'm using ncs v2.9.0-nRF54H20-1.

    The release you are using is for nRF54H20 only. It's not a qualified release for nRF52833.

    Please use e.g. NCS v2.9.2 instead, and see if that solves your issue.

  • Hi Sigurd,

    Thanks a lot for the reply and for pointing out the version problem. That was the latest release when I started working with it.

    I have switched to v2.9.2 and I get the same behavior.

    (gdb) bt
    #0  m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.2/nrf/subsys/mpsl/init/mpsl_init.c:304
    #1  0x000035e0 in sym_S2UAPMFVIQXDUOA6CV7GJMB33TYHEUH5D6LHO5Q ()

    8358.config.txt

    From what I understand it happens during RTC0 interrupt handling. Is there something in my clock configuration that is wrong?

  • Hi!

    It would be helpful if we were able to reproduce the issue here on a DK.

    1) Are you able to reproduce the issue based on some of our samples in NCS? (If yes, which one, how to build it, etc.)

    2) Are you able to reproduce the issue on an nRF52833 DK?

  • Hi Sigurd,

    Sadly, I don't have a DK.

    I'm using the NCS sample nrf/samples/openthread/coprocessor as a starting point, with main.c replaced and the overlay and prj.conf updated. The overlay change is needed because the SolidRun board has the Fujitsu FWM7BLZ22 module (which contains the nRF52833) connected without RTS/CTS, so there is no flow control.

    I'm building with:

    west build -b nrf52833dk/nrf52833 

    I'm attaching the files:

    /*
     * Copyright (c) 2023 Nordic Semiconductor ASA
     *
     * SPDX-License-Identifier: LicenseRef-Nordic-5-Clause
     */
    #include <zephyr/kernel.h>
    #include <zephyr/logging/log.h>
    #include <openthread-system.h>
    #include <openthread/ncp.h>
    #include <openthread/tasklet.h>
    #include "utils/uart.h"
    
    #if defined(CONFIG_RCP_SAMPLE_HCI)
    #include "rcp_hci.h"
    #endif
    
    LOG_MODULE_REGISTER(coprocessor_sample, CONFIG_OT_COPROCESSOR_LOG_LEVEL);
    
    #define WELCOME_TEXT                                                           \
    	"\n\r"                                                                 \
    	"\n\r"                                                                 \
    	"=========================================================\n\r"        \
    	"OpenThread Coprocessor application is now running on NCS.\n\r"        \
    	"=========================================================\n\r"
    
    // void otPlatUartReceived(const uint8_t *aBuf, uint16_t aBufLength) { otNcpHdlcReceive(aBuf, aBufLength); }
    
    // void otPlatUartSendDone(void) { otNcpHdlcSendDone(); }
    
    static int NcpSend(const uint8_t *aBuf, uint16_t aBufLength)
    {
        otPlatUartSend(aBuf, aBufLength);
        return aBufLength;
    }
    
    void otAppNcpInit(otInstance *aInstance)
    {
        otPlatUartEnable();
    
        otNcpHdlcInit(aInstance, NcpSend);
    }
    
    int main(void)
    {
    	LOG_INF(WELCOME_TEXT);
    
    	otInstance *instance;
    
    pseudo_reset:
    	otSysInit(0, NULL);   // Initializes OpenThread platform
    
    	instance = otInstanceInitSingle();
    
    	otAppNcpInit(instance);
    
    	while (1)
    	{
    		otTaskletsProcess(instance);
    		otSysProcessDrivers(instance);
    	}
    
    	otInstanceFinalize(instance);
    
    #if defined(CONFIG_RCP_SAMPLE_HCI)
    	run_hci();
    #endif
    
    	goto pseudo_reset;
    
    	return 0;
    }
    

    6330.nrf52833dk_nrf52833.overlay

    08732.prj.conf

    I tried it without MPSL yesterday evening, meaning I have added the following to my prj.conf:

    CONFIG_NRF_802154_SL_OPENSOURCE=y
    CONFIG_MPSL=n
    CONFIG_NET_PKT_TXTIME=n
    CONFIG_IEEE802154_CSL_ENDPOINT=n

    The result is still a crash, this time in an svc instruction. Also after ~1 minute.

    (gdb) bt
    #0  arch_system_halt (reason=4) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:30
    #1  0x00011dd8 in k_sys_fatal_error_handler (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:44
    #2  0x0000ab8c in z_fatal_error (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:119
    #3  0x00001f5e in _oops () at /home/cristic/ncs/v2.9.2/zephyr/arch/arm/core/cortex_m/swap_helper.S:318
    (gdb) info r
    r0             0x4                 4
    r1             0x20004c48          536890440
    r2             0x1                 1
    r3             0x20                32
    r4             0x20000dc8          536874440
    r5             0x0                 0
    r6             0x0                 0
    r7             0x20002c90          536882320
    r8             0x0                 0
    r9             0x0                 0
    r10            0x2                 2
    r11            0x0                 0
    r12            0x6151              24913
    sp             0x20004c28          0x20004c28 <z_interrupt_stacks+552>
    lr             0x11dd9             73177
    pc             0x11dd0             0x11dd0 <arch_system_halt+14>
    xpsr           0x2100000b          553648139
    fpscr          0x0                 0
    msp            0x20004c28          0x20004c28 <z_interrupt_stacks+552>
    psp            0x20004530          0x20004530 <ot_stack_area+1840>
    primask        0x0                 0
    basepri        0x20                32
    faultmask      0x0                 0
    control        0x0                 0
    

    The purpose of this work is to let my client decide between using this nrf52833 solution and using a SiliconLabs MGM240, which is the router in this test setup.

  • (Updated)

    Please try also with:

    CONFIG_MAIN_STACK_SIZE=4096

    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

     

    Regarding the crash in the svc instruction: it is caused by a call to k_panic(). Unfortunately, k_panic() is a macro, so you cannot set a breakpoint on it directly.
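
For reference, the register dump above is consistent with this: the low 9 bits of xPSR hold the number of the active exception, and on Cortex-M number 11 is SVCall. The reason=4 passed to arch_system_halt() should correspond to K_ERR_KERNEL_PANIC in Zephyr's fatal error enum. A minimal sketch of the xPSR decoding follows; the function names are hypothetical helpers for illustration, not part of Zephyr or NCS.

```c
#include <stdint.h>

/* On ARMv7-M, the IPSR field (active exception number) is xPSR[8:0]. */
#define XPSR_EXCEPTION_MASK 0x1FFu

/* Hypothetical helper: extract the active exception number from xPSR. */
unsigned int xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & XPSR_EXCEPTION_MASK;
}

/* Hypothetical helper: map a Cortex-M exception number to its name. */
const char *cortex_m_exception_name(unsigned int n)
{
    switch (n) {
    case 0:  return "Thread mode";
    case 2:  return "NMI";
    case 3:  return "HardFault";
    case 4:  return "MemManage";
    case 5:  return "BusFault";
    case 6:  return "UsageFault";
    case 11: return "SVCall";
    case 12: return "DebugMonitor";
    case 14: return "PendSV";
    case 15: return "SysTick";
    default: return (n >= 16) ? "external IRQ" : "reserved";
    }
}
```

With the xpsr value 0x2100000b from the dump, xpsr_exception_number() gives 11, which cortex_m_exception_name() maps to "SVCall".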

     

    Please set breakpoints at the following functions:

    nrf_802154_assert_handler

    z_thread_abort

     

    Then run the reproduction scenario to see whether either breakpoint is hit. If so, please collect the gdb backtrace.

    --

    Hi, 

    Sorry for the late reply. Sigurd is out of the office, so I am taking over this case.

    We have seen a similar issue in another case, and R&D is trying to reproduce it to figure out what might cause this assertion. I will update you once I have collected enough information. Please give us more time. Thanks for your patience.

    Regards,
    Amanda H.

  • Hi Amanda,

    For reasons I don't understand, I can no longer reproduce the svc exception in the open-source (non-MPSL) case. When I start pinging from the router, the nRF child detaches with logs like this:

    ot-daemon-ncs[980]: 00:37:45.420 [I] Mac-----------: Frame tx attempt 16/16 failed, error:NoAck, len:71, seqnum:227, type:Data, src:a611f14676479c9b, dst:6a4d5db8ee09725a, sec:no, ackreq:yes
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ============================[TX ERR len=016]============================
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: | 61 DC E3 22 22 5A 72 09 | EE B8 5D 4D 6A 9B 9C 47 | a..""Zr...]Mj..G |
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ------------------------------------------------------------------------
    ot-daemon-ncs[980]: 00:37:45.420 [D] SubMac--------: RadioState: Transmit -> Receive
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ==============================[TX len=071]==============================
    ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: | 61 DC E3 22 22 5A 72 09 | EE B8 5D 4D 6A 9B 9C 47 | a..""Zr...]Mj..G |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | 76 46 F1 11 A6 7F 33 F0 | 4D 4C 4D 4C E7 3B 00 15 | vF....3.MLML.;.. |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | CA 66 00 00 00 00 00 01 | 02 8A DE 42 13 98 B3 A6 | .f.........B.... |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | 24 8B 0B D6 2B FE B9 BF | B7 11 42 88 DB 4F 50 B8 | $...+.....B..OP. |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | F1 7D D0 4C 42 F3 EF    |                         | .}.LB..          |
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: ------------------------------------------------------------------------
    ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: Finishing operation "TransmitDataDirect"
    ot-daemon-ncs[980]: 00:37:45.421 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:87, chksum:e73b, ecn:no, to:6a4d5db8ee09725a, sec:no, error:NoAck, prio:net
    


    The router says it drops them as duplicates:

    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-: Received IPv6 UDP msg, len:87, chksum:16eb, ecn:no, from:a611f14676479c9b, sec:no, prio:net>
    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-:     src:[fe80:0:0:0:a411:f146:7647:9c9b]:19788
    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-:     dst:[fe80:0:0:0:684d:5db8:ee09:725a]:19788
    Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [W] Mle-----------: Failed to process UDP: Duplicated
    


    Can we please focus on the MPSL case? We will need this to work with MPSL in the end. With MPSL and the stack sizes you mention I can run ping at 200msec for several minutes (~5 minutes in latest tests).
    Then I'm hitting the original problem with m_assert_handler being called for file 107, line 292.

  • cristic said:
    Can we please focus on the MPSL case? We will need this to work with MPSL in the end. With MPSL and the stack sizes you mention I can run ping at 200msec for several minutes (~5 minutes in latest tests).
    Then I'm hitting the original problem with m_assert_handler being called for file 107, line 292.

    Yes, could you provide a simple project to help us reproduce the issue on the nRF52833DK?

  • Hi Amanda,

    As I've explained before I don't have a DK available. My setup is in Israel and nobody delivers there nowadays. Did you try modifying the coprocessor example with the files I attached above and running it on a DK?

    The dataset I use is:

    > dataset active
    dataset active
    Active Timestamp: 1
    Channel: 15
    Channel Mask: 0x07fff800
    Ext PAN ID: c0de1ab5c0de1ab5
    Mesh Local Prefix: fdde:ad00:beef:0::/64
    Network Key: 1234c0de1ab51234c0de1ab51234c0de
    Network Name: SleepyEFR32
    PAN ID: 0x2222
    PSKc: 992c3b39534992571a6a9045db5319e3
    Security Policy: 672 onrc 0
    Done
    

    I have routereligible disabled, and this dataset connects the device to an RPi running vanilla OTBR with a Silabs module as RCP. I then run the following from the RPi:

    ping -i 0.2 fdde:ad00:beef:0:0:ff:fe00:40f

    where 0x040f is the RLOC16 of the nRF child.
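
For anyone reproducing this with a different child: the ping target is the mesh-local prefix from the dataset followed by the fixed RLOC interface identifier 0000:00ff:fe00:<RLOC16>. The RLOC16 itself packs the router ID in its top 6 bits and the child ID in its low 9 bits. A rough sketch of that mapping is below; the function names are made up for illustration and are not OpenThread APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Router ID is the top 6 bits of the RLOC16. */
unsigned int rloc16_router_id(uint16_t rloc16)
{
    return (unsigned int)rloc16 >> 10;
}

/* Child ID is the low 9 bits of the RLOC16 (0 means the router itself). */
unsigned int rloc16_child_id(uint16_t rloc16)
{
    return rloc16 & 0x1FFu;
}

/*
 * Hypothetical helper: build the RLOC IPv6 address as text.
 * prefix holds the four 16-bit groups of the /64 mesh-local prefix.
 * Zero groups are not compressed, matching the form used in the ping command.
 * Returns the snprintf() result (length written, excluding the terminator).
 */
int rloc16_to_ipv6(const uint16_t prefix[4], uint16_t rloc16,
                   char *out, size_t out_len)
{
    return snprintf(out, out_len, "%x:%x:%x:%x:0:ff:fe00:%x",
                    (unsigned int)prefix[0], (unsigned int)prefix[1],
                    (unsigned int)prefix[2], (unsigned int)prefix[3],
                    (unsigned int)rloc16);
}
```

With the prefix groups {0xfdde, 0xad00, 0xbeef, 0x0} and RLOC16 0x040f, this produces fdde:ad00:beef:0:0:ff:fe00:40f, and the RLOC16 decodes to router ID 1, child ID 15.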

    Does this sound simple enough?
