MPSL assert: file "107", line 292

Hi guys,

For a client, I'm working on an OpenThread RCP implementation using a SolidRun N8, which contains an nRF52833.

I'm using NCS v2.9.0-nRF54H20-1.

The problem is that after only ~2 minutes of 5 ping packets per second, I get the following assert (seen here in gdb):

    Breakpoint 1, m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.0-nRF54H20-1/nrf/subsys/mpsl/init/mpsl_init.c:304

The code is pretty basic and based on an older sample. It initializes a single instance and runs an endless while loop:

    while (!otSysPseudoResetWasRequested())
    {
        otTaskletsProcess(instance);
        otSysProcessDrivers(instance);
    }

Here's the full project .config file.

6886.config.txt

Any hint on what I could change to make this work?

  • Hi,

    that contains the NRF52833.
    I'm using ncs v2.9.0-nRF54H20-1.

    The release you are using is for nRF54H20 only. It's not a qualified release for nRF52833.

    Please use e.g. NCS v2.9.2 instead, and see if that solves your issue.

  • Hi Sigurd,

    Thanks a lot for the reply and pointing out the version problem. That was the latest release when I started working with it.

    I have switched to v2.9.2 and I get the same behavior.

    (gdb) bt
    #0  m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.2/nrf/subsys/mpsl/init/mpsl_init.c:304
    #1  0x000035e0 in sym_S2UAPMFVIQXDUOA6CV7GJMB33TYHEUH5D6LHO5Q ()

    8358.config.txt

    From what I understand, it happens while handling the RTC0 interrupt. Is something wrong in my clock configuration?

  • Hi!

    Are you blocking interrupts for longer periods (3 µs or more)?

  • Hi Sigurd,

    I'm not doing that explicitly. I'm just calling the two OT APIs in a loop, as listed above, and I'm not using irq_lock() at all. I assume that's what you're asking about, rather than blocking a specific interrupt.

    The trigger for the problem is radio traffic: the more traffic, the faster it happens. This device works as a child of a router, and I generate traffic by sending IPv6 pings from the router to this device. With 1 ping per second the problem appears after ~5 minutes; with 5 pings per second it appears in under 2 minutes. Without the pings it can run for hours. So the device does both RX and TX; the ping packets seem to be ~100 bytes, RSSI -58 dBm, channel 15.

    The other peripheral under stress is, I guess, UART0, which connects to ot-daemon on the host at 2 Mbit/s. There's no logging.

    I'm just trying to list things that could hog the CPU in interrupt context.

    Is that 3 µs limit hardcoded? Is there any workaround I could try?

  • Hi!

    It would be helpful if we were able to reproduce the issue here on a DK.

    1) Are you able to reproduce the issue based on one of our samples in NCS? (If yes, which one, how do you build it, etc.?)

    2) Are you able to reproduce the issue on an nRF52833 DK?

  • hi Sigurd,

    Sadly, I don't have a DK.

    I'm using the NCS nrf/samples/openthread/coprocessor sample as a starting point, replacing main.c and updating the overlay and prj.conf. The overlay change is needed because the SolidRun board has a Fujitsu FWM7BLZ22 module containing the nRF52833, connected without RTS/CTS, so I have no flow control.

    I'm building with:

    west build -b nrf52833dk/nrf52833 

    I'm attaching the files:

    /*
     * Copyright (c) 2023 Nordic Semiconductor ASA
     *
     * SPDX-License-Identifier: LicenseRef-Nordic-5-Clause
     */
    #include <zephyr/kernel.h>
    #include <zephyr/logging/log.h>
    #include <openthread-system.h>
    #include <openthread/ncp.h>
    #include <openthread/tasklet.h>
    #include "utils/uart.h"
    
    #if defined(CONFIG_RCP_SAMPLE_HCI)
    #include "rcp_hci.h"
    #endif
    
    LOG_MODULE_REGISTER(coprocessor_sample, CONFIG_OT_COPROCESSOR_LOG_LEVEL);
    
    #define WELCOME_TEXT                                                           \
    	"\n\r"                                                                 \
    	"\n\r"                                                                 \
    	"=========================================================\n\r"        \
    	"OpenThread Coprocessor application is now running on NCS.\n\r"        \
    	"=========================================================\n\r"
    
    // void otPlatUartReceived(const uint8_t *aBuf, uint16_t aBufLength) { otNcpHdlcReceive(aBuf, aBufLength); }
    
    // void otPlatUartSendDone(void) { otNcpHdlcSendDone(); }
    
    static int NcpSend(const uint8_t *aBuf, uint16_t aBufLength)
    {
        otPlatUartSend(aBuf, aBufLength);
        return aBufLength;
    }
    
    void otAppNcpInit(otInstance *aInstance)
    {
        otPlatUartEnable();
    
        otNcpHdlcInit(aInstance, NcpSend);
    }
    
    int main(void)
    {
    	LOG_INF(WELCOME_TEXT);
    
    	otInstance *instance;
    
    pseudo_reset:
    	otSysInit(0, NULL);   // Initializes OpenThread platform
    
    	instance = otInstanceInitSingle();
    
    	otAppNcpInit(instance);
    
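    	/* Run pending OpenThread tasklets, then let the platform process driver events.
    	 * Note: this loop never exits, so otInstanceFinalize() and the
    	 * pseudo-reset path below are unreachable. */
    	while (1)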
    	{
    		otTaskletsProcess(instance);
    		otSysProcessDrivers(instance);
    	}
    
    	otInstanceFinalize(instance);
    
    #if defined(CONFIG_RCP_SAMPLE_HCI)
    	run_hci();
    #endif
    
    	goto pseudo_reset;
    
    	return 0;
    }
    

    6330.nrf52833dk_nrf52833.overlay

    08732.prj.conf

    Yesterday evening I tried it without MPSL, i.e. I added the following to my prj.conf:

    CONFIG_NRF_802154_SL_OPENSOURCE=y
    CONFIG_MPSL=n
    CONFIG_NET_PKT_TXTIME=n
    CONFIG_IEEE802154_CSL_ENDPOINT=n

    The result is still a crash, this time in an SVC instruction, again after ~1 minute.

    (gdb) bt
    #0  arch_system_halt (reason=4) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:30
    #1  0x00011dd8 in k_sys_fatal_error_handler (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:44
    #2  0x0000ab8c in z_fatal_error (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:119
    #3  0x00001f5e in _oops () at /home/cristic/ncs/v2.9.2/zephyr/arch/arm/core/cortex_m/swap_helper.S:318
    (gdb) info r
    r0             0x4                 4
    r1             0x20004c48          536890440
    r2             0x1                 1
    r3             0x20                32
    r4             0x20000dc8          536874440
    r5             0x0                 0
    r6             0x0                 0
    r7             0x20002c90          536882320
    r8             0x0                 0
    r9             0x0                 0
    r10            0x2                 2
    r11            0x0                 0
    r12            0x6151              24913
    sp             0x20004c28          0x20004c28 <z_interrupt_stacks+552>
    lr             0x11dd9             73177
    pc             0x11dd0             0x11dd0 <arch_system_halt+14>
    xpsr           0x2100000b          553648139
    fpscr          0x0                 0
    msp            0x20004c28          0x20004c28 <z_interrupt_stacks+552>
    psp            0x20004530          0x20004530 <ot_stack_area+1840>
    primask        0x0                 0
    basepri        0x20                32
    faultmask      0x0                 0
    control        0x0                 0
    

    The purpose of this work is to let my client decide between this nRF52833 solution and a Silicon Labs MGM240, which is the router in this test setup.
