mpsl assert file "107", line 292

cristic 1 month ago

Hi guys,

For a client of mine I'm working on an OpenThread RCP implementation using a SolidRun N8 that contains the NRF52833.

I'm using ncs v2.9.0-nRF54H20-1.

The problem is that after only ~2 minutes of 5 ping packets per second, I get the following assert (seen here in gdb)

Breakpoint 1, m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.0-nRF54H20-1/nrf/subsys/mpsl/init/mpsl_init.c:304

The code is pretty basic, based on an older sample. It initializes a single instance and runs an endless while:

    while (!otSysPseudoResetWasRequested())
    {
        otTaskletsProcess(instance);
        otSysProcessDrivers(instance);
    }

Here's the full project .config file.

6886.config.txt

Any hint on what I could change to make this work?

0 Sigurd 1 month ago

Hi,

cristic said:
that contains the NRF52833.
I'm using ncs v2.9.0-nRF54H20-1.

The release you are using is for nRF54H20 only. It's not a qualified release for nRF52833.

Please use e.g. NCS v2.9.2 instead, and see if that solves your issue.

0 cristic 1 month ago in reply to Sigurd

Hi Sigurd,

Thanks a lot for the reply and pointing out the version problem. That was the latest release when I started working with it.

I have switched to v2.9.2 and I get the same behavior.

(gdb) bt
#0 m_assert_handler (file=0x20005ebc <z_interrupt_stacks+636> "107", line=292) at /home/cristic/ncs/v2.9.2/nrf/subsys/mpsl/init/mpsl_init.c:304
#1 0x000035e0 in sym_S2UAPMFVIQXDUOA6CV7GJMB33TYHEUH5D6LHO5Q ()

8358.config.txt

From what I understand it happens during RTC0 interrupt handling. Is there something in my clock configuration that is wrong?

0 Sigurd 1 month ago in reply to cristic

Hi!

It would be helpful if we were able to reproduce the issue here on a DK.

1) Are you able to reproduce the issue based on some of our samples in NCS? (If yes, which one, how to build it, etc.)

2) Are you able to reproduce the issue on an nRF52833 DK?

0 cristic 1 month ago in reply to Sigurd

Hi Sigurd,

Sadly, I don't have a DK.

I'm using the NCS sample nrf/samples/openthread/coprocessor as a starting point. I replaced main.c and updated the overlay and prj.conf. The overlay change is needed because the SolidRun board has the Fujitsu FWM7BLZ22 module (which contains the NRF52833) connected without RTS/CTS, so I have no flow control.

I'm building with:

west build -b nrf52833dk/nrf52833

I'm attaching the files:

70013.main.c

/*
 * Copyright (c) 2023 Nordic Semiconductor ASA
 *
 * SPDX-License-Identifier: LicenseRef-Nordic-5-Clause
 */
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>
#include <openthread-system.h>
#include <openthread/ncp.h>
#include <openthread/tasklet.h>
#include "utils/uart.h"

#if defined(CONFIG_RCP_SAMPLE_HCI)
#include "rcp_hci.h"
#endif

LOG_MODULE_REGISTER(coprocessor_sample, CONFIG_OT_COPROCESSOR_LOG_LEVEL);

#define WELCOME_TEXT                                                           \
	"\n\r"                                                                 \
	"\n\r"                                                                 \
	"=========================================================\n\r"        \
	"OpenThread Coprocessor application is now running on NCS.\n\r"        \
	"=========================================================\n\r"

// void otPlatUartReceived(const uint8_t *aBuf, uint16_t aBufLength) { otNcpHdlcReceive(aBuf, aBufLength); }

// void otPlatUartSendDone(void) { otNcpHdlcSendDone(); }

static int NcpSend(const uint8_t *aBuf, uint16_t aBufLength)
{
    otPlatUartSend(aBuf, aBufLength);
    return aBufLength;
}

void otAppNcpInit(otInstance *aInstance)
{
    otPlatUartEnable();

    otNcpHdlcInit(aInstance, NcpSend);
}

int main(void)
{
	LOG_INF(WELCOME_TEXT);

	otInstance *instance;

pseudo_reset:
	otSysInit(0, NULL);   // Initializes OpenThread platform

	instance = otInstanceInitSingle();

	otAppNcpInit(instance);

	while (1)
	{
		otTaskletsProcess(instance);
		otSysProcessDrivers(instance);
	}

	otInstanceFinalize(instance);

#if defined(CONFIG_RCP_SAMPLE_HCI)
	run_hci();
#endif

	goto pseudo_reset;

	return 0;
}

6330.nrf52833dk_nrf52833.overlay

08732.prj.conf

I tried it without MPSL yesterday evening, meaning I have added the following to my prj.conf:

CONFIG_NRF_802154_SL_OPENSOURCE=y
CONFIG_MPSL=n
CONFIG_NET_PKT_TXTIME=n
CONFIG_IEEE802154_CSL_ENDPOINT=n

The result is still a crash, this time in an svc instruction. Also after ~1 minute.

(gdb) bt
#0  arch_system_halt (reason=4) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:30
#1  0x00011dd8 in k_sys_fatal_error_handler (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:44
#2  0x0000ab8c in z_fatal_error (reason=<optimized out>, esf=<optimized out>) at /home/cristic/ncs/v2.9.2/zephyr/kernel/fatal.c:119
#3  0x00001f5e in _oops () at /home/cristic/ncs/v2.9.2/zephyr/arch/arm/core/cortex_m/swap_helper.S:318
(gdb) info r
r0             0x4                 4
r1             0x20004c48          536890440
r2             0x1                 1
r3             0x20                32
r4             0x20000dc8          536874440
r5             0x0                 0
r6             0x0                 0
r7             0x20002c90          536882320
r8             0x0                 0
r9             0x0                 0
r10            0x2                 2
r11            0x0                 0
r12            0x6151              24913
sp             0x20004c28          0x20004c28 <z_interrupt_stacks+552>
lr             0x11dd9             73177
pc             0x11dd0             0x11dd0 <arch_system_halt+14>
xpsr           0x2100000b          553648139
fpscr          0x0                 0
msp            0x20004c28          0x20004c28 <z_interrupt_stacks+552>
psp            0x20004530          0x20004530 <ot_stack_area+1840>
primask        0x0                 0
basepri        0x20                32
faultmask      0x0                 0
control        0x0                 0

The purpose of this work is to let my client decide between using this nrf52833 solution and using a SiliconLabs MGM240, which is the router in this test setup.

0 Amanda Hsieh 5 days ago in reply to cristic

(Updated)

Please try also with:

CONFIG_MAIN_STACK_SIZE=4096

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

Regarding the crash in the svc instruction: it is caused by a call to k_panic(). Unfortunately, k_panic() is a macro, so you cannot set a breakpoint on it directly.

Please set breakpoints at the following functions instead:

nrf_802154_assert_handler

z_thread_abort

Then run the reproduction scenario to see whether either breakpoint is hit. If one is, please collect the gdb backtrace.

--

Hi,

Sorry for the late reply. Sigurd is out of the office, so I am taking over this case.

We have seen a similar issue in another case, and R&D is trying to reproduce it to figure out what might cause this assertion. I will update you when I have collected enough information. Please give us more time. Thanks for your patience.

Regards,
Amanda H.

0 cristic 8 hours ago in reply to Amanda Hsieh

Hi Amanda,

Due to unknown issues I can no longer reproduce the svc exception in the open-source (non-MPSL) case. When I start the ping from the router, the NRF child detaches with logs like this:

ot-daemon-ncs[980]: 00:37:45.420 [I] Mac-----------: Frame tx attempt 16/16 failed, error:NoAck, len:71, seqnum:227, type:Data, src:a611f14676479c9b, dst:6a4d5db8ee09725a, sec:no, ackreq:yes
ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ============================[TX ERR len=016]============================
ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: | 61 DC E3 22 22 5A 72 09 | EE B8 5D 4D 6A 9B 9C 47 | a..""Zr...]Mj..G |
ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ------------------------------------------------------------------------
ot-daemon-ncs[980]: 00:37:45.420 [D] SubMac--------: RadioState: Transmit -> Receive
ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: ==============================[TX len=071]==============================
ot-daemon-ncs[980]: 00:37:45.420 [D] Mac-----------: | 61 DC E3 22 22 5A 72 09 | EE B8 5D 4D 6A 9B 9C 47 | a..""Zr...]Mj..G |
ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | 76 46 F1 11 A6 7F 33 F0 | 4D 4C 4D 4C E7 3B 00 15 | vF....3.MLML.;.. |
ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | CA 66 00 00 00 00 00 01 | 02 8A DE 42 13 98 B3 A6 | .f.........B.... |
ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | 24 8B 0B D6 2B FE B9 BF | B7 11 42 88 DB 4F 50 B8 | $...+.....B..OP. |
ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: | F1 7D D0 4C 42 F3 EF    |                         | .}.LB..          |
ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: ------------------------------------------------------------------------
ot-daemon-ncs[980]: 00:37:45.421 [D] Mac-----------: Finishing operation "TransmitDataDirect"
ot-daemon-ncs[980]: 00:37:45.421 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:87, chksum:e73b, ecn:no, to:6a4d5db8ee09725a, sec:no, error:NoAck, prio:net

The router says it drops them as duplicates:

Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-: Received IPv6 UDP msg, len:87, chksum:16eb, ecn:no, from:a611f14676479c9b, sec:no, prio:net>
Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-:     src:[fe80:0:0:0:a411:f146:7647:9c9b]:19788
Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [I] MeshForwarder-:     dst:[fe80:0:0:0:684d:5db8:ee09:725a]:19788
Jul 08 15:10:13 raspberrypi otbr-agent[1320]: 02:27:01.194 [W] Mle-----------: Failed to process UDP: Duplicated

Can we please focus on the MPSL case? We will need this to work with MPSL in the end. With MPSL and the stack sizes you mention I can run ping at 200msec for several minutes (~5 minutes in latest tests).
Then I'm hitting the original problem with m_assert_handler being called for file 107, line 292.

0 Amanda Hsieh 8 hours ago in reply to cristic

cristic said:
Can we please focus on the MPSL case? We will need this to work with MPSL in the end. With MPSL and the stack sizes you mention I can run ping at 200msec for several minutes (~5 minutes in latest tests).
Then I'm hitting the original problem with m_assert_handler being called for file 107, line 292.

Yes, could you provide a simple project to help us reproduce the issue on the nRF52833DK?

0 cristic 7 hours ago in reply to Amanda Hsieh
Hi Amanda,

As I've explained before, I don't have a DK available. My setup is in Israel and nobody delivers there nowadays. Did you try modifying the coprocessor sample with the files I attached above and running it on a DK?

The dataset I use is:
> dataset active
Active Timestamp: 1
Channel: 15
Channel Mask: 0x07fff800
Ext PAN ID: c0de1ab5c0de1ab5
Mesh Local Prefix: fdde:ad00:beef:0::/64
Network Key: 1234c0de1ab51234c0de1ab51234c0de
Network Name: SleepyEFR32
PAN ID: 0x2222
PSKc: 992c3b39534992571a6a9045db5319e3
Security Policy: 672 onrc 0
Done

I have routereligible disabled, so the NRF attaches as a child to an RPi running vanilla OTBR with a Silabs module as RCP. I then run from the RPi:
ping -i 0.2 fdde:ad00:beef:0:0:ff:fe00:40f

where 0x040f is the RLOC of the NRF child.

Does this sound simple enough?