nRF54L15 MPSL init stack corruption

I have been bringing up an application on the nRF54L15-DK, and about 50% of the time it hardfaults during initialization. I tracked the cause down to a branch to address 0, which results from corruption of the return PC on the stack.

I can't get it to crash with sample code, but the root cause is there in any nRF54 build that includes MPSL.

Here are the steps:

- Select any Bluetooth sample; I'm using peripheral_hids_keyboard.

- Build for nrf54l15dk/nrf54l15/cpuapp

- Load onto target and attach a debugger.

- Breakpoint on mpsl_lib_init_sys

- Observe that the call stack shows it's being called at the PRE_KERNEL_1 init level, which according to the Zephyr docs "uses the interrupt stack".

- Observe that the SP register is within the z_interrupt_stack range, confirming it's using interrupt stack

- Observe that the core is currently using PSP, and MSP is still at the top of the interrupt stack. The "background" stack and interrupt stack overlap, in other words.

- Within mpsl_lib_init_sys, it's connecting several interrupt handlers, including mpsl_rtc0_isr_wrapper (IRQ 229)

- Observe that NVIC_ISER7 is 0x00000020 (bit 5 set, and 7 * 32 + 5 = 229), so IRQ 229 is already enabled. This happens earlier, in mpsl_init, for which no source is available.
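For anyone who wants to check the register arithmetic in the last step, here is a minimal host-side sketch in plain C. It assumes the usual NVIC layout of 32 enable bits per NVIC_ISERn register; the helper names and the ISER_REG_COUNT limit are mine, not part of MPSL, Zephyr or CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: Armv8-M defines NVIC_ISER0..NVIC_ISER15, 32 IRQs each. */
#define ISER_REG_COUNT 16u

/* Index n of the NVIC_ISERn register that holds the enable bit for irq. */
static inline unsigned int iser_index(unsigned int irq)
{
    assert(irq < ISER_REG_COUNT * 32u);
    return irq / 32u;
}

/* Mask of the enable bit for irq within its NVIC_ISERn register. */
static inline uint32_t iser_mask(unsigned int irq)
{
    assert(irq < ISER_REG_COUNT * 32u);
    return UINT32_C(1) << (irq % 32u);
}

/* True if irq is enabled, given the value read from its NVIC_ISERn. */
static inline bool irq_enabled(uint32_t iser_value, unsigned int irq)
{
    return (iser_value & iser_mask(irq)) != 0u;
}
```

With the value from the debugger, iser_index(229) gives 7 and irq_enabled(0x00000020, 229) is true, which matches the observation above.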

So at this point, IRQ 229 is enabled with the mpsl_rtc0_isr_wrapper handler registered. When the interrupt fires, the handler runs on MSP, which still points at the top of the interrupt stack, so it clobbers the live contents of the PSP stack used by the background code. You're unlikely to see the interrupt occur in the sample, but that's what's happening in my application some of the time.

There are other interrupts enabled by mpsl_init, as shown in NVIC_ISERn. Any of them has the potential to cause stack corruption.
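To list every enabled IRQ rather than checking one at a time, the NVIC_ISERn values can be copied out of the debugger and decoded offline. This is only a sketch under the same 32-bits-per-register assumption; enabled_irqs is a hypothetical helper name, and I have not posted the values of the other registers here, so the array contents are whatever you read from your own target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a snapshot of NVIC_ISER0..NVIC_ISER(n_regs - 1) into IRQ numbers.
 * Returns the total number of enabled IRQs found; at most max_out of them
 * are written to out, in ascending order.
 */
size_t enabled_irqs(const uint32_t *iser, size_t n_regs,
                    unsigned int *out, size_t max_out)
{
    assert(iser != NULL);
    assert(out != NULL || max_out == 0);

    size_t count = 0;

    for (size_t reg = 0; reg < n_regs; reg++) {
        for (unsigned int bit = 0; bit < 32u; bit++) {
            if ((iser[reg] & (UINT32_C(1) << bit)) == 0u) {
                continue; /* this IRQ is not enabled */
            }
            if (count < max_out) {
                out[count] = (unsigned int)(reg * 32u + bit);
            }
            count++;
        }
    }
    return count;
}
```

Any IRQ number it reports is a candidate for the same corruption if its handler can run before the kernel switches stacks.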

I'm not familiar enough with the Zephyr init sequence or the MPSL dependencies to say for sure, but it seems like this init is happening too early: interrupts cannot be enabled while the background code is still sharing the interrupt stack. Perhaps the intent was that interrupts are globally disabled during the PRE_KERNEL init stages, but they are not masked.
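The condition I'm describing can be written down as a check over values read from the debugger. This is a rough sketch, not anything from Zephyr or MPSL: the struct and its field names are hypothetical, it only looks at PRIMASK (BASEPRI and FAULTMASK are ignored), and it treats any set NVIC_ISERn bit as an interrupt that could fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugger snapshot; field names are illustrative only. */
struct core_snapshot {
    uint32_t msp;             /* main stack pointer */
    uint32_t psp;             /* process stack pointer */
    uint32_t irq_stack_base;  /* lowest address of z_interrupt_stack */
    uint32_t irq_stack_top;   /* one past its highest address */
    bool thread_uses_psp;     /* CONTROL.SPSEL == 1 */
    bool primask_set;         /* PRIMASK == 1, interrupts masked */
    const uint32_t *iser;     /* NVIC_ISERn values */
    size_t n_iser;            /* number of entries in iser */
};

/* True if an enabled, unmasked IRQ could overwrite the background stack. */
bool shared_stack_hazard(const struct core_snapshot *s)
{
    assert(s != NULL);

    bool any_irq_enabled = false;
    for (size_t i = 0; i < s->n_iser; i++) {
        if (s->iser[i] != 0u) {
            any_irq_enabled = true;
        }
    }

    /* Background code is running on PSP inside the interrupt stack. */
    bool psp_in_irq_stack = s->psp >= s->irq_stack_base &&
                            s->psp <= s->irq_stack_top;

    /* MSP sits above PSP in the same region, so handler pushes land on
     * data the background code still considers live. */
    bool msp_above_psp = s->msp > s->psp && s->msp <= s->irq_stack_top;

    return s->thread_uses_psp && psp_in_irq_stack && msp_above_psp &&
           any_irq_enabled && !s->primask_set;
}
```

In my case every term is true at the mpsl_lib_init_sys breakpoint. Masking interrupts, or running this init after the switch to a separate stack, would each make it false.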
