Crash in sd_clock_hfclk_is_running on SoftDevice S140 7.3.0

Hi, I recently noticed a crash in sd_clock_hfclk_is_running() on an nRF52840 running SoftDevice S140 7.3.0. This is the call stack:

??@0x00000ac4 (Unknown Source:0)

<signal handler called>@0xffffffe9 (Unknown Source:0)

sd_clock_hfclk_is_running@0x000276ae (.../nRF5_SDK_17.1.0_ddde560/components/softdevice/s140/headers/nrf_soc.h:720)

I use the following code to start the HFCLK whenever I enable QSPI, to avoid erratum 244:

sd_clock_hfclk_request();
uint32_t isHfclkRunning = 0;
do {
    APP_ERROR_CHECK(sd_clock_hfclk_is_running(&isHfclkRunning));
} while (!isHfclkRunning);

I can trigger this fairly reliably by unplugging and replugging USB power while this code is running.

Any tips on how I can avoid this issue?

Thanks,

Jeff

jthlim 11 months ago

Is there any other way to avoid the erratum besides turning on the HFCLK? It would be nice to avoid the extra power draw that comes with this workaround.
Priyanka 11 months ago in reply to jthlim

I haven't heard back from them yet. I will look into your suggestion in the meantime.

-Priyanka
Priyanka 10 months ago in reply to Priyanka

Hi,

From the call stack, it looks like the crash happened in the application and not inside sd_clock_hfclk_is_running, since the address 0x000276ae is outside the SoftDevice.

One possibility is that you are calling sd_clock_hfclk_is_running from an interrupt handler whose priority is higher than or equal to the SVC priority, which is not allowed by design. Calling any SoftDevice function from such an interrupt handler will cause a hard fault.
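To make that rule concrete, here is a small host-side model of the Cortex-M behavior being described: an SVC instruction is only serviced if SVCall is strictly more urgent than whatever is currently executing; otherwise it escalates to a HardFault. This is a sketch, not SoftDevice or CMSIS code. The function names and THREAD_MODE_PRIORITY are hypothetical, priorities are logical NVIC levels (0 = most urgent), and BASEPRI/FAULTMASK are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical model of the ARMv7-M rule, not SoftDevice code.
 * Priorities are logical NVIC levels: 0 is most urgent, larger is less urgent. */
#define THREAD_MODE_PRIORITY 256 /* thread mode: below every configurable level */

/* Execution priority of the code about to run SVC.
 * active_prio: priority of the running handler, or THREAD_MODE_PRIORITY.
 * primask_set: true after __disable_irq(); PRIMASK raises execution priority to 0.
 * BASEPRI and FAULTMASK are ignored in this sketch. */
static int execution_priority(int active_prio, bool primask_set)
{
    return primask_set ? 0 : active_prio;
}

/* SVCall can only be taken if it is numerically lower (more urgent) than the
 * current execution priority; otherwise the SVC escalates to HardFault. */
static bool svc_escalates_to_hardfault(int active_prio, bool primask_set, int svcall_prio)
{
    return svcall_prio >= execution_priority(active_prio, primask_set);
}
```

On target, the inputs could be gathered with the CMSIS-Core intrinsics __get_IPSR() (non-zero inside a handler), NVIC_GetPriority() and __get_PRIMASK(). The primask_set input also covers the "interrupts disabled" scenario discussed further down the thread.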

-Priyanka
jthlim 10 months ago in reply to Priyanka

In the disassembly, `sd_clock_hfclk_is_running` at that address is defined as:

svc 68 @ 0x44
bx lr

The address is just a stub that calls into the SoftDevice.
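For reference, `svc 68` is a single 16-bit Thumb instruction whose low byte carries the SVC number (0x44 = 68), so the stub encodes as 0xDF44 followed by `bx lr` (0x4770). A handler that dispatches on the number typically reads the halfword just before the stacked return address and extracts that byte. The host-side sketch below shows only the decode step; the function names are hypothetical, and it makes no claim about how the MBR or SoftDevice implement their handlers.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit Thumb SVC (encoding T1) is 0b11011111 followed by an 8-bit
 * immediate, i.e. 0xDF00 | imm8. */
static bool thumb_is_svc(uint16_t insn)
{
    return (insn & 0xFF00u) == 0xDF00u;
}

/* Extract the 8-bit SVC number from an SVC instruction. */
static uint8_t thumb_svc_number(uint16_t insn)
{
    return (uint8_t)(insn & 0x00FFu);
}
```

On target, the halfword would be fetched at the stacked PC minus 2.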

I don't believe it is ever called from an interrupt handler (I'm fairly sure it isn't; other things in that code path would be very problematic if it were). The SoftDevice call right before it, sd_clock_hfclk_request(), also succeeds.
Priyanka 10 months ago in reply to jthlim

Hi,

We still suspect a problem with the SVC and interrupt configuration. Could it be that you reach this point and call sd_clock_hfclk_is_running while interrupts are disabled?

All sd_ functions are implemented as SVC calls, and when we deliberately issued an SVC with interrupts disabled in a simple test program, it did hard fault.

-Priyanka
jthlim 10 months ago in reply to Priyanka

The flow of the program is:

sd_clock_hfclk_request();
do {
    sd_clock_hfclk_is_running(...);
} while (...);

If an SVC call from an interrupt context like that causes a hard fault, wouldn't it already crash on the first call, sd_clock_hfclk_request()?
Priyanka 10 months ago in reply to jthlim

Yes, that is true, unless this code is interrupted by another interrupt whose handler, for example, disables IRQs and then returns.
To demonstrate, see this simple program:

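void SVC_Handler(void)
{
    __disable_irq();    /* sets PRIMASK and returns with it still set */
}

/* ARM Compiler 5 (armcc) syntax: calling dummy() executes "svc 10" */
void __svc( 10 ) dummy( void ) ;

int main(void)
{
    dummy();            /* first SVC: taken normally, handler leaves PRIMASK set */
    __nop();
    __nop();
    __nop();
    __nop();
    dummy();            /* second SVC: cannot be taken with PRIMASK set, escalates to HardFault */
    while(1)
    {
    }
}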

It will hardfault on the second call to dummy(), because the first time the SVC handler ran, it disabled interrupts and then returned.
jthlim 10 months ago in reply to Priyanka

So, is there any condition under which sd_clock_hfclk_request() or the nRF5 SDK disables IRQs? I don't have a single disable_irq call in my entire codebase.
Priyanka 10 months ago in reply to jthlim

Hi,
After looking more closely at your call stack, our experts have another suspicion. It concerns this line:
??@0x00000ac4 (Unknown Source:0)

That line at the top of your call stack makes us suspect that the content of address 0x20000000 (the first address in RAM) has been corrupted somehow.
Please check it before you call sd_clock_hfclk_is_running().
In your case, the content should always be 0x00001000. This is where the MBR stores the address to forward interrupts to, and 0x1000 makes the MBR forward interrupts to the SoftDevice.

You can check this, for example, by adding the following code before the failing call to sd_clock_hfclk_is_running. If execution gets stuck there, please tell us what value is stored at 0x20000000:

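/* 0x20000000: MBR interrupt-forwarding address, expected to hold 0x1000 (SoftDevice) */
if (*((volatile uint32_t *)0x20000000) != 0x1000)
{
    while (1);   /* trap here and inspect 0x20000000 with the debugger */
}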

Alternatively, you can connect a debugger and check the content at 0x20000000 after the program has crashed.

-Priyanka
jthlim 10 months ago in reply to Priyanka

The value seems right: the bytes at the address are `00 10 00 00`. Another reason I don't think interrupts are disabled is that when I replace sd_clock_hfclk_is_running with a manual register lookup, everything works, even though there is an sd_clock_hfclk_release call later that should also crash according to that theory.
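Because the nRF52840 is little-endian, the byte sequence `00 10 00 00` shown in a memory view (lowest address first) is the 32-bit word 0x00001000, which matches the expected forwarding address. A tiny illustrative helper (hypothetical name) makes the byte order explicit:

```c
#include <stdint.h>

/* Assemble a 32-bit word from four bytes listed lowest address first,
 * as a debugger memory view shows them (little-endian). */
static uint32_t le32_from_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```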
Martin Tverdal 10 months ago in reply to jthlim

Hi,

I've been looking into this, and I'm not sure what the root cause of the crash could be.

You seem to be hitting a HardFault/BusFault at the instruction at 0xac4, which is an instruction inside the SVC handler in the MBR (Master Boot Record). The SoftDevice function you are calling, sd_clock_hfclk_is_running, is implemented as an SVC interrupt, and the MBR simply forwards it to the SoftDevice. Judging by the stack frame you posted, the call hasn't even reached the SoftDevice when the crash happens: it crashes inside the MBR, in the code that simply forwards interrupts (including SVC interrupts) to the SoftDevice.
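To picture what that forwarding involves: assuming the MBR treats the word at 0x20000000 (0x1000 here) as the base of a standard Cortex-M vector table, the SVCall handler it forwards to would be read from base + 4 × 11, since SVCall is architecturally exception 11. This is an illustrative model only; the MBR's actual code isn't shown in this thread, and the function name is hypothetical.

```c
#include <stdint.h>

#define SVCALL_EXCEPTION_NUMBER 11u /* architectural exception number of SVCall */

/* Address of the vector-table slot for an exception number, assuming the
 * forwarded-to image starts with a standard Cortex-M vector table at
 * table_base (one 32-bit entry per exception). */
static uint32_t vector_slot_address(uint32_t table_base, uint32_t exception_number)
{
    return table_base + 4u * exception_number;
}
```

The check on 0x20000000 suggested earlier in the thread verifies the first input to this lookup.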

One possible explanation is corruption of the instruction in flash itself. Does this happen easily on many different boards? If you have only reproduced it on one board, is it possible that you have exhausted the number of flash erase cycles the nRF52840 supports? Off the top of my head, I think that is 10,000.

Another explanation might be corruption of the call stack somehow. Could you try increasing the interrupt call stack a bit and see whether the problem goes away?
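Before and after enlarging the stack, it can help to measure how close it comes to overflowing. On Cortex-M, exception handlers always run on the main stack (MSP). A common technique is stack painting: fill the unused stack with a known pattern early on, then later count how much of the pattern is still intact. The sketch below shows the idea on a plain array; the function names and the pattern value are hypothetical, and on target the bounds would come from your startup file or linker script.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT_PATTERN 0xDEADBEEFu /* arbitrary marker value (hypothetical choice) */

/* Fill `words` 32-bit words starting at the low end (limit) of the stack.
 * On target, only paint the region below the current stack pointer. */
static void stack_paint(uint32_t *limit, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        limit[i] = STACK_PAINT_PATTERN;
    }
}

/* The Cortex-M stack grows downward from its top toward `limit`, so words
 * that were never used stay painted at the low end. Returns how many remain. */
static size_t stack_unused_words(const uint32_t *limit, size_t words)
{
    size_t n = 0;
    while (n < words && limit[n] == STACK_PAINT_PATTERN) {
        n++;
    }
    return n;
}
```

If the count drops to zero, or close to it, the stack has at least come near overflowing. A stored value that happens to equal the pattern can make the estimate slightly optimistic.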

Reading NRF_CLOCK->HFCLKSTAT, which you found to work, sounds safe to me: the SoftDevice does not protect the NRF_CLOCK peripheral against reads.

So busy-waiting for it to change to 0x10001 sounds safe in that regard. However, because of PAN-201, you might want to poll NRF_CLOCK->EVENTS_HFCLKSTARTED instead.

That is what the SoftDevice reads when you call sd_clock_hfclk_is_running.

In this sense, I think switching to while (NRF_CLOCK->EVENTS_HFCLKSTARTED == 0) is a good solution. However, even if that works, I still suspect that something else is wrong and might fail in a different way somewhere else.
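As a host-testable sketch of the two options discussed: HFCLKSTAT on the nRF52840 has SRC in bit 0 (1 = crystal) and STATE in bit 16 (1 = running), which is why 0x10001 means "running from the crystal", and the alternative is to poll EVENTS_HFCLKSTARTED. The clock_regs_t struct is a hypothetical stand-in for the two NRF_CLOCK registers so the logic can run off-target, and the iteration bound is an addition for testability, not part of the suggestion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* HFCLKSTAT fields: bit 0 SRC (1 = Xtal), bit 16 STATE (1 = Running). */
#define HFCLKSTAT_SRC_XTAL  (1u << 0)
#define HFCLKSTAT_STATE_RUN (1u << 16)

/* True when HFCLKSTAT reports the HFCLK running from the crystal (0x10001). */
static bool hfclkstat_xtal_running(uint32_t hfclkstat)
{
    const uint32_t want = HFCLKSTAT_SRC_XTAL | HFCLKSTAT_STATE_RUN;
    return (hfclkstat & want) == want;
}

/* Hypothetical host-side stand-in for the NRF_CLOCK registers used here;
 * on target, use NRF_CLOCK from the nRF device headers instead. */
typedef struct {
    volatile uint32_t EVENTS_HFCLKSTARTED;
    volatile uint32_t HFCLKSTAT;
} clock_regs_t;

/* Poll EVENTS_HFCLKSTARTED, giving up after max_polls reads.
 * Returns true once the event has been seen. */
static bool wait_hfclk_started(const clock_regs_t *clk, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (clk->EVENTS_HFCLKSTARTED != 0u) {
            return true;
        }
    }
    return false;
}
```

On target, the form suggested in the thread is the unbounded `while (NRF_CLOCK->EVENTS_HFCLKSTARTED == 0) {}` loop.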

Best regards,

Martin Tverdal

SoftDevice team.