Crash in sd_clock_hfclk_is_running() on SoftDevice S140 7.3.0

Hi, I recently noticed a crash in sd_clock_hfclk_is_running() on an nRF52840 using SoftDevice S140 7.3.0. This is the call stack:

??@0x00000ac4 (Unknown Source:0)

<signal handler called>@0xffffffe9 (Unknown Source:0)

sd_clock_hfclk_is_running@0x000276ae (.../nRF5_SDK_17.1.0_ddde560/components/softdevice/s140/headers/nrf_soc.h:720)

I'm using the following to enable the hfclk whenever I enable QSPI to avoid errata 244:

sd_clock_hfclk_request();
uint32_t isHfclkRunning = 0;
do {
  APP_ERROR_CHECK(sd_clock_hfclk_is_running(&isHfclkRunning));
} while (!isHfclkRunning);
I can trigger this somewhat reliably if I unplug and replug USB power while this code runs.
Any tips on how I can avoid this issue?
Thanks,
Jeff
  • Is there any way to avoid the erratum other than turning on the hfclk? It would be nice to avoid the power draw associated with this workaround.

  • I am yet to hear from them. I will check your suggestion too in the meantime.

    -Priyanka

  • The value seems right -- `00 10 00 00` bytes at the address. Another reason I don't think it's the interrupt-disabled issue: when I replace sd_clock_hfclk_is_running with my manual register lookup, it all works, even though there is an sd_clock_hfclk_release call later, which should also crash according to that theory.

  • Hi,

    I've been looking into this a bit and I'm not sure what the root cause for the crash you see can be.

    You seem to be hitting a HardFault/BusFault at the instruction at 0xac4, which is inside the SVC handler in the MBR (master boot record). The SoftDevice function you are calling, sd_clock_hfclk_is_running, is implemented as an SVC interrupt, and the MBR simply forwards it to the SoftDevice. Judging by the stack frame you posted, the call hasn't even reached the SoftDevice when the crash happens: it crashes inside the MBR, in the code that forwards interrupts (including SVC interrupts) to the SoftDevice.

    One possible explanation is corruption of the instruction in flash itself. Does this happen easily on many different boards? If you have reproduced it on only one board, is it possible that you have exhausted the number of flash erase cycles the nRF52840 supports? Off the top of my head, I think that is 10,000.

    Another explanation might be corruption of the call stack. Maybe you can try increasing the interrupt call stack a bit and see if the problem goes away?

    Reading NRF_CLOCK->HFCLKSTAT, which you have found to work, sounds safe to me: the SoftDevice doesn't protect the NRF_CLOCK peripheral from being read.
    So busy-waiting for it to change to 0x10001 sounds safe in that regard. However, because of PAN-201, you might want to switch to NRF_CLOCK->EVENTS_HFCLKSTARTED instead.
    That is what the SoftDevice reads when you call sd_clock_hfclk_is_running.
    So I think switching to while (NRF_CLOCK->EVENTS_HFCLKSTARTED == 0) is a good solution. However, even if that works, I still have a feeling that something is wrong and might fail in a different way somewhere else.

    Best regards,
    Martin Tverdal
    Softdevice team.
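
For anyone trying the register-polling approach from the reply above, here is a minimal sketch of a bounded busy-wait. It is not SoftDevice or SDK code: `wait_for_event` and the `max_polls` budget are hypothetical names introduced for illustration. The only difference from a bare `while (reg == 0)` loop is that it gives up after a fixed number of polls, so a clock that never starts shows up as a failed return value instead of a hang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the nRF5 SDK or the SoftDevice API).
 * Polls a read-only event register until it becomes non-zero, or until
 * max_polls reads have been made. Returns true if the event was seen. */
static bool wait_for_event(volatile const uint32_t *event_reg, uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (*event_reg != 0) {
            return true;   /* event register is set */
        }
    }
    return false;          /* poll budget exhausted without seeing the event */
}
```

On the target this would presumably be called as `wait_for_event(&NRF_CLOCK->EVENTS_HFCLKSTARTED, budget)` after `sd_clock_hfclk_request()`, with `budget` chosen by the application. A suitable value depends on the crystal start-up time and CPU clock and is not something this thread specifies.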
  • Thanks so much for such a detailed reply.

    I should clarify -- the device enters a non-responsive state, and loses BLE connections, but doesn't trigger the debugger automatically. If I pause execution, I get that call stack.

    Regarding pan-201 -- I see that it's listed in rev1, but not rev2 of the nrf52840. Is it safe to assume pan-201 is not an issue since I'm running on rev2?

  • Yes, good point. PAN-201 should not be a concern for you then!

  • I tried this again today, and the `NRF_CLOCK->EVENTS_HFCLKSTARTED == 0` works, but testing with the old sd_clock_hfclk_request() approach again gave me a different callstack:

    The call hung, and stopping execution landed on 0x25ed8:

    0x00025ed2: mrs r1, MSP
    0x00025ed6: ldr r0, [r1, #24]
    0x00025ed8: subs r0, #2
    0x00025eda: ldrb r0, [r0, #0]
    0x00025edc: cmp r0, #16
    0x00025ede: blt.n 0x25f08
    Sharing this in case it helps at all.
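
As a reading aid for the disassembly above: the sequence looks like the usual Cortex-M way of recovering the SVC number inside an SVC handler. The return address is loaded from the exception frame on the main stack (the stacked PC sits at byte offset 24), 2 is subtracted to get the address of the 16-bit `svc` instruction, and its low byte (the immediate) is read and compared against 16. The sketch below models those three loads on a host machine. `svc_number_from_frame`, `code`, and `code_base` are hypothetical names, and this is an interpretation of the listing, not SoftDevice source.

```c
#include <stdint.h>

/* Word index of the stacked PC in a basic Cortex-M exception frame:
 * r0, r1, r2, r3, r12, lr, pc, xPSR (byte offset 24). */
#define STACKED_PC_INDEX 6u

/* Hypothetical host-side model: `code` holds program bytes starting at
 * address `code_base`; `frame` is the 8-word exception frame. */
static uint8_t svc_number_from_frame(const uint32_t frame[8],
                                     const uint8_t *code,
                                     uint32_t code_base)
{
    uint32_t return_addr = frame[STACKED_PC_INDEX]; /* ldr  r0, [r1, #24] */
    uint32_t svc_addr    = return_addr - 2u;        /* subs r0, #2        */
    return code[svc_addr - code_base];              /* ldrb r0, [r0, #0]  */
}
```

The `ldrb` in this sequence dereferences whatever value was found at offset 24 of the main stack, so it depends on the stacked frame being intact at that point.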