starting/stopping HFXO leading to CPU lockup and faults

i have some bare-metal RF code that runs on my nRF52-DK, which i'm currently porting to my nRF54-DK.... functionally, everything is working just fine -- the boards can in fact talk to one another....

in both designs, i explicitly start/stop the external 32MHz crystal using the CLOCK peripheral.... since i'm often starting the HFXO after awakening from a low-power sleep, i'll need to wait for the crystal to become ready (actually TUNED on the nRF54).... since this can require ~200-300us, i'll await a CLOCK interrupt once i've performed the START task.... needless to say, this WFI puts the MCU into a "light-sleep" where the HFCLK keeps running....

again, all is well on the nRF52.... on the nRF54, however, i'm seeing very erratic behavior.... in a stand-alone test where i simply start/wait, i've seen my program spontaneously reset due to a CPU lockup.... i've also taken illegal memory access faults.... putting various delays in my test program simply "moves the problem around".... and sometimes (on a cold morning), it actually works....

putting this in a larger context, when i attempt to repeatedly transmit a packet (say at 4Hz), i'll fail in this manner after some initial set of (1-5) packets successfully go over the air....

in general, my test (which works just fine on the nRF52) will repeatedly start-wait-stop the external crystal....

is there something specific to the nRF54 that i might be overlooking??? i've tried various combinations of explicitly using the START and XOTUNE tasks; i've also tried using PLLSTART/PLLSTOP around "short" WFI sequences....

finally, i'm also not 100% convinced that i'm fully disabling the HFXO (and by extension the HFCLK/PLL) when i enter "deep-sleep".... unlike the nRF52, i'm still measuring 400uA of current when in deep-sleep.... is there some other subtle difference between the nRF52 and nRF54 that i might be missing as well???

0 Håkon Alseth 2 months ago

Hi,

The intended behavior of the hfxo start/stop is described here:

https://docs.nordicsemi.com/bundle/ps_nrf54L15/page/clock.html#ariaid-title2

However, there is errata on this behavior, more specifically:

https://docs.nordicsemi.com/bundle/errata_nRF54L15_Rev1/page/ERR/nRF54L15/Rev1/latest/anomaly_L15_39.html#anomaly_L15_39

And:

https://docs.nordicsemi.com/bundle/errata_nRF54L15_Rev1/page/ERR/nRF54L15/Rev1/latest/anomaly_L15_20.html

There is a routine here to request the hfxo:

https://github.com/nrfconnect/sdk-nrf/blob/main/samples/peripheral/radio_test/src/main.c#L36

And to release the hfxo, you can call onoff_cancel_or_release().

Q1: If you are getting assertions and faults, please share the log of these?

Q2: Are you using the radio peripheral directly? Or are you using the SoftDevice?

Kind regards,

Håkon

0 bios-bob 2 months ago in reply to Håkon Alseth

several points.... this is *really* bare-metal; i'm not using any runtime code from the SDK.... having said that, i certainly have built/run/analyzed the radio_test sample to help find my way....

to answer Q1: since i'm not using your runtime code, i have nothing to share.... i have, however, seen random HW memory faults....

to answer Q2: i'm using the peripheral directly....

my start routine, for instance, talks directly to the CLOCK peripheral -- clearing EVENTS_XOTUNED and then setting TASKS_PLLSTART, TASKS_XOSTART, TASKS_XOTUNE....

when waiting for EVENTS_XOTUNED, i have to assign a read of EVENTS_XOTUNED to a static volatile variable before testing the value.... timing of this bare-metal is such that simply testing EVENTS_XOTUNED in a while loop never converges....

i've seen similar brittleness when trivial refactorings cause failures -- such as removing the code of an unused CLOCK interrupt routine....

i'll make another pass over your clock_init code, hopefully finding some nuance which i may have missed.... at the same time, there are *MANY* more instructions executed in your code than in mine.... and i've certainly discovered that "adding arbitrary delays" has often fixed my issues.... it's as if i should add ISB,DSB barriers between successive reads/writes to CLOCK registers???

0 bios-bob 2 months ago in reply to bios-bob

a more general question.... on the nRF52, i was able to use a CLOCK interrupt when waiting for the HFXTAL to stabilize.... for whatever reason, this doesn't work on the nRF54....

as a quick experiment, i have a "short duration" pause function that internally uses a timer interrupt.... since i'm currently "active" for about 400us awaiting XOTUNED, it would be nice to at least idle the CPU itself.... i tried pausing for 100us -- and it definitely worked while simultaneously reducing power consumption....

unfortunately, after a short number of wakeup-transmit-sleep cycles, i triggered the same sort of random memory fault i described earlier....

again, my hunch is that the interrupt per-se is not the issue; it's a subtle "race-condition" that surfaces almost immediately upon *some* wakeup from low-power sleep....

another question: should i verify the state of the "default" HFOSC before starting the HFXO??? i can tell you that my start() function is called almost immediately after returning from my deep-sleep WFI....

0 Håkon Alseth 2 months ago in reply to bios-bob

Hi,

bios-bob said:
several points.... this is *really* bare-metal; i'm not using any runtime code from the SDK.... having said that, i certainly have built/run/analyzed the radio_test sample to help find my way....

to answer Q1: since i'm not using your runtime code, i have nothing to share.... i have, however, seen random HW memory faults....

to answer Q2: i'm using the peripheral directly....

my start routine, for instance, talks directly to the CLOCK peripheral -- clearing EVENTS_XOTUNED and then setting TASKS_PLLSTART, TASKS_XOSTART, TASKS_XOTUNE....

when waiting for EVENTS_XOTUNED, i have to assign a read of EVENTS_XOTUNED to a static volatile variable before testing the value.... timing of this bare-metal is such that simply testing EVENTS_XOTUNED in a while loop never converges....

i've seen similar brittleness when trivial refactorings cause failures -- such as removing the code of an unused CLOCK interrupt routine....

Understood.

The failure mode described here is not a hardfault, memfault etc., but a loop that never terminates.
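
To keep a polling loop from spinning forever, one option is to give it a retry budget. The following is only a minimal sketch under assumptions: `poll_event` and `max_polls` are hypothetical names that do not come from the SDK, and the register is passed in as a plain pointer so the snippet also builds on a host. The pointer is volatile-qualified so the compiler has to re-read the register on every pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK function): poll an event register until it
 * reads non-zero or the retry budget is spent.
 * The volatile qualifier forces a fresh read of the register on each pass. */
static bool poll_event(const volatile uint32_t *event_reg, uint32_t max_polls)
{
    assert(event_reg != 0);
    while (max_polls-- > 0) {
        if (*event_reg != 0) {
            return true;    /* event observed */
        }
    }
    return false;           /* budget exhausted, caller decides what to do */
}
```

On a target this might be called with the address of the event register being awaited; a `false` return then gives a defined place to log or recover instead of hanging.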

bios-bob said:
i'll make another pass over your clock_init code, hopefully finding some nuance which i may have missed.... at the same time, there are *MANY* more instructions executed in your code than in mine.... and i've certainly discovered that "adding arbitrary delays" has often fixed my issues.... it's as if i should add ISB,DSB barriers between successive reads/writes to CLOCK registers???

Since the CPU runs at a higher frequency than the peripheral domain, there will be use-cases where you should generate a wait-state (ISB/DSB/etc). One of the ways to generate a wait-state between the CPU and the peripheral domain is to read an arbitrary event: (void)PERIPHERAL->EVENTS_EVENT;

As an example, the below will likely fail due to a slower peripheral net being queried:

    NRF_PERIPHERAL->TASKS_START = 1;
    if (NRF_PERIPHERAL->STATUSREG & HAS_STARTED) {
    }

The fix here is to insert a wait-state before the if statement.
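
As a rough sketch of that fix: the register block below is a host-side mock, `mock_periph_t`, `task_trigger_synced` and `has_started` are made-up names for illustration, and `STATUSREG` / `HAS_STARTED` are the same placeholders as in the snippet above, not real registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout standing in for NRF_PERIPHERAL. */
typedef struct {
    volatile uint32_t TASKS_START;
    volatile uint32_t EVENTS_STARTED;
    volatile uint32_t STATUSREG;
} mock_periph_t;

#define HAS_STARTED (1u << 0)   /* placeholder bit, as in the snippet above */

/* Trigger the task, then do a dummy read of an event register on the same
 * peripheral. On real hardware the intent is that this read stalls the CPU
 * until the peripheral bus has handled the preceding write; on this
 * host-side mock it is just a memory read. */
static void task_trigger_synced(mock_periph_t *p)
{
    assert(p != 0);
    p->TASKS_START = 1;
    (void)p->EVENTS_STARTED;    /* value is discarded on purpose */
}

/* Query the status only after the synced trigger above. */
static int has_started(const mock_periph_t *p)
{
    return (p->STATUSREG & HAS_STARTED) != 0;
}
```

The dummy read sits between the task write and the status query, which is the position the wait-state needs to be in.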

You could also have a look at the nrfx_clock driver to see how it handles the events from the peripheral:

https://github.com/zephyrproject-rtos/hal_nordic/blob/master/nrfx/drivers/src/nrfx_clock.c

bios-bob said:
as a quick experiment, i have a "short duration" pause function that internally uses a timer interrupt.... since i'm currently "active" for about 400us awaiting XOTUNED, it would be nice to at least idle the CPU itself.... i tried pausing for 100us -- and it definitely worked while simultaneously reducing power consumption....

unfortunately, after a short number of wakeup-transmit-sleep cycles, i triggered the same sort of random memory fault i described earlier....

again, my hunch is that the interrupt per-se is not the issue; it's a subtle "race-condition" that surfaces almost immediately upon *some* wakeup from low-power sleep....

Can you explain which scenario this is? Is this with the radio active? Even posting just the register configuration snippets would tell a lot about the behavior.

bios-bob said:
another question: should i verify the state of the "default" HFOSC before starting the HFXO??? i can tell you that my start() function is called almost immediately after returning from my deep-sleep WFI....

The reset state is running from the RC oscillator. If your __start() function is triggered, it sounds like you are receiving a reset of sorts. What is the default behavior if you get a non-maskable interrupt?

Can you share the chip markings of your nRF54L15 device / DK revision?

Kind regards,

Håkon

0 bios-bob 2 months ago in reply to Håkon Alseth
nRF54-DK version 0.9.2... chip markings N54L15 / QFAAB0 / 2444AB...

for the purpose of the following explanation, let me use 'deep-sleep' to mean an IDLE state in which HF clock is off and most peripherals disabled; this should consume about 1uA.... by contrast, 'lite-sleep' is also an IDLE state but in which the HF clock plus other peripherals are running (~1mA).... both states are entered through the WFI instruction....

the overall flow of the test program is to repeatedly:

1) wakeup from a 'deep-sleep' through a periodic RTC interrupt
2) immediately start the HFXTAL
3) wait for the HFXTAL to stabilize as part of enabling the radio
4) transmit several packets on different channels
5) enter 'lite-sleep' while the radio is transmitting
6) stop the HFXTAL as part of disabling the radio
7) reenter 'deep-sleep' awaiting the next RTC interrupt

my runtime environment enables notifications to various drivers when we're about to enter 'deep-sleep' as well as when we've just awoken from 'deep-sleep'.... these callbacks occur just before/after the WFI instruction....

in step 2) above, this mechanism causes my HfXtal driver to start the XO ramp-up process -- which generally takes ~400us to receive the XOTUNED event.... though we are setting up some RADIO registers at this time, the RADIO peripheral itself is in its DISABLED state....

once transmission begins, we enter a 'lite-sleep' in which HFXTAL is (obviously) running.... only when we finally disable the RADIO do we perform XOSTOP and PLLSTOP tasks.... (in theory, this could have been through the 'deep-sleep-entry' callback -- but there is a slight power advantage to stopping as early as possible)

to emphasize a key point: my HfXtal_start() function is called on every cycle, almost immediately after wakeup from 'deep-sleep'.... it's during this function that i'm currently seeing memory faults -- which occur randomly after several successful wakeup-transmit-sleep cycles....

my current HfXtal_start() routine is:

    EVENTS_XOTUNED = 0
    TASKS_PLLSTART = 1
    TASKS_XOSTART = 1
    TASKS_XOTUNE = 1

my HfXtal_stop() routine is:

    TASKS_XOTUNEABORT = 1
    TASKS_XOSTOP = 1
    TASKS_PLLSTOP = 1
    EVENTS_XOTUNED = 0

and finally, my HfXtal_wait() routine is:

    volatile done = false
    while (!done) done = (EVENTS_XOTUNED != 0)

though it's still quite brittle, this implementation "sort of works" for an extended period of time.... as suggested above, i added some wait-states between each statement in each of these routines.... unclear whether it made a difference....

what i would like to do is modify HfXtal_wait() along the following lines:

    volatile done = false
    while (!done) {
        enter 'light-sleep' for 100us
        done = (EVENTS_XOTUNED != 0)
    }

with my minimalistic bare-metal code, even this optimization resulted in a 10% power savings!!!! [[ on the nRF52, i'm actually using the CLOCK interrupt itself for even greater savings ]] and while this is only a 'lite-sleep', the uJoules do add up
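
for reference, the modified HfXtal_wait() outlined above could be sketched in C roughly as follows.... this is a host-side mock under assumptions, not the real driver: `mock_clock_t`, `mock_lite_sleep_us()` and the slice budget are hypothetical names added for illustration; only EVENTS_XOTUNED comes from the routines above.... unlike the outline, it checks the event before the first sleep and gives up after a fixed number of slices rather than looping forever....

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-in for the hardware; hypothetical, for illustration only. */
typedef struct {
    volatile uint32_t EVENTS_XOTUNED;  /* mirrors the event named above */
    uint32_t now_us;                   /* simulated elapsed time */
    uint32_t tuned_at_us;              /* simulated time at which the event fires */
} mock_clock_t;

/* Stand-in for "enter 'lite-sleep' for N us": advances simulated time and
 * raises the event once the simulated tune time has been reached. */
static void mock_lite_sleep_us(mock_clock_t *c, uint32_t us)
{
    c->now_us += us;
    if (c->now_us >= c->tuned_at_us) {
        c->EVENTS_XOTUNED = 1;
    }
}

/* Sleep in fixed slices and re-check the event after each wakeup.
 * Returns the number of slices slept, or -1 if the budget ran out. */
static int hfxtal_wait(mock_clock_t *c, uint32_t slice_us, uint32_t max_slices)
{
    assert(c != 0 && slice_us > 0);
    for (uint32_t n = 0; n <= max_slices; n++) {
        if (c->EVENTS_XOTUNED != 0) {
            return (int)n;              /* tuned after n slices */
        }
        if (n < max_slices) {
            mock_lite_sleep_us(c, slice_us);
        }
    }
    return -1;                          /* never tuned within the budget */
}
```

on the target, the mock sleep would be replaced by the timer-driven pause, and a -1 return gives a defined spot to trace before anything faults....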

any further insights would be much appreciated

0 bios-bob 2 months ago in reply to bios-bob

some more information....

i'm seeing the problem related to HfXtal_stop(), which i've modified to not only include wait-states but also to test that the XO and PLL are indeed not running....

if i do *not* call HfXtal_stop(), everything works just fine -- except that i'll draw 100s uA of current when in 'deep-sleep' awaiting the RTC interrupt:

when i *do* call HfXtal_stop() after the RADIO is in its DISABLED state, i can execute 3-8 wakeup-sleep cycles before encountering a HW memory fault:

when working, i'm now drawing 1.5uA when in 'deep-sleep'....

i have been able to capture the 8 register values on the exception frame.... the LR usually points to a location where i'm notifying drivers of an impending entry into 'deep-sleep' via WFI.... the PC is a little more random -- and in one instance was actually 0x0....
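
for anyone following along, the 8 values in question are the basic frame a Cortex-M core pushes on exception entry: r0-r3, r12, lr, pc, xpsr, lowest address first.... a minimal sketch of laying that frame out and formatting it for a trace is below; `exc_frame_t`, `exc_frame_format()` and `exc_pc_is_null()` are hypothetical names, and fetching the frame pointer from MSP or PSP inside the fault handler is left out....

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* First eight words stacked by a Cortex-M core on exception entry,
 * in ascending address order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

/* Format the frame into buf; returns what snprintf returns
 * (the number of characters the full line needs). */
static int exc_frame_format(const exc_frame_t *f, char *buf, size_t len)
{
    assert(f != 0 && buf != 0);
    return snprintf(buf, len,
        "r0=%08lx r1=%08lx r2=%08lx r3=%08lx r12=%08lx lr=%08lx pc=%08lx xpsr=%08lx",
        (unsigned long)f->r0, (unsigned long)f->r1,
        (unsigned long)f->r2, (unsigned long)f->r3,
        (unsigned long)f->r12, (unsigned long)f->lr,
        (unsigned long)f->pc, (unsigned long)f->xpsr);
}

/* Flags the case mentioned above where the stacked PC was 0x0. */
static int exc_pc_is_null(const exc_frame_t *f)
{
    return f->pc == 0;
}
```

keeping the formatted line in a buffer that survives a reset (rather than printing over the UART) would be one way to keep a trace even after the UART has been shut off....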

i've tried moving the HfXtal_stop() to immediately before the 'deep-sleep' WFI.... i still see failure -- but now i've already turned off GPIOs and the UART, so i have no real-time trace....

any other suggestions???

0 Håkon Alseth 2 months ago in reply to bios-bob

Hi,

Can you please share your routines for recreating the issue?

Can you also share the fault contents, i.e. the stack trace / CPU registers at the time of faulting?

Kind regards,

Håkon