Starting/stopping HFXO leading to CPU lockup and faults

I have some bare-metal RF code that runs on my nRF52-DK, which I'm currently porting to my nRF54-DK. Functionally, everything is working just fine: the boards can in fact talk to one another.

In both designs, I explicitly start and stop the external 32 MHz crystal using the CLOCK peripheral. Since I'm often starting the HFXO after waking from a low-power sleep, I need to wait for the crystal to become ready (actually TUNED on the nRF54). Since this can take ~200-300 us, I wait for a CLOCK interrupt after triggering the START task. Needless to say, this WFI puts the MCU into a "light sleep" in which the HFCLK keeps running.
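A minimal sketch of the interrupt-plus-WFI wait described above, under stated assumptions: clock_events_t and sim_clock are a hypothetical host-side stand-in for the CLOCK event register (not the nRF54L15 memory map; on target you would use NRF_CLOCK from the MDK), and cpu_hooks_t is a hypothetical set of hooks that would map to __disable_irq(), __enable_irq() and __WFI() on target. It illustrates two details that are easy to miss: the ISR reads the event back after clearing it, and the flag check and the WFI happen with interrupts masked, so an interrupt arriving between them cannot be lost. This is not Nordic's reference implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CLOCK event register. The name follows the
 * thread; this is NOT the nRF54L15 memory map. */
typedef struct {
    volatile uint32_t EVENTS_XOTUNED;
} clock_events_t;

static clock_events_t sim_clock;                 /* host-side simulation */
static clock_events_t *const CLOCK = &sim_clock;

static volatile bool xo_ready;   /* set by the ISR, consumed by the waiter */

/* CLOCK ISR sketch: clear the event, then read it back so the clear has
 * reached the peripheral before the handler returns; a still-set event
 * could otherwise re-trigger the interrupt. */
void clock_irq_handler(void)
{
    if (CLOCK->EVENTS_XOTUNED) {
        CLOCK->EVENTS_XOTUNED = 0;
        (void)CLOCK->EVENTS_XOTUNED;
        xo_ready = true;
    }
}

/* Hypothetical CPU hooks: __disable_irq(), __enable_irq() and __WFI() on
 * target; no-ops in the host build. */
typedef struct {
    void (*irq_disable)(void);
    void (*irq_enable)(void);
    void (*idle)(void);
} cpu_hooks_t;

static void host_nop(void) {}
static const cpu_hooks_t host_hooks = { host_nop, host_nop, host_nop };

/* Test-and-sleep with interrupts masked: on Cortex-M, WFI still wakes when
 * an interrupt becomes pending while PRIMASK is set, so an event firing
 * between the flag check and WFI is not lost. The loop also tolerates
 * wake-ups caused by unrelated interrupts. */
void wait_xo_ready(const cpu_hooks_t *cpu)
{
    assert(cpu != 0);
    for (;;) {
        cpu->irq_disable();
        if (xo_ready) {
            xo_ready = false;
            cpu->irq_enable();
            return;
        }
        cpu->idle();
        cpu->irq_enable();   /* the pending ISR runs here */
    }
}
```

In the host build the hooks are no-ops, so wait_xo_ready() only returns once the flag is already set; on target the idle hook would actually sleep until the CLOCK interrupt (or any other) becomes pending.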

Again, all is well on the nRF52. On the nRF54, however, I'm seeing very erratic behavior. In a stand-alone test where I simply start and wait, I've seen my program spontaneously reset due to a CPU lockup. I've also taken illegal memory access faults. Putting various delays in my test program simply "moves the problem around", and sometimes (on a cold morning) it actually works.

Putting this in a larger context, when I repeatedly transmit a packet (say at 4 Hz), it fails in this manner after an initial 1-5 packets successfully go over the air.

In general, my test (which works just fine on the nRF52) repeatedly starts the external crystal, waits for it, and stops it.

Is there something specific to the nRF54 that I might be overlooking? I've tried various combinations of explicitly using the START and XOTUNE tasks; I've also tried using PLLSTART/PLLSTOP around "short" WFI sequences.

Finally, I'm not 100% convinced that I'm fully disabling the HFXO (and by extension the HFCLK/PLL) when I enter deep sleep. Unlike on the nRF52, I'm still measuring 400 uA when in deep sleep. Is there some other subtle difference between the nRF52 and nRF54 that I might be missing as well?

  • Hi,

    The intended behavior of HFXO start/stop is described here:

    https://docs.nordicsemi.com/bundle/ps_nrf54L15/page/clock.html#ariaid-title2

    However, there are errata affecting this behavior, specifically:

    https://docs.nordicsemi.com/bundle/errata_nRF54L15_Rev1/page/ERR/nRF54L15/Rev1/latest/anomaly_L15_39.html#anomaly_L15_39

    And:

    https://docs.nordicsemi.com/bundle/errata_nRF54L15_Rev1/page/ERR/nRF54L15/Rev1/latest/anomaly_L15_20.html

    There is a routine here to request the HFXO:

    https://github.com/nrfconnect/sdk-nrf/blob/main/samples/peripheral/radio_test/src/main.c#L36

    To release the HFXO, you can call onoff_cancel_or_release().

    Q1: If you are getting assertions and faults, could you share the logs?

    Q2: Are you using the radio peripheral directly? Or are you using the SoftDevice?

    Kind regards,

    Håkon

  • Several points. This is *really* bare-metal; I'm not using any runtime code from the SDK. Having said that, I have certainly built, run, and analyzed the radio_test sample to help find my way.

    To answer Q1: since I'm not using your runtime code, I have nothing to share. I have, however, seen random HW memory faults.

    To answer Q2: I'm using the peripheral directly.

    My start routine, for instance, talks directly to the CLOCK peripheral: it clears EVENTS_XOTUNED and then triggers TASKS_PLLSTART, TASKS_XOSTART, and TASKS_XOTUNE.

    When waiting for EVENTS_XOTUNED, I have to assign a read of EVENTS_XOTUNED to a static volatile variable before testing the value. The timing of this bare-metal code is such that simply testing EVENTS_XOTUNED in a while loop never converges.
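    A sketch of the start/poll/stop sequence described above, written against a hypothetical clock_regs_t whose field names follow this thread (the layout is not the nRF54L15 memory map; on target you would use NRF_CLOCK from the MDK). The event is accessed through a volatile-qualified field, so the compiler re-reads it on every pass, and each clear is followed by a read-back that acts as the wait-state discussed later in the thread. The max_polls budget and the order of the stop tasks are assumptions; check the errata linked in the first reply (anomalies 20 and 39) before relying on this sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register block: names follow the thread, layout does NOT
 * match the nRF54L15. */
typedef struct {
    volatile uint32_t TASKS_XOSTART;
    volatile uint32_t TASKS_XOSTOP;
    volatile uint32_t TASKS_PLLSTART;
    volatile uint32_t TASKS_PLLSTOP;
    volatile uint32_t TASKS_XOTUNE;
    volatile uint32_t EVENTS_XOTUNED;
} clock_regs_t;

/* Clear the event and read it back so the clear has taken effect before
 * the tasks are triggered. Task order copied from the post above. */
void hfxo_start(clock_regs_t *clk)
{
    assert(clk != 0);
    clk->EVENTS_XOTUNED = 0;
    (void)clk->EVENTS_XOTUNED;
    clk->TASKS_PLLSTART = 1;
    clk->TASKS_XOSTART = 1;
    clk->TASKS_XOTUNE = 1;
}

/* Bounded poll: a stuck oscillator is reported as `false` instead of
 * hanging. max_polls is a hypothetical budget; size it from the expected
 * start time and the cost of one loop iteration. */
bool hfxo_wait_tuned(clock_regs_t *clk, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (clk->EVENTS_XOTUNED) {       /* volatile: re-read every pass */
            clk->EVENTS_XOTUNED = 0;
            (void)clk->EVENTS_XOTUNED;   /* flush the clear */
            return true;
        }
    }
    return false;
}

/* Stop order (PLL first, then crystal) is an assumption. */
void hfxo_stop(clock_regs_t *clk)
{
    clk->TASKS_PLLSTOP = 1;
    clk->TASKS_XOSTOP = 1;
    (void)clk->EVENTS_XOTUNED;           /* read-back as a wait-state */
}
```

    Returning a status from the wait, rather than looping forever, makes it possible to count how many start/wait/stop cycles succeed before the first failure in the repeated-transmit test.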

    I've seen similar brittleness where trivial refactorings cause failures, such as removing the code of an unused CLOCK interrupt routine.

    I'll make another pass over your clock_init code, hopefully finding some nuance that I may have missed. At the same time, there are *MANY* more instructions executed in your code than in mine, and I've certainly found that adding arbitrary delays has often fixed my issues. Should I be adding ISB/DSB barriers between successive reads/writes to CLOCK registers?

  • A more general question: on the nRF52, I was able to use a CLOCK interrupt when waiting for the HFXTAL to stabilize. For whatever reason, this doesn't work on the nRF54.

    As a quick experiment, I have a "short-duration" pause function that internally uses a timer interrupt. Since I'm currently active for about 400 us awaiting XOTUNED, it would be nice to at least idle the CPU itself. I tried pausing for 100 us, and it definitely worked while also reducing power consumption.
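    A sketch of a time-budgeted version of that experiment: idle in fixed slices until EVENTS_XOTUNED is set or a budget runs out. pause_fn is a hypothetical hook standing in for the timer-interrupt pause function mentioned above, and sim_pause_us/sim_xotuned are a host-only simulation in which the "crystal" tunes once 300 us of simulated time have elapsed. The function returns the idle time rounded up to whole slices, or UINT32_MAX on timeout, so a stuck oscillator is reported instead of hanging.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook standing in for the timer-interrupt pause function. */
typedef void (*pause_fn)(uint32_t us);

/* Idle in fixed slices until *event is set or budget_us is used up.
 * Returns the time spent idling (whole slices), or UINT32_MAX on timeout.
 * On success the event is cleared and read back before returning. */
uint32_t wait_event_sliced(volatile uint32_t *event, pause_fn pause_us,
                           uint32_t slice_us, uint32_t budget_us)
{
    assert(event != 0 && slice_us > 0);
    uint32_t waited = 0;
    while (*event == 0) {
        if (waited >= budget_us) {
            return UINT32_MAX;          /* report a stuck oscillator */
        }
        pause_us(slice_us);
        waited += slice_us;
    }
    *event = 0;
    (void)*event;                       /* flush the clear */
    return waited;
}

/* Host-only simulation: the event is set once 300 us have elapsed. */
static volatile uint32_t sim_xotuned;
static uint32_t sim_elapsed_us;

static void sim_pause_us(uint32_t us)
{
    sim_elapsed_us += us;
    if (sim_elapsed_us >= 300) {
        sim_xotuned = 1;
    }
}
```

    With 100 us slices, the simulated crystal is detected after three slices (300 us), while a 200 us budget times out. Slicing trades a little latency (up to one slice) for keeping the CPU idle most of the time.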

    Unfortunately, after a small number of wakeup-transmit-sleep cycles, I triggered the same sort of random memory fault I described earlier.

    Again, my hunch is that the interrupt per se is not the issue; it's a subtle race condition that surfaces almost immediately after *some* wakeups from low-power sleep.

    Another question: should I verify the state of the "default" HFOSC before starting the HFXO? I can tell you that my start() function is called almost immediately after returning from my deep-sleep WFI.

  • Hi,

    bios-bob said:

    Several points. This is *really* bare-metal; I'm not using any runtime code from the SDK. Having said that, I have certainly built, run, and analyzed the radio_test sample to help find my way.

    To answer Q1: since I'm not using your runtime code, I have nothing to share. I have, however, seen random HW memory faults.

    To answer Q2: I'm using the peripheral directly.

    My start routine, for instance, talks directly to the CLOCK peripheral: it clears EVENTS_XOTUNED and then triggers TASKS_PLLSTART, TASKS_XOSTART, and TASKS_XOTUNE.

    When waiting for EVENTS_XOTUNED, I have to assign a read of EVENTS_XOTUNED to a static volatile variable before testing the value. The timing of this bare-metal code is such that simply testing EVENTS_XOTUNED in a while loop never converges.

    I've seen similar brittleness where trivial refactorings cause failures, such as removing the code of an unused CLOCK interrupt routine.

    Understood.

    That kind of issue typically does not trigger a HardFault, MemManage fault, etc.; it tends to result in loops that never end.

    bios-bob said:
    I'll make another pass over your clock_init code, hopefully finding some nuance that I may have missed. At the same time, there are *MANY* more instructions executed in your code than in mine, and I've certainly found that adding arbitrary delays has often fixed my issues. Should I be adding ISB/DSB barriers between successive reads/writes to CLOCK registers?

    Since the CPU runs at a higher frequency than the peripheral domain, there will be use cases where you should generate a wait-state (ISB/DSB/etc.). One way to generate a wait-state between the CPU and the peripheral domain is to read an arbitrary event register, for example (void)PERIPHERAL->EVENTS_EVENT;

    As an example, the following will likely fail because a slower peripheral net is being queried:

    NRF_PERIPHERAL->TASKS_START = 1;
    if (NRF_PERIPHERAL->STATUSREG & HAS_STARTED) {
        /* may still see the status from before the task took effect */
    }

    The fix here is to insert a wait-state before the if statement.
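    Applied to the example above, the fix might look like the following sketch. periph_regs_t is a hypothetical stand-in using the placeholder names from this reply (TASKS_START, EVENTS_EVENT, STATUSREG, HAS_STARTED), not a real Nordic peripheral; the dummy read of EVENTS_EVENT is the wait-state suggested above. If the task itself needs real time to take effect, a single check is still not enough and the status would have to be polled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder peripheral mirroring the generic names in the reply;
 * not a real Nordic peripheral or register layout. */
#define HAS_STARTED (1u << 0)

typedef struct {
    volatile uint32_t TASKS_START;
    volatile uint32_t EVENTS_EVENT;
    volatile uint32_t STATUSREG;
} periph_regs_t;

/* Trigger the task, insert a wait-state, then check the status. */
bool start_and_check(periph_regs_t *p)
{
    assert(p != 0);
    p->TASKS_START = 1;
    /* Wait-state: dummy read from the (slower) peripheral domain before
     * the status register is queried. */
    (void)p->EVENTS_EVENT;
    return (p->STATUSREG & HAS_STARTED) != 0;
}
```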

    You could also have a look at the nrfx_clock driver to see how it handles the events from the peripheral:

    https://github.com/zephyrproject-rtos/hal_nordic/blob/master/nrfx/drivers/src/nrfx_clock.c

    bios-bob said:

    As a quick experiment, I have a "short-duration" pause function that internally uses a timer interrupt. Since I'm currently active for about 400 us awaiting XOTUNED, it would be nice to at least idle the CPU itself. I tried pausing for 100 us, and it definitely worked while also reducing power consumption.

    Unfortunately, after a small number of wakeup-transmit-sleep cycles, I triggered the same sort of random memory fault I described earlier.

    Again, my hunch is that the interrupt per se is not the issue; it's a subtle race condition that surfaces almost immediately after *some* wakeups from low-power sleep.

    Can you explain the scenario? Is this with the radio active? Even posting register-level code snippets would tell a lot about the behavior.

    bios-bob said:
    Another question: should I verify the state of the "default" HFOSC before starting the HFXO? I can tell you that my start() function is called almost immediately after returning from my deep-sleep WFI.

    The reset state is running from the RC oscillator. If your __start() function is being triggered, it sounds like you are getting a reset of some sort. What is the default behavior if you get a non-maskable interrupt?

    Can you share the chip markings of your nRF54L15 device / DK revision?

    Kind regards,

    Håkon
