When frozen and then reset, nRF52840 RC osc/RTC doesn't tick and LFCLKSRC etc. don't initialize correctly

Our product is based on the Nordic nRF52840, running a fork of the v2.7.1 branch of Zephyr and using an external 32.768 kHz oscillator for the sysclock/LFCLK. This works fine at room temperature: the LFCLK starts on the internal RC oscillator and then switches automatically and transparently to the external oscillator.

We cold-temperature tested the device in a few ways, using a freezer and also freeze spray. When the device is frozen and then reset (via Segger Ozone/J-Link or a Zephyr reset function), on startup the sysclock/LFCLK is set to the internal RC source and is not ticking (or at least the sysclock/RTC thinks it isn't ticking). Because of this, the first call to k_sleep() hangs until the board warms up and the LFCLK/RTC resumes ticking. Some of the LFCLK state/source registers also seem to be wrong, and even after the LFCLK starts ticking, the hardware never switches from the RC to the external oscillator. The nRF52840 module and the external oscillator are rated to -40C, so this should work during our tests.

When the ARM's at room temp:

  • LFCLKSRC eventually selects the 32.768 kHz oscillator and everything works fine.
  • I get a dbg msg:  <dbg> clock_control.clkstarted_handle: lfclk: Clock started
  • The LFCLKSTAT register is $10000 before k_sleep, which means the LFCLK is running from the RC oscillator (the LFXO is not yet the source).
    After k_sleep exits, LFCLKSTAT is $10001, which means the LFCLK is running from the LFXO.

  • The RTC counter is ticking (verified by counting ticks using Nordic's RTC driver)
  • k_sleeps work fine.

When the ARM's cold:

  • Both the application and the bootloader appear to have LFCLKSRC set to the RC oscillator
  • I get a dbg msg:  <dbg> clock_control.clkstarted_handle: lfclk: Clock started
  • The LFCLKSTAT register is $10001 before sleep, which means the LFCLK is reported as running from the LFXO (!)
    After sleep exits, LFCLKSTAT is $10000, which means the LFCLK is reported as running from the RC oscillator (!)

  • The RTC counter is not ticking (verified by counting ticks using Nordic's RTC driver)

  • k_sleep hangs until the ARM warms up. The first k_sleep is in the application's main() function; the bootloader has no sleeps in it, which is why it doesn't hang.

  • After the board warms, the RTC counter is ticking (verified by counting ticks using Nordic's RTC driver)
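To make the LFCLKSTAT values quoted above easier to read, here is a small host-side C sketch that decodes them. It assumes the register layout from the nRF52840 product specification (SRC in bits [1:0]: 0 = RC, 1 = Xtal, 2 = Synth; STATE in bit 16: 1 = running). The helper names are hypothetical and are not part of the Nordic SDK or Zephyr. Under that layout, $10000 reads as "running from RC" and $10001 as "running from Xtal", so the STATE bit is set in every value above and only the reported source changes.

```c
#include <stdint.h>
#include <stdio.h>

/* LFCLKSTAT layout per the nRF52840 product spec:
   bits [1:0] = SRC   (0 = RC, 1 = Xtal, 2 = Synth)
   bit  16    = STATE (0 = not running, 1 = running) */
#define LFCLKSTAT_SRC_MSK    0x3UL
#define LFCLKSTAT_STATE_POS  16

/* Hypothetical helper: name of the source reported in SRC. */
static const char *lfclkstat_src_name(uint32_t stat)
{
    switch (stat & LFCLKSTAT_SRC_MSK) {
    case 0:  return "RC";
    case 1:  return "Xtal";
    case 2:  return "Synth";
    default: return "reserved";
    }
}

/* Hypothetical helper: 1 if the STATE bit says the LFCLK is running. */
static int lfclkstat_running(uint32_t stat)
{
    return (int)((stat >> LFCLKSTAT_STATE_POS) & 1UL);
}

/* Print a raw LFCLKSTAT value in decoded form. */
static void print_lfclkstat(uint32_t stat)
{
    printf("LFCLKSTAT=0x%05lX: SRC=%s, %s\n", (unsigned long)stat,
           lfclkstat_src_name(stat),
           lfclkstat_running(stat) ? "running" : "not running");
}
```

On target, the value passed in would be a read of NRF_CLOCK->LFCLKSTAT, e.g. logged just before and after the first k_sleep.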

I've attempted to fix the issue by manually switching LFCLKSRC during sysclock initialization. I call my code right at the end of the existing sys_clock_driver_init function in nrf_rtc_timer.c.

This solution appears to work when the board is frozen. It's based on information in the nRF52840 product specification and internet posts. Is this the proper way to manually switch LFCLKSRC? Could you suggest how I might make the switch better or more robust? Are there errata for this? (I did a quick search but didn't find anything.)

Thanks!
Ross

// RSW - When the ARM is cold and then reset by the Zephyr reset function, a debugger reset, etc., it
// does not switch the LFCLK source properly from the internal RC to the external oscillator. The RC
// stays selected, but the RTC counter register does not increment (the RC may or may not be oscillating).
// This causes k_sleeps to hang because, at least according to the hardware, there is no LFCLK tick.

// Found this sequencing information in a post at
// https://devzone.nordicsemi.com/f/nordic-q-a/53362/how-to-check-nrf52840-external-low-frequency-crystal-32khz-is-connected-or-not
// and adapted it for our problem.
//
// This sequence roughly follows what's described in section 5.4.2 (LFCLK controller) of the product spec.
//
// This fix is specific to our boards, which use an external 32.768 kHz oscillator for the RTC/sysclock.
//

// RSW - 3UL << 16 sets BYPASS (bit 16) and EXTERNAL (bit 17): external rail-to-rail oscillator
#define LFCLKSRC_EXT_OSC   (3UL << 16 | CLOCK_LFCLKSRC_SRC_Xtal)

void rsw_sysclock_driver_init_fix(void)
{
   // RSW - if the clock source is already set properly, we're done here
   if (NRF_CLOCK->LFCLKSRC == LFCLKSRC_EXT_OSC)
      return;

   // RSW - the online post didn't explicitly stop the clock, but I think you
   // probably want to stop it before changing the source...?  The reference manual
   // doesn't say.  There is also no EVENTS_LFCLKSTOPPED event.
   NRF_CLOCK->TASKS_LFCLKSTOP = 1;

   NRF_CLOCK->EVENTS_LFCLKSTARTED = 0;

   // Synchronize the register write to the 16MHz AHB clock by reading the same register back.
   // The register is declared volatile, so a plain (void) read is not optimized away.
   (void)NRF_CLOCK->EVENTS_LFCLKSTARTED;
   NRF_CLOCK->LFCLKSRC = LFCLKSRC_EXT_OSC;
   NRF_CLOCK->TASKS_LFCLKSTART = 1;

   // RSW - the original post's code polled EVENTS_LFCLKSTARTED for "so long"
   // and, if that didn't work, restarted the clock over and over again.
   // I *think* it's safer (at least for now) to start the clock once and
   // wait "forever" for it to run. We'd like a timeout, but there's no
   // timebase, because we are setting up the RTC, which is the timebase.
   while (!NRF_CLOCK->EVENTS_LFCLKSTARTED)
      ;
}



int sys_clock_driver_init(const struct device *dev)
{
    ARG_UNUSED(dev);
    static const enum nrf_lfclk_start_mode mode =
        IS_ENABLED(CONFIG_SYSTEM_CLOCK_NO_WAIT) ?
            CLOCK_CONTROL_NRF_LF_START_NOWAIT :
            (IS_ENABLED(CONFIG_SYSTEM_CLOCK_WAIT_FOR_AVAILABILITY) ?
            CLOCK_CONTROL_NRF_LF_START_AVAILABLE :
            CLOCK_CONTROL_NRF_LF_START_STABLE);

    /* TODO: replace with counter driver to access RTC */
    nrf_rtc_prescaler_set(RTC, 0);
    for (int32_t chan = 0; chan < CHAN_COUNT; chan++) {
        nrf_rtc_int_enable(RTC, RTC_CHANNEL_INT_MASK(chan));
    }

    NVIC_ClearPendingIRQ(RTC_IRQn);

    IRQ_CONNECT(RTC_IRQn, DT_IRQ(DT_NODELABEL(RTC_LABEL), priority),
            rtc_nrf_isr, 0, 0);
    irq_enable(RTC_IRQn);

    nrf_rtc_task_trigger(RTC, NRF_RTC_TASK_CLEAR);
    nrf_rtc_task_trigger(RTC, NRF_RTC_TASK_START);

    int_mask = BIT_MASK(CHAN_COUNT);
    if (CONFIG_NRF_RTC_TIMER_USER_CHAN_COUNT) {
        alloc_mask = BIT_MASK(EXT_CHAN_COUNT) << 1;
    }

    if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) {
        compare_set(0, counter() + CYC_PER_TICK,
                sys_clock_timeout_handler, NULL);
    }

    z_nrf_clock_control_lf_on(mode);

    if (IS_ENABLED(CONFIG_RSW_SYSCLOCK_DRIVER_INIT_FIX)) {
         rsw_sysclock_driver_init_fix();
    }


    return 0;
}

  • Hi Rossquatch,

    This seems to be very close to the absolute temperature limit we specify for this chip (-40 degrees C).

    Can you please confirm whether my understanding is right?
    Based on your description,

    • the chip is being tested at room temperature and at -40 Celsius.
    • In both cases LFCLKSRC starts on the LFRC and then switches to the LF XTAL.

    Is this being tested on only one board, or is it reproducible on every board?

    What is the laser marking on the nRF52840 chip?

    Have you tested this at -38 degrees Celsius and seen the same issues?

  • I didn't say the chip was being tested at the -40C spec limit.

    • Our freezer is set to -18C.
    • Every board tested in the freezer has this issue. They all work fine at room temp.
    • I do not have access to the chip's marking. The chip is on a Fanstel BC840M module, which is also rated for -40C. The parts I'm having issues with (the RC oscillator and the LFCLK controller) are all internal to the nRF52840.

    It looks to me like the RC oscillator may not be starting after reset at low temperatures. This is detailed at length in my original post. If this is in fact the issue, I proposed a workaround and was looking for confirmation and comments on its viability.

    Thanks,
    Ross

  • rossquatch,

    I need some more time to investigate internally whether we have seen something like this before, and then I will comment on the workaround you suggested.

    It is still not clear whether the issue is with the nRF chip (including the ARM core) or with the Fanstel module. I will update you once I have some info on this.

  • It seems fairly clear to me that it's the nRF chip, as the RC oscillator and LFCLK controller are internal to the chip, but I encourage you to investigate further. These are the steps to reproduce the issue on my system, which runs a fork of the v2.7.1 branch of Zephyr with a stock bootloader, configured to use an external 32.768 kHz rail-to-rail oscillator connected to P0.00/XL1, with P0.01/XL2 unconnected:

    • Cool a running board to about -20C (we used a -18C freezer, or freeze spray at an unknown temperature)
    • Reset using either the Segger Ozone/J-Link reset or the Zephyr reset function
    • Don't call k_sleep in your test application; it will likely hang when frozen
    • When the application starts running, check LFCLKSRC (it should be LFCLKSRC_EXT_OSC, but I get LFCLKSRC_RC)
    • Also check whether the clock register is ticking (I used the z_nrf_rtc_timer_get_ticks() function). I see the tick count remain the same until the board warms.
    • After the board warms up, z_nrf_rtc_timer_get_ticks() shows the clock is ticking again, but it is still using LFCLKSRC_RC and some of the other status registers are wrong.

    When I run the same test at room temp, everything works properly: LFCLKSRC is set to LFCLKSRC_EXT_OSC, the clock is ticking, etc.
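    The "is the clock register ticking" step can be done without any k_sleep() by reading the RTC counter repeatedly and giving up after a fixed number of reads. This is a minimal sketch, not the method used above: `counter_read_t`, `rtc_counter_is_ticking()` and the fake counters are hypothetical, and on target the reader is assumed to wrap nrfx's nrf_rtc_counter_get() for whichever RTC instance the system timer uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reads the current RTC COUNTER value. On target this is assumed to wrap
   nrf_rtc_counter_get() for the system timer's RTC instance. */
typedef uint32_t (*counter_read_t)(void);

#define RTC_COUNTER_MASK 0xFFFFFFUL  /* the nRF52 RTC COUNTER is 24 bits wide */

/* Returns true if the counter value changes within max_polls reads.
   No k_sleep() or kernel timeout is involved, so this cannot hang. */
static bool rtc_counter_is_ticking(counter_read_t read_counter, uint32_t max_polls)
{
    const uint32_t first = read_counter() & RTC_COUNTER_MASK;

    for (uint32_t i = 0; i < max_polls; i++) {
        if ((read_counter() & RTC_COUNTER_MASK) != first) {
            return true;
        }
    }
    return false;
}

/* Host-side fakes for trying the check without hardware. */
static uint32_t fake_ticks;

static uint32_t fake_counter_frozen(void)
{
    return 1234u;  /* never advances, like the cold-boot case */
}

static uint32_t fake_counter_running(void)
{
    return fake_ticks++;  /* advances on every read */
}
```

    With the prescaler at 0 (as sys_clock_driver_init sets it), one RTC tick at 32.768 kHz lasts about 30.5 µs, so on target max_polls should be large enough to span several ticks at the CPU's polling rate.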

    Really what I'd like is your feedback on my suggested workaround.

    Thanks!

  • rossquatch said:
    Really what I'd like is your feedback on my suggested workaround.

    Your fix seems harmless, so I do not think adding it will break anything. One small suggestion: after stopping the clock, it is better to wait until LFCLKSTAT reports that the LFCLK has stopped before changing the source.

       // RSW - they didn't explicitly stop the clock in the online post, but I think you
       // probably want to stop it before changing the source...?  The reference manual doesn't
       // say.  There is also no EVENTS_LFCLKSTOPPED flag.
       NRF_CLOCK->TASKS_LFCLKSTOP = 1;
       
       // Wait until LFCLKSTAT.STATE reports that the LFCLK is no longer running
       while ((NRF_CLOCK->LFCLKSTAT & CLOCK_LFCLKSTAT_STATE_Msk) >> CLOCK_LFCLKSTAT_STATE_Pos)
           ;
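    Combining the original fix with the suggested wait, one possible way around the missing timebase is to bound each poll by an iteration count and report failure instead of spinning forever. This is a host-runnable sketch under several assumptions: `clock_regs_t` is a hypothetical stand-in for `NRF_CLOCK` holding only the registers used here, `sim_hw_step()` is a crude simulation and not real hardware behavior, and the bit values follow the product spec's LFCLKSRC/LFCLKSTAT layout. On target the same sequence would be written against `NRF_CLOCK` with no hook; the iteration budget would need tuning, and what to do on a timeout (retry, reset, fall back to RC) is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions from the nRF52840 product spec (CLOCK chapter). */
#define LFCLKSRC_SRC_XTAL  (1UL << 0)   /* SRC = Xtal */
#define LFCLKSRC_BYPASS    (1UL << 16)  /* bypass the crystal amplifier */
#define LFCLKSRC_EXTERNAL  (1UL << 17)  /* external (rail-to-rail) source */
#define LFCLKSRC_EXT_OSC   (LFCLKSRC_BYPASS | LFCLKSRC_EXTERNAL | LFCLKSRC_SRC_XTAL)
#define LFCLKSTAT_STATE    (1UL << 16)  /* 1 = LFCLK running */

/* Hypothetical stand-in for NRF_CLOCK so the sequence can run on a host. */
typedef struct {
    volatile uint32_t TASKS_LFCLKSTART;
    volatile uint32_t TASKS_LFCLKSTOP;
    volatile uint32_t EVENTS_LFCLKSTARTED;
    volatile uint32_t LFCLKSTAT;
    volatile uint32_t LFCLKSRC;
    bool stuck;  /* simulation only: the clock never starts */
} clock_regs_t;

/* Called once per poll iteration; on target this would be NULL. */
typedef void (*poll_hook_t)(clock_regs_t *clk);

/* Simulation only: crude model of the peripheral reacting to task writes. */
static void sim_hw_step(clock_regs_t *clk)
{
    if (clk->TASKS_LFCLKSTOP) {
        clk->TASKS_LFCLKSTOP = 0;
        clk->LFCLKSTAT = 0;
    }
    if (clk->TASKS_LFCLKSTART && !clk->stuck) {
        clk->TASKS_LFCLKSTART = 0;
        clk->LFCLKSTAT = LFCLKSTAT_STATE | (clk->LFCLKSRC & 0x3UL);
        clk->EVENTS_LFCLKSTARTED = 1;
    }
}

typedef enum {
    LF_SWITCH_OK = 0,
    LF_SWITCH_ALREADY_SET,
    LF_SWITCH_STOP_TIMEOUT,
    LF_SWITCH_START_TIMEOUT,
} lf_switch_result_t;

/* Stop the LFCLK, wait for STATE to clear, select the external oscillator,
   start it and wait for EVENTS_LFCLKSTARTED. Each wait is bounded by
   max_polls iterations because no timer is usable at this point. */
static lf_switch_result_t lfclk_switch_to_ext_osc(clock_regs_t *clk,
                                                   uint32_t max_polls,
                                                   poll_hook_t hook)
{
    uint32_t n;

    if (clk->LFCLKSRC == LFCLKSRC_EXT_OSC) {
        return LF_SWITCH_ALREADY_SET;
    }

    clk->TASKS_LFCLKSTOP = 1;
    for (n = 0; (clk->LFCLKSTAT & LFCLKSTAT_STATE) != 0; n++) {
        if (n == max_polls) {
            return LF_SWITCH_STOP_TIMEOUT;
        }
        if (hook != NULL) {
            hook(clk);
        }
    }

    clk->EVENTS_LFCLKSTARTED = 0;
    (void)clk->EVENTS_LFCLKSTARTED;  /* read back so the clear lands first */
    clk->LFCLKSRC = LFCLKSRC_EXT_OSC;
    clk->TASKS_LFCLKSTART = 1;

    for (n = 0; clk->EVENTS_LFCLKSTARTED == 0; n++) {
        if (n == max_polls) {
            return LF_SWITCH_START_TIMEOUT;
        }
        if (hook != NULL) {
            hook(clk);
        }
    }
    return LF_SWITCH_OK;
}
```

    Returning a result code instead of looping forever lets the caller decide how to recover, which the unbounded `while` loop in the original fix cannot do.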
