When frozen then reset, nRF52840 RC osc/RTC doesn't tick and LFCLKSRC, etc. don't init correctly

Our product is based on the Nordic nRF52840, running a fork of the v2.7.1 branch of Zephyr, using an external 32.768 kHz oscillator for the sysclock/LFCLK. This works fine at room temperature: startup begins on the internal RC osc, then switches automatically and transparently to the external oscillator.

We cold-tested the device a few ways, using a freezer and also freeze spray. When the device is frozen and then reset (via Segger Ozone/J-Link or a Zephyr reset function), on startup the sysclock/LFCLK is set to the internal RC source and is not ticking (or at least the ARM's sysclock/RTC thinks it isn't ticking). Because of this, the first call to k_sleep() hangs until the board warms up and the LFCLK/RTC resumes ticking. Some of the state/source registers for the LFCLK also seem to be wrong, and even after the LFCLK starts ticking, the ARM hw never switches from the RC to the external oscillator in this case. The nRF52840 module and external osc are rated to -40 C, so this should work during our tests.

When the ARM's at room temp:

  • LFCLKSRC eventually runs off the 32.768 kHz osc and works fine.
  • I get a dbg msg:  <dbg> clock_control.clkstarted_handle: lfclk: Clock started
  • The LFCLKSTAT register is $10000 before k_sleep, which means the LFCLK is running from the RC oscillator and the LFXO is not yet the source
    After k_sleep exits, LFCLKSTAT register is $10001, which means the LFCLK is running from the LFXO

  • The LFCLK is ticking (verified by counting ticks using Nordic's RTC driver)
  • k_sleeps work fine.
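
The LFCLKSTAT values quoted in this post can be decoded field by field. The following is a minimal sketch, assuming the register layout given in the nRF52840 product specification (SRC in bits 1:0, STATE in bit 16). The names lfclkstat_decode, struct lfclk_status and enum lfclk_src are hypothetical helpers for illustration, not part of nrfx or Zephyr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed LFCLKSTAT layout (nRF52840 product specification):
 * bits 1:0 = SRC (0 = RC, 1 = Xtal, 2 = Synth), bit 16 = STATE (1 = running). */
#define LFCLKSTAT_SRC_MASK  0x3UL
#define LFCLKSTAT_STATE_BIT (1UL << 16)

/* Hypothetical names, chosen for this example only. */
enum lfclk_src {
    LFCLK_SRC_RC = 0,
    LFCLK_SRC_XTAL = 1,
    LFCLK_SRC_SYNTH = 2
};

struct lfclk_status {
    enum lfclk_src src; /* which source the LFCLK reports it is using */
    bool running;       /* whether the LFCLK reports it is running */
};

/* Split a raw LFCLKSTAT value into its source and state fields. */
struct lfclk_status lfclkstat_decode(uint32_t raw)
{
    struct lfclk_status st;

    st.src = (enum lfclk_src)(raw & LFCLKSTAT_SRC_MASK);
    st.running = (raw & LFCLKSTAT_STATE_BIT) != 0;
    return st;
}
```

Under that layout, $10000 decodes as "running, source RC" and $10001 as "running, source Xtal". On target, the raw value would come from NRF_CLOCK->LFCLKSTAT.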

When the ARM's cold:

  • Both the application and the bootloader seem to be running the LFCLKSRC off the RC oscillator
  • I get a dbg msg:  <dbg> clock_control.clkstarted_handle: lfclk: Clock started
  • The LFCLKSTAT register is $10001 before sleep, which means the LFXO oscillator is the LFCLKSRC and is running (!)
    After sleep exits, LFCLKSTAT register is $10000, which means the RC oscillator is reported as the LFCLKSRC and as running (!)

  • The LFCLK is not ticking (verified by counting ticks using Nordic's RTC driver)

  • k_sleep hangs until the ARM warms up (the bootloader has no sleeps in it, which is why it doesn't hang). The first k_sleep is in the application's main() function

  • After the board warms, the LFCLK is ticking (verified via counting ticks using Nordic's RTC driver)
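
The "counting ticks" check used in both lists can be sketched as pure arithmetic on two RTC counter samples. This is a rough sketch, not the code actually used for the measurements above. It assumes a 24-bit RTC COUNTER, a prescaler of 0 (one count per 32.768 kHz cycle), and a delay between the samples that does not depend on the LFCLK, such as a CPU-cycle busy-wait. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_COUNTER_MASK 0x00FFFFFFUL /* assumes a 24-bit RTC COUNTER */
#define LFCLK_HZ         32768ULL     /* assumes prescaler 0 */

/* Ticks between two counter samples, tolerating one 24-bit wraparound. */
uint32_t rtc_ticks_elapsed(uint32_t before, uint32_t after)
{
    return (after - before) & RTC_COUNTER_MASK;
}

/* Ticks a healthy 32.768 kHz clock should produce in delay_us (rounded down). */
uint32_t lfclk_expected_ticks(uint32_t delay_us)
{
    return (uint32_t)(((uint64_t)delay_us * LFCLK_HZ) / 1000000ULL);
}

/* True if the counter advanced and is within tolerance_ticks of the expected count. */
bool lfclk_is_ticking(uint32_t before, uint32_t after,
                      uint32_t delay_us, uint32_t tolerance_ticks)
{
    uint32_t elapsed = rtc_ticks_elapsed(before, after);
    uint32_t expected = lfclk_expected_ticks(delay_us);
    uint32_t diff = (elapsed > expected) ? (elapsed - expected)
                                         : (expected - elapsed);

    return (elapsed != 0U) && (diff <= tolerance_ticks);
}
```

On target, the two samples would be read from the RTC counter register before and after the delay. A stalled LFCLK shows up as zero elapsed ticks, and a badly off-frequency clock shows up as a count outside the tolerance.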

I've attempted to fix the issue by manually switching LFCLKSRC during sysclock initialization. I call my code right at the end of the existing sys_clock_driver_init function in nrf_rtc_timer.c.

This solution appears to work when the board is frozen. It's based on information in the nRF52840 product specification and internet posts. Any comments on whether this is the proper way to manually switch the LFCLKSRC? Could you suggest how I might make the switch better, more robust, etc.? Are there errata for this? (I did a quick search but didn't find anything.)

Thanks!
Ross

// RSW - When the ARM is cold and then reset (Zephyr reset function, debugger reset, etc.), it will
// not switch the LFCLK source properly from the internal RC to the external oscillator. The RC osc
// stays selected, but the timer count register does not increment (the RC may or may not be oscillating).
// This causes k_sleeps to hang because, at least according to the ARM hw, there is no LFCLK timer tick.

// Found this sequencing information in a post at
// https://devzone.nordicsemi.com/f/nordic-q-a/53362/how-to-check-nrf52840-external-low-frequency-crystal-32khz-is-connected-or-not
// and adapted it for our problem
//
// This sequence roughly follows what's described in the 5.4.2 LFCLK controller section of the product spec doc
//
// This fix is specific to our boards, which use an external 32.768 kHz oscillator for the RTC/sysclock
//

void rsw_sysclock_driver_init_fix(void)
{
   #define LFCLKSRC_EXT_OSC   (3UL<<16 | CLOCK_LFCLKSRC_SRC_Xtal)    //RSW - 3<<16 is External + Bypass

   //RSW - if the clock source is already set properly, we're done here
   if (NRF_CLOCK->LFCLKSRC == LFCLKSRC_EXT_OSC)
      return;
   

   // RSW - they didn't explicitly stop the clock in the online post, but I think you
   // probably want to stop it before changing the source...?  The reference manual doesn't
   // say.  There is also no EVENTS_LFCLKSTOPPED flag.
   NRF_CLOCK->TASKS_LFCLKSTOP = 1;

   NRF_CLOCK->EVENTS_LFCLKSTARTED = 0;

   // Synchronize register writes to 16MHz AHB clock by reading the same register.
   // Protect against out of order execution by explicitly casting to volatile.
   (volatile void)NRF_CLOCK->EVENTS_LFCLKSTARTED;
   NRF_CLOCK->LFCLKSRC = LFCLKSRC_EXT_OSC;
   NRF_CLOCK->TASKS_LFCLKSTART = 1;


   // RSW - original post's code checked for EVENTS_LFCLKSTARTED for
   // "so long", then if that doesn't work, tries to start the clock over and
   // over again.  I *think* it's probably safer (at least for now) to start
   // the clock once and wait "forever" for it to run
   // We'd sort of like a timeout, but there's no timebase because we are
   // setting up the RTC, which is the timebase
   while (!NRF_CLOCK->EVENTS_LFCLKSTARTED)
      ;
}



int sys_clock_driver_init(const struct device *dev)
{
    ARG_UNUSED(dev);
    static const enum nrf_lfclk_start_mode mode =
        IS_ENABLED(CONFIG_SYSTEM_CLOCK_NO_WAIT) ?
            CLOCK_CONTROL_NRF_LF_START_NOWAIT :
            (IS_ENABLED(CONFIG_SYSTEM_CLOCK_WAIT_FOR_AVAILABILITY) ?
            CLOCK_CONTROL_NRF_LF_START_AVAILABLE :
            CLOCK_CONTROL_NRF_LF_START_STABLE);

    /* TODO: replace with counter driver to access RTC */
    nrf_rtc_prescaler_set(RTC, 0);
    for (int32_t chan = 0; chan < CHAN_COUNT; chan++) {
        nrf_rtc_int_enable(RTC, RTC_CHANNEL_INT_MASK(chan));
    }

    NVIC_ClearPendingIRQ(RTC_IRQn);

    IRQ_CONNECT(RTC_IRQn, DT_IRQ(DT_NODELABEL(RTC_LABEL), priority),
            rtc_nrf_isr, 0, 0);
    irq_enable(RTC_IRQn);

    nrf_rtc_task_trigger(RTC, NRF_RTC_TASK_CLEAR);
    nrf_rtc_task_trigger(RTC, NRF_RTC_TASK_START);

    int_mask = BIT_MASK(CHAN_COUNT);
    if (CONFIG_NRF_RTC_TIMER_USER_CHAN_COUNT) {
        alloc_mask = BIT_MASK(EXT_CHAN_COUNT) << 1;
    }

    if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) {
        compare_set(0, counter() + CYC_PER_TICK,
                sys_clock_timeout_handler, NULL);
    }

    z_nrf_clock_control_lf_on(mode);

    if (IS_ENABLED(CONFIG_RSW_SYSCLOCK_DRIVER_INIT_FIX)) {
        rsw_sysclock_driver_init_fix();
    }

    return 0;
}
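
The comment in rsw_sysclock_driver_init_fix notes that a timeout would be nice but that there is no timebase yet. One possible approach is to bound the wait by a poll count instead of by time. This is a minimal sketch under that assumption. wait_for_event is a hypothetical helper, and the poll budget would have to be sized empirically for the CPU clock and the oscillator's worst-case startup time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll an event register until it reads non-zero or the
 * poll budget runs out. Needs no timebase; the "timeout" is a count of reads. */
bool wait_for_event(volatile const uint32_t *event_reg, uint32_t max_polls)
{
    while (max_polls-- > 0U) {
        if (*event_reg != 0U) {
            return true; /* event fired within the budget */
        }
    }
    return false; /* budget exhausted, event never seen */
}
```

In the fix above, the final while loop could become a call such as wait_for_event(&NRF_CLOCK->EVENTS_LFCLKSTARTED, max_polls), with the false case used to log the failure or fall back to the RC source rather than hang during init.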

  • rossquatch,

    There are a couple of debugging directions I can take from here, but I would need your help to validate a few things before I start modifying the DK I have on my desk.

    1. External clock behavior over temperature.
      I would like to remove external clock issues from the picture. You said that when you freeze the chip (spray), you are not affecting the external LFCLK? I mean, is the external osc far enough away on your board that it is not affected by the temperature? Or, to rephrase: are you sure that when the issue happens you have a valid, stable clock input coming from the external osc?

    2. k_sleep being stuck
      We have seen some issues with RTC tick accuracy over temperature, but we have not seen the tick freeze completely. That said, I think the accuracy tests we did for other issues used only the internal RC or a XTAL for the LFCLK. So exploring this issue with an external OSC (not XTAL) is a bit of a new area for me.

    3. LFCLKSTAT register

      The default value of the LFCLKSTAT register at boot time will always show that the RC is running until the external clock is stable. We need to verify that we still get a stable clock at low temperatures.

    On my end, the default setup will be an nRF52840 DK, which has an external crystal for the LFCLK. We recently ran some tests at cold temperature (-30 to -32 C) in the temperature chamber and saw no issue with the RTC tick stalling, although there was a little drift. I do not want to repeat the same experiment just to sniff the LFCLKSTAT unless I know it is necessary.
