When frozen then reset, nRF52840 RC osc/RTC doesn't tick and LFCLKSRC etc. don't initialize correctly

Our product is based on the Nordic nRF52840, running a fork of the v2.7.1 branch of Zephyr, using an external 32.768 kHz oscillator for the sysclock/LFCLK. This works fine at room temperature: it starts on the internal RC osc, then switches automatically and transparently to the external oscillator.

We cold-temperature tested the device a few ways, using a freezer and also freeze spray. When the device is frozen then reset (via Segger Ozone/J-Link or a Zephyr reset function), on startup the sysclock/LFCLK is set to the internal RC source and is not ticking (or at least the ARM's sysclock/RTC thinks it isn't ticking). Because of this, the first time k_sleep() is called it hangs until the board warms up and the LFCLK/RTC resumes ticking. Some of the state/source registers for the LFCLK also seem to be messed up, and even after the LFCLK starts ticking, the ARM hw never switches from the RC to the external oscillator. The nRF52840 module and external osc are rated to -40C, so this should work during our tests.

When the ARM's at room temp:

  • LFCLKSRC eventually runs off the 32.768 kHz osc and works fine.
  • I get a dbg msg:  <dbg> clock_control.clkstarted_handle: lfclk: Clock started
  • The LFCLKSTAT register is $10000 before k_sleep, which means the LFCLK is running from the RC oscillator (the LFXO is not yet the source)
    After k_sleep exits, LFCLKSTAT register is $10001, which means the LFXO oscillator is the source and is running

  • The LFCLK register is ticking (verified via counting ticks using Nordic's RTC driver)
  • k_sleeps work fine.

When the ARM's cold:

  • Both the application and the bootloader seem to be running the LFCLKSRC off the RC oscillator
  • I get a dbg msg:  <dbg> clock_control.clkstarted_handle: lfclk: Clock started
  • The LFCLKSTAT register is $10001 before sleep, which means the LFXO oscillator is the LFCLKSRC and is running (!)
    After sleep exits, LFCLKSTAT register is $10000, which means the source has gone back to the RC oscillator and is reported as running (!)

  • The LFCLK is not ticking (via counting ticks using Nordic's RTC driver)

  • k_sleep hangs until the ARM warms up. The first k_sleep is in the application main() function; the bootloader has no sleeps in it, which is why it doesn't hang.

  • After the board warms, the LFCLK is ticking (verified via counting ticks using Nordic's RTC driver)
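
For anyone comparing the LFCLKSTAT values quoted above, here is a minimal sketch of a decoder. It assumes the bit layout I read in the nRF52840 product specification (SRC in bits 1:0, with 0 = RC, 1 = Xtal, 2 = Synth, and STATE in bit 16, with 1 = running). The names lfclkstat_decode, lfclk_status and lfclk_src are made up for this example and are not part of nrfx or Zephyr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed LFCLKSTAT layout (nRF52840 product specification, CLOCK chapter):
 *   bits 1:0 = SRC   (0 = RC, 1 = Xtal, 2 = Synth)
 *   bit 16   = STATE (0 = not running, 1 = running)
 */
#define LFCLKSTAT_SRC_MASK   0x3UL
#define LFCLKSTAT_STATE_BIT  (1UL << 16)

/* Hypothetical names, for illustration only. */
enum lfclk_src {
    LFCLK_SRC_RC = 0,
    LFCLK_SRC_XTAL = 1,
    LFCLK_SRC_SYNTH = 2
};

struct lfclk_status {
    bool running;        /* STATE field */
    enum lfclk_src src;  /* SRC field */
};

/* Split a raw LFCLKSTAT value into its two fields. */
struct lfclk_status lfclkstat_decode(uint32_t raw)
{
    struct lfclk_status st;

    st.running = (raw & LFCLKSTAT_STATE_BIT) != 0U;
    st.src = (enum lfclk_src)(raw & LFCLKSTAT_SRC_MASK);
    return st;
}
```

Under that layout, $10000 decodes as "running, source RC" and $10001 as "running, source Xtal". On target the raw value would come from NRF_CLOCK->LFCLKSTAT.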

I've attempted to fix the issue by manually switching LFCLKSRC during sysclock initialization.  I call my code right at the end of the existing sys_clock_driver_init function in nrf_rtc_timer.c

This solution appears to work when the board is frozen. It's based on information in the nRF52840 product specification and internet posts. Any comments on whether this is the proper way to manually switch the LFCLKSRC? Could you suggest how I might make the switch better, more robust, etc.? Are there any errata for this? (I did a quick search, but didn't find anything.)

Thanks!
Ross

// RSW - When the ARM is cold, then reset by Zephyr reset function, debugger reset, etc it will
// not switch the LFCLK source properly between the internal RC and external oscillator.    The RC osc
// stays selected, but does not increment the timer count register (RC may or may not be oscillating)
// This causes k_sleeps to hang, because at least according to the ARM hw, there is no LFCLK timer tick

// Found this sequencing information in a post at
// https://devzone.nordicsemi.com/f/nordic-q-a/53362/how-to-check-nrf52840-external-low-frequency-crystal-32khz-is-connected-or-not
// and adapted it for our problem
//
// This sequence roughly follows what's described in the 5.4.2 LFCLK controller section of the product spec doc
//
// This fix is specific to our boards, which use an external 32.768 kHz oscillator for the RTC/sysclock
//

void rsw_sysclock_driver_init_fix(void)
{
   #define LFCLKSRC_EXT_OSC   (3UL<<16 | CLOCK_LFCLKSRC_SRC_Xtal)    //RSW - 3<<16 is External + Bypass

   //RSW - if the clock source is already set properly, we're done here
   if (NRF_CLOCK->LFCLKSRC == LFCLKSRC_EXT_OSC)
      return;
   

   // RSW - they didn't explicitly stop the clock in the online post, but I think you
   // probably want to stop it before changing the source...?  The reference manual doesn't
   // say.  There is also no EVENTS_LFCLKSTOPPED flag.
   NRF_CLOCK->TASKS_LFCLKSTOP = 1;

   NRF_CLOCK->EVENTS_LFCLKSTARTED = 0;

   // Synchronize register writes to 16MHz AHB clock by reading the same register.
   // Protect against out of order execution by explicitly casting to volatile.
   (volatile void)NRF_CLOCK->EVENTS_LFCLKSTARTED;
   NRF_CLOCK->LFCLKSRC = LFCLKSRC_EXT_OSC;
   NRF_CLOCK->TASKS_LFCLKSTART = 1;


   // RSW - original post's code checked for EVENTS_LFCLKSTARTED for
   // "so long", then if that doesn't work, tries to start the clock over and
   // over again.  I *think* it's probably safer (at least for now) to start
   // the clock once and wait "forever" for it to run
   // We'd sort of like a timeout, but there's no timebase because we are
   // setting up the RTC, which is the timebase
   while (!NRF_CLOCK->EVENTS_LFCLKSTARTED)
      ;
}



int sys_clock_driver_init(const struct device *dev)
{
    ARG_UNUSED(dev);
    static const enum nrf_lfclk_start_mode mode =
        IS_ENABLED(CONFIG_SYSTEM_CLOCK_NO_WAIT) ?
            CLOCK_CONTROL_NRF_LF_START_NOWAIT :
            (IS_ENABLED(CONFIG_SYSTEM_CLOCK_WAIT_FOR_AVAILABILITY) ?
            CLOCK_CONTROL_NRF_LF_START_AVAILABLE :
            CLOCK_CONTROL_NRF_LF_START_STABLE);

    /* TODO: replace with counter driver to access RTC */
    nrf_rtc_prescaler_set(RTC, 0);
    for (int32_t chan = 0; chan < CHAN_COUNT; chan++) {
        nrf_rtc_int_enable(RTC, RTC_CHANNEL_INT_MASK(chan));
    }

    NVIC_ClearPendingIRQ(RTC_IRQn);

    IRQ_CONNECT(RTC_IRQn, DT_IRQ(DT_NODELABEL(RTC_LABEL), priority),
            rtc_nrf_isr, 0, 0);
    irq_enable(RTC_IRQn);

    nrf_rtc_task_trigger(RTC, NRF_RTC_TASK_CLEAR);
    nrf_rtc_task_trigger(RTC, NRF_RTC_TASK_START);

    int_mask = BIT_MASK(CHAN_COUNT);
    if (CONFIG_NRF_RTC_TIMER_USER_CHAN_COUNT) {
        alloc_mask = BIT_MASK(EXT_CHAN_COUNT) << 1;
    }

    if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) {
        compare_set(0, counter() + CYC_PER_TICK,
                sys_clock_timeout_handler, NULL);
    }

    z_nrf_clock_control_lf_on(mode);

    if (IS_ENABLED(CONFIG_RSW_SYSCLOCK_DRIVER_INIT_FIX)) {
         rsw_sysclock_driver_init_fix();
    }


    return 0;
}
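
One possible refinement of the wait-forever loop in rsw_sysclock_driver_init_fix(): since there is no timebase yet, the wait could be bounded by an iteration count instead of by time. The sketch below is only an illustration under that assumption. wait_event_bounded is a hypothetical helper (not an nrfx or Zephyr function), and a suitable spin budget would have to be calibrated against the CPU clock and the oscillator's startup time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll an event register until it reads nonzero,
 * giving up after max_spins polls. There is no timebase here, so the
 * bound is an iteration count, not a time.
 * Returns true if the event was seen, false if the budget ran out.
 */
bool wait_event_bounded(const volatile uint32_t *event, uint32_t max_spins)
{
    for (uint32_t i = 0U; i < max_spins; i++) {
        if (*event != 0U) {
            return true;
        }
    }
    return false;
}
```

In the fix above it might be called as wait_event_bounded(&NRF_CLOCK->EVENTS_LFCLKSTARTED, budget), where budget is a value you pick. The caller then decides what to do on failure (retry the start, fall back to the RC source, or report an error) rather than hanging in init.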

  • The nRF52840 is on a BC840M Fanstel module --- I do not have access to the top of the nRF52840.   Please see my previous replies for links to Fanstels site and more information.

    The problem is with the LFCLKRC oscillator within the chip.   It happens with every device we tested.   We will not be sending you our devices as they're prototypes and we have a limited number.  I suggest you test your chips using some of Nordic's development boards.  If you read back through the replies I have already detailed how to reproduce the bug.  

    Also, why has this post been changed to "private?"   There is a code work-around at the top of the post, which will be useful to other people that are having the same problem.   I made the post public for that reason.   Please change the post back to "public."  

  • Hi Ross, 

    Unfortunately, we do not have Fanstel modules to test and verify your failure. Without the failing module, we are unable to verify if it is an nRF52840 issue or something else inside the module. 

    I would recommend that you first check with Fanstel on these observed failures and see if they can analyze the issue and comment on your proposed work-around. 

    Until we have understood the root-cause of the abnormal operation at cold temperature, we are unable to comment on your proposed work-around.

    A request has been made to make this case "public". Please note that it will be set back to "private" if Nordic will move forward with the device/module investigation.

    Sincerely, Jennifer

  • rossquatch, 

    We will do some temperature tests on that chip version with a DK in cold temperatures. You might be onto something here and we will do our end of investigation. Nevertheless, like Jennifer says, we will not do those tests on Fanstel module but on our DK to see if we can isolate the issue to be at our end.

  • The post should be public, because it contains work-around code for a specific problem other developers may be experiencing.

    You can buy modules direct from Fanstel ... they cost $8.50. 
    I provided a link above in a previous post where you can buy them.

    I suggest you analyze the failure using nRF52840s there. I don't think it's specific to Fanstel's module; the problem is with the RC oscillator and LFCLKSRC selection hardware inside the nRF52840. I think this particular problem (which is quite specific) slipped through your testing.


  • Thanks! I don't think the problem's specific to the Fanstel module.    I think you should see the same problem with nRF52840s that are mounted directly to the board, provided they are the same ones Fanstel uses (which is detailed in previous posts above.)

    The steps to reproduce the problem are also detailed in a previous post. To recap, the steps are below. Make sure you reset the board the way I said, i.e. don't just cycle power. (Cycling power may also cause the issue, but I haven't tested that.)

    • Cool a running board to -20C   (we used a -18C freezer, or freeze spray with temp unknown)
    • Reset using either Segger Ozone/J-Link reset or the Zephyr reset function 
    • Don't call k_sleep in your test application, it will likely hang when frozen 
    • When application starts running, check LFCLKSRC (should be LFCLKSRC_EXT_OSC, but I get LFCLKSRC_RC)
    • Also check if clock register is ticking (I used the z_nrf_rtc_timer_get_ticks() function)   I see the tick count remain the same until the board warms.
    • After the board warms up, z_nrf_rtc_timer_get_ticks shows the clock is ticking again, but it is still using the LFCLKSRC_RC and some of the other status registers are wrong
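
Since k_sleep can't be used for the "is it ticking" check in the steps above, the check can be done by busy-polling the tick counter. This is a rough sketch, not the exact code I ran. counter_advances and tick_reader_t are hypothetical names, the two stub readers exist only so the logic can be tried on a host machine, and on target the reader would be a thin wrapper around whichever RTC read call you use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reader type: returns the current tick count. */
typedef uint64_t (*tick_reader_t)(void);

/* Busy-poll the counter (no k_sleep, which may hang when frozen).
 * Returns true as soon as the count differs from the first sample,
 * false if it is still unchanged after max_spins polls.
 */
bool counter_advances(tick_reader_t read_ticks, uint32_t max_spins)
{
    const uint64_t start = read_ticks();

    for (uint32_t i = 0U; i < max_spins; i++) {
        if (read_ticks() != start) {
            return true;
        }
    }
    return false;
}

/* Host-side stand-ins for the real counter (illustration only). */
uint64_t stub_frozen_ticks(void)
{
    return 42U; /* never changes, like the cold board */
}

uint64_t stub_running_ticks(void)
{
    static uint64_t t;

    return t++; /* advances on every read */
}
```

A false result only means the count did not change within the spin budget, so the budget should comfortably cover several 32.768 kHz periods at your CPU clock.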