Weird LFCLK Behavior With nRF Connect SDK v2.6.1 On nRF52805 Without External LFCLK Oscillator

I ran across this issue while troubleshooting a separate one: I have an nRF52810-based board with no external LFCLK oscillator that is rock-solid with the nRF5 SDK but behaves weirdly with the nRF Connect SDK, and I suspected this was because I had misconfigured the SDK to use the external LFCLK oscillator. While looking at this, I tried the same code and configuration on an nRF52805-based board, which works fine with the nRF Connect SDK but which I think also lacks an external LFCLK oscillator (for reasons surpassing understanding, I haven't been given a schematic for this nRF52805-based board).

Here's what's weird: with the nRF Connect SDK LFCLK configuration like this:

CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=y
CONFIG_CLOCK_CONTROL_NRF_K32SRC_50PPM=y

I see the following on the nRF52805 LFCLK regs when I check them at the top of main():

LFCLKSRC 0x00000001 LFCLKSTAT 0x00010001 LFCLKRUN 0x00000001 CTIV 0x00000000

I understand why LFCLKSRC is 1, but if there's no external LFCLK oscillator on this nRF52805 board, how on Earth is LFCLKSTAT 0x00010001? It seems like that should only be possible if there is an external LFCLK oscillator, right?
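For reference, the register values quoted here can be decoded with a few lines of C. This is only a sketch: it assumes the nRF52 series LFCLKSTAT layout (SRC in bits 1..0, STATE in bit 16), and the helper names are made up for illustration rather than taken from either SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed LFCLKSTAT layout on nRF52 series parts:
 * bits 1..0 = SRC (0 = RC, 1 = Xtal, 2 = Synth), bit 16 = STATE (1 = running). */
#define LFCLKSTAT_SRC_MASK   0x00000003u
#define LFCLKSTAT_STATE_MASK 0x00010000u

/* Hypothetical names for this sketch; not SDK definitions. */
typedef enum { LF_SRC_RC = 0, LF_SRC_XTAL = 1, LF_SRC_SYNTH = 2 } lf_src_t;

/* True when the STATE bit says the LFCLK is running. */
static bool lfclk_is_running(uint32_t lfclkstat)
{
    return (lfclkstat & LFCLKSTAT_STATE_MASK) != 0;
}

/* Source that the status register reports as currently active. */
static lf_src_t lfclk_active_source(uint32_t lfclkstat)
{
    return (lf_src_t)(lfclkstat & LFCLKSTAT_SRC_MASK);
}
```

Read this way, 0x00010001 means "running, source Xtal", while 0x00010000 means "running, source RC".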

But the problem is that, if I check this with nRF5 SDK firmware on the same board, it consistently shows there is no external LFCLK oscillator. For instance, if I configure the nRF5 SDK using:

#define NRF_SDH_CLOCK_LF_SRC 0

then everything works fine (SoftDevice initializes without problem, etc). But if I configure the nRF5 SDK using:

#define NRF_SDH_CLOCK_LF_SRC 1

I get an error 7 (NRF_ERROR_INVALID_PARAM) returned from nrf_sdh_enable_request(). And if, before I make that call, I manually configure and enable the LFCLK on the nRF52805 using this code from Errata 3.2 [20], it never exits the loop checking EVENTS_LFCLKSTARTED:

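static void lfclock_init(void)
{
    /* Clear any stale "LFCLK started" event before starting the clock. */
    NRF_CLOCK->EVENTS_LFCLKSTARTED = 0;
    /* Select the external crystal as the LFCLK source (SRC = 1, Xtal). */
    NRF_CLOCK->LFCLKSRC = 1;
    NRF_CLOCK->TASKS_LFCLKSTART = 1;

    /* Wait for the LFCLK to report that it has started. */
    while (NRF_CLOCK->EVENTS_LFCLKSTARTED == 0)
    {
    }

    /* As in the errata workaround: write the RTC STOP tasks once the LFCLK has started. */
    NRF_RTC0->TASKS_STOP = 0;
    NRF_RTC1->TASKS_STOP = 0;
}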

Finally, if I replace the wait loop in lfclock_init() above with an nrf_delay_us(500000), and then check the LFCLK registers, they are set exactly like you'd expect if there was no external LFCLK oscillator (they show the LFCLK is running, but not from the external oscillator):

LFCLKSRC 0x00000001 LFCLKSTAT 0x00010000 LFCLKRUN 0x00000001 CTIV 0x00000000
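As an alternative to either spinning forever or using a fixed delay, the wait in lfclock_init() could be bounded so that it reports failure instead of hanging. The following is a minimal, host-testable sketch: wait_for_event and max_polls are hypothetical names, not SDK APIs, and on target the pointer would presumably be &NRF_CLOCK->EVENTS_LFCLKSTARTED with a short delay added per iteration so the bound corresponds to real time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll an event register until it becomes non-zero or the poll budget runs out.
 * Hypothetical helper for illustration; returns true if the event fired. */
static bool wait_for_event(volatile const uint32_t *event_reg, uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (*event_reg != 0) {
            return true;   /* event was generated */
        }
    }
    return false;          /* gave up: event never arrived within the budget */
}
```

A caller that gets false back can then decide for itself whether to switch LFCLKSRC to the RC oscillator or to treat the missing crystal as an error.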

So that's my question. Why do the nRF5 SDK and the LFCLK registers themselves tell me it is impossible to configure an external LFCLK oscillator on this nRF52805-based board, but when I run nRF Connect SDK firmware configured for an external LFCLK oscillator on the same board, LFCLKSTAT is 0x00010001, which clearly indicates that it is running on an external LFCLK oscillator?

What am I missing here?

Thank you!

  • Yes, you are right: the LFCLKSTAT register should not read 0x00010001 if there is no external crystal. If you are absolutely sure that there is no XTAL, then I can try to replicate this with minimal code in the nRF Connect SDK. Can you please give me the laser marking on your chip so that I can get the build code and work out which production batch it is from? I can then try a test on something from a batch close to that.

  • Susheel, I wanted to provide a few more pieces of information to close the loop on this from my side. As I mentioned in my original question, this question about the nRF52805 LFCLK behavior came up when I was debugging some very weird behavior with a custom nRF52810 board, which I thought might stem from the fact that this nRF52810 board also lacked an external LFCLK oscillator but was configured in Zephyr to use one.

    I just got that nRF52810 board back so I could finish testing and found some things I think Nordic really needs to explore.

    1) When I create a build configuration for the nRF52810 that indicates external LFCLK oscillator should be used, and then output the LFCLK regs at the top of the main() when running under Zephyr, I get the same invalid value on LFCLKSTAT (0x00010001) on our nRF52810 board that I reported on the nRF52805! I think that proves that this issue is not tied to a specific batch of nRF52805 processors, but stems from something common to all of these related processor designs.

    2) Unlike the nRF52805 board I was using, I do have schematics for this custom nRF52810 board, and it _definitely_ does not have an external LFCLK oscillator, but we're still seeing LFCLKSTAT = 0x00010001.

    3) This invalid LFCLKSTAT value, which indicates that the LFCLK is running on an external LFCLK oscillator even though the board lacks one, is arising inside the MPSL LFCLK initialization code. Before that code is called, LFCLKSTAT is zero; after it is called, it is 0x00010001. Of course, since I don't think we have access to the MPSL source, I don't have a way to determine why this is happening on these MCUs.

    4) The weird behaviors I was seeing on our custom nRF52810 board are _definitely_ the result of this LFCLK misconfiguration, which means that Zephyr is _not_ gracefully falling back to the internal RC oscillator when it tries to use the external LFCLK oscillator but is unable to get an LFCLKSTARTED event (again, if that was what was happening, LFCLKSTAT should be 0x00010000). I think this strongly suggests that the invalid LFCLKSTAT = 0x00010001 we're seeing really does indicate an invalid hardware state that is affecting MCU behavior.

    For instance, in our firmware, which is pretty similar to the LED_Button sample, I'm seeing the board consistently hang when I call settings_subsys_init(), at the point where nrf_flash_sync_exe() tries to take the timeout_sem on line 205 of flash_sync_mpsl.c. If I remove the initialization of the Zephyr settings modules, I get to the point where I start BLE advertising successfully, but there's no evidence of advertising packets being sent from the nRF52810.

    However, if I make these changes to my prj.conf file, our firmware works fine on our nRF52810 board:

    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_50PPM=n

    You should be able to reproduce this yourself: I cloned the LED_Button sample for Zephyr, and created a build configuration for the nRFDK_nRF52810 using the prj_minimal.conf file and the Debug optimizations. I then commented out the dk_leds_init() and init_button() calls to avoid GPIO conflicts on our nRF52810 board but left everything else the same, including the configuration for using the external LFCLK oscillator.

    When I run this LBS sample on my nRF52810 board, it consistently hangs in bt_enable(NULL). However, if I switch to the LFCLK RC oscillator in the prj_minimal.conf file by adding the lines above, the LED_Button sample works fine on our nRF52810.

  • Nathan,

    Thanks for all that info. It seems very possible that there is either some bug in the MPSL initialization code or something wrong with the hardware status register. Either way, I need to recreate this issue on my desk before I can poke the hardware engineers. Since you are so confident that this issue lies on the nRF hardware side, and since you have done quite a bit of debugging yourself, I will try to replicate this on my end. I need to modify the DK I have to remove the XTAL, so it might take some time. Meanwhile, assume that this is a hardware bug and try a temporary workaround (adding a delay instead of relying on the LFCLK status register) so that you can proceed with your development for now.

    I will dedicate some time to debug this at my end most likely early next week.

  • Susheel,

    Thank you for the update. I'm in good shape at this point, since I've identified that the weird behavior on our nRF52810 was due to this LFCLK configuration mismatch, and our board is working well now that I've fixed it. I'll leave this ticket open so you can report what you find, or in case you have questions for me. I just wanted to be clear that this is no longer urgent for our project, but it obviously is something that should be hunted down and resolved.

    One thing that I think is important is that this LFCLK fallback be documented for SDK users, because it certainly wasn't clear to me how the SDK was handling this, but also because if there's a manufacturing issue that is preventing XTAL from working, simply falling back to RC may not be the correct behavior for every product (e.g. for products that need a high degree of RTC accuracy).

    As a result, I think it's important to have a way for SDK users to either be able to control application of the fallback _or_ at least to be able to detect that it has happened (I guess I'm doubting most devs even know to look for it at this point). Once the fallback is actually configuring the LFCLK to use RC, LFCLKSTAT should be 0x00010000 if the fallback was used, and an SDK user could check that to see if an expected XTAL was present. Is there an SDK function that provides LFCLK status?
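A check along those lines might look like the sketch below. It assumes the nRF52 series LFCLKSTAT layout (SRC in bits 1..0, STATE in bit 16); lfclk_check_source and its result codes are hypothetical names for illustration, not an existing SDK function.

```c
#include <stdint.h>

/* Assumed LFCLKSTAT fields on nRF52 series parts: SRC in bits 1..0, STATE in bit 16. */
#define LFCLKSTAT_SRC_MASK   0x00000003u
#define LFCLKSTAT_STATE_MASK 0x00010000u

/* Hypothetical result codes for this sketch; not part of any SDK. */
typedef enum {
    LFCLK_CHECK_NOT_RUNNING,
    LFCLK_CHECK_SOURCE_MATCH,
    LFCLK_CHECK_SOURCE_MISMATCH
} lfclk_check_t;

/* Compare the source the application asked for (0 = RC, 1 = Xtal, 2 = Synth)
 * with the source that LFCLKSTAT reports as active. */
static lfclk_check_t lfclk_check_source(uint32_t requested_src, uint32_t lfclkstat)
{
    if ((lfclkstat & LFCLKSTAT_STATE_MASK) == 0) {
        return LFCLK_CHECK_NOT_RUNNING;
    }
    if ((lfclkstat & LFCLKSTAT_SRC_MASK) != requested_src) {
        return LFCLK_CHECK_SOURCE_MISMATCH;
    }
    return LFCLK_CHECK_SOURCE_MATCH;
}
```

With XTAL requested, a status of 0x00010000 would come back as a mismatch. The 0x00010001 value reported earlier in this thread would still pass, so a check like this only helps once the status register reflects the real source.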

    One other thing related to the automatic fallback: does the MPSL automatically change the accuracy from whatever is specified for the XTAL to 500 PPM for the RC when the fallback occurs? It seems like it would need to, right? Again, I think that should be documented as well.

  • Sorry for the late reply, Nathan, and thanks for sharing that with me.

    ntennies said:
    As a result, I think it's important to have a way for SDK users to either be able to control application of the fallback _or_ at least to be able to detect that it has happened (I guess I'm doubting most devs even know to look for it at this point). Once the fallback is actually configuring the LFCLK to use RC, LFCLKSTAT should be 0x00010000 if the fallback was used, and an SDK user could check that to see if an expected XTAL was present. Is there an SDK function that provides LFCLK status?

    I am 100% with you: if there is in fact a difference in how this is handled, it should be documented from our end. We are a bit short of 52810 DKs and are therefore somewhat reluctant to modify one in hardware to cut off the XTAL to test this. I will have to do some timeboxed testing on this, and will report my findings to the dev team and also here.

    ntennies said:
    One other thing related to the automatic fallback: does the MPSL automatically change the accuracy from whatever is specified for the XTAL to 500 PPM for the RC when the fallback occurs? It seems like it would need to, right? Again, I think that should be documented as well.

    I am not 100% sure that it does; that is also something I need to dive into the code to check.
