GRTC peripheral not always emitting scheduled events

I'm trying to use the new GRTC peripheral in nRF54L15 but in various situations I notice that the events are not triggered as expected.

Here is a "minimal reproducible example", based on the "Create a blank application" template in VS Code with NCS 2.8.0.

main.c:

#include <zephyr/kernel.h>

int main(void)
{
    // Turn off LED1 and LED3 on the devkit
    NRF_P1->PIN_CNF[10] = (GPIO_PIN_CNF_DIR_Output << GPIO_PIN_CNF_DIR_Pos) | (GPIO_PIN_CNF_INPUT_Disconnect << GPIO_PIN_CNF_INPUT_Pos);
    NRF_P1->PIN_CNF[14] = (GPIO_PIN_CNF_DIR_Output << GPIO_PIN_CNF_DIR_Pos) | (GPIO_PIN_CNF_INPUT_Disconnect << GPIO_PIN_CNF_INPUT_Pos);
    NRF_P1->OUTCLR = (1U << 10) | (1U << 14);

    // Reset GRTC to a known state and start
    NRF_GRTC->TASKS_STOP = 1;
    for (volatile int i = 0; i < 200; i++) {}
    NRF_GRTC->MODE = (GRTC_MODE_AUTOEN_Default << GRTC_MODE_AUTOEN_Pos) | (GRTC_MODE_SYSCOUNTEREN_Disabled << GRTC_MODE_SYSCOUNTEREN_Pos);
    NRF_GRTC->TIMEOUT = 0;
    NRF_GRTC->WAKETIME = 3;
    NRF_GRTC->SHORTS = 0;
    NRF_GRTC->PWMCONFIG = GRTC_PWMCONFIG_ResetValue;
    NRF_GRTC->CLKOUT = GRTC_CLKOUT_ResetValue;
    NRF_GRTC->CLKCFG = (GRTC_CLKCFG_CLKSEL_SystemLFCLK << GRTC_CLKCFG_CLKSEL_Pos) | (1 << GRTC_CLKCFG_CLKFASTDIV_Pos);
    NRF_GRTC->MODE = (GRTC_MODE_AUTOEN_Default << GRTC_MODE_AUTOEN_Pos) | (GRTC_MODE_SYSCOUNTEREN_Enabled << GRTC_MODE_SYSCOUNTEREN_Pos);
    NRF_GRTC->TASKS_CLEAR = 1;
    for (volatile int i = 0; i < 200; i++) {}
    for (int i = 0; i < 4; i++) NRF_GRTC->SYSCOUNTER[i].ACTIVE = 0;
    for (int i = 0; i < 12; i++) NRF_GRTC->CC[i].CCEN = 0;
    NRF_GRTC->TASKS_START = 1;

    // Delay ~3 LFCLK cycles
    NRF_TIMER20->TASKS_STOP = 1;
    NRF_TIMER20->TASKS_CLEAR = 1;
    NRF_TIMER20->PRESCALER = 4;
    NRF_TIMER20->EVENTS_COMPARE[0] = 0;
    NRF_TIMER20->CC[0] = 93;
    NRF_TIMER20->TASKS_START = 1;
    while (!NRF_TIMER20->EVENTS_COMPARE[0]) {
    }
    NRF_TIMER20->TASKS_STOP = 1;
    NRF_TIMER20->TASKS_CLEAR = 1;

    // These lines are not necessary to reproduce the issue
    for (;;) {
        (void)NRF_GRTC->SYSCOUNTER[0].SYSCOUNTERL;
        if (!(NRF_GRTC->SYSCOUNTER[0].SYSCOUNTERH & GRTC_SYSCOUNTER_SYSCOUNTERH_BUSY_Msk)) {
            break;
        }
    }

    // Set up to trigger a one-shot event when syscounter >= CC[0]
    NRF_GRTC->EVENTS_COMPARE[0] = 0;
    NRF_GRTC->CC[0].CCL = 5000;
    NRF_GRTC->CC[0].CCH = 0; // This write also enables the CC[0] event

    uint32_t cnt = 0;
    while (!NRF_GRTC->EVENTS_COMPARE[0]) {
        if (++cnt == 1000000) {
            // Turn on LED3
            // Often gets stuck here
            NRF_P1->OUTSET = 1U << 14;
        }
    }

    // Turn on LED1
    // Success case, happens very seldom
    NRF_P1->OUTSET = 1U << 10;
    return 0;
}

prj.conf:

CONFIG_SERIAL=n

The build configuration uses the board target "nrf54l15dk/nrf54l15/cpuapp" under "Nordic Kits"; everything else uses the default configuration. Flash the board, then press the reset button on the devkit and see which of LED1 and LED3 lights up. Press the reset button repeatedly to rerun the test and observe the failure rate.

In this program, I configure a compare event to trigger after 5 milliseconds. However, the event never triggers, so LED3 on the devkit turns on after a short timeout. The expected outcome is that the event triggers so that LED1 lights up. In this particular example, the failure rate is 100% on my new nRF54L15-PDK 0.8.1 but around 70% on my older devkit 0.7.0. I'm not sure whether the difference is due to the different versions or to individual chip differences. Whether a given run succeeds or fails seems pretty random.

I've tried to read through the GRTC section in the data sheet multiple times to see if there is something I have missed, but have not found anything.

Could there be something important in the data sheet that is missing regarding configuration or required register write sequence, or is this a hardware bug?

Note that how easy it is to trigger the issue depends on e.g. the WAKETIME/TIMEOUT register settings. I have managed to reproduce the problem with a non-zero TIMEOUT and/or a smaller WAKETIME, but that requires slightly more code. This is more or less the simplest example I could come up with.

Note that if, at the same time as I turn on LED3 due to the timeout, I also implicitly force the GRTC into active state, e.g. by reading SYSCOUNTERL, then the event triggers at that point.

Note that it appears the problem cannot be reproduced with CONFIG_SERIAL=y.

I have also tried to run the same code in a "bare metal" project without Zephyr but the same issue is observed.

I'm attaching the project so you can test it directly. I guess you can flash the merged.hex file in the build directory in case the project cannot be built in VS Code for some reason.

grtc_test.tar.gz

  • Does that imply that TIMEOUT cannot be 0?

    FYI the reset value for TIMEOUT is 0 but for WAKETIME it is 1.

  • Hi Emil,

    I am sorry to say but this looks to be an issue with reset values also. We need to fix this. 

    Kenneth

  • I think that was an important missing piece! Whenever I set TIMEOUT bigger than WAKETIME, I can no longer reproduce the issue no matter what I try. I do however believe this is a workaround for a hardware bug that should at least be mentioned in the errata or data sheet. Let me explain more why I believe this is the case.

    Based on my experiments, here is a model of how the GRTC works that is more accurate than the description in the data sheet.

    Assuming the GRTC is running and the SYSCOUNTER is enabled, the SYSCOUNTER can be in three different states: active state, sleeping state, wakeup requested state.

    Whenever in active state, at each SYSCOUNTER tick the active CC[n] registers are checked to see if any of them is <= the current SYSCOUNTER value. If so, the corresponding event is emitted.

    Reading the SYSCOUNTER value while not in the active state will return the previously known value from when the SYSCOUNTER was active the last time, with the Busy flag set to true. Triggering a CAPTURE task in non-active state will copy the old SYSCOUNTER value as well.

    The data sheet contains a list of various conditions that force the SYSCOUNTER into active state. To this list, add "a compare event for some CC[i] is triggered", since this is missing. The SYSCOUNTER keeps track of the LFCLK tick number at which any one of these conditions was last met (call this variable LatestActive). If the SYSCOUNTER is in the sleeping state when any of these conditions is met, it moves to the wakeup requested state. While in the wakeup requested state, it moves to the active state at the next LFCLK tick. At that tick, the LatestActive variable is incremented by one to the current tick number.

    Whenever the SYSCOUNTER is in the active state, it will always remain in this state until at least the next LFCLK tick (disabling the SYSCOUNTER however immediately brings it to the sleeping state). At the next LFCLK tick after every LFCLK period the SYSCOUNTER was in active state, any of the following conditions will keep the SYSCOUNTER active during that following LFCLK period, otherwise it will go to sleep:
    1. At least one of SYSCOUNTER[n].ACTIVE or CpuActive are set.
    2. Current LFCLK tick number - LatestActive != TIMEOUT + 1.
    3. SYSCOUNTER value µs + (TIMEOUT + 1) * 32 µs >= the earliest active CC[n] value. (Using 32 here instead of ~30.517 is mostly an optimisation since that could be implemented using a simple bit shift.)

    If it does go to sleep and there is an active CC[n] compare value, the internal low-frequency compare register will be programmed so that the wakeup to active state occurs at LFCLK clock cycle number ceil(cc-WAKETIME), where cc is the rational number CC[n]/(10^6/32768). So, with WAKETIME set to 0, the GRTC never wakes up before the scheduled event but up to 31 µs after it, which indicates that WAKETIME=0 should not be used. With WAKETIME set to 1, however, it should always wake up in time, i.e. > 0 LFCLK periods and <= 1 LFCLK period before the scheduled time.
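    To make the wakeup formula concrete, here is a minimal host-side sketch of it in C. The function name is hypothetical, and clamping the result at tick 0 is an assumption of the sketch, not something observed on the hardware.

```c
#include <stdint.h>

#define LFCLK_HZ 32768ULL
#define US_PER_S 1000000ULL

/* Model only: LFCLK tick at which the wakeup is scheduled, ceil(cc - WAKETIME),
 * where cc = CC[n] / (10^6 / 32768) = CC[n] * 32768 / 10^6.
 * WAKETIME is an integer, so ceil(cc - WAKETIME) == ceil(cc) - WAKETIME. */
static uint64_t grtc_model_wake_tick(uint64_t cc_us, uint32_t waketime)
{
    /* Split into whole seconds and remainder so the multiplication
     * cannot overflow even for a full 52-bit CC value. */
    uint64_t whole_s = cc_us / US_PER_S;
    uint64_t rem_us = cc_us % US_PER_S;
    uint64_t cc_ticks_ceil = whole_s * LFCLK_HZ +
                             (rem_us * LFCLK_HZ + US_PER_S - 1) / US_PER_S;

    /* Assumption of this sketch: clamp at tick 0 instead of going negative. */
    return (cc_ticks_ceil > waketime) ? cc_ticks_ceil - waketime : 0;
}
```

    For the example program (CC[0] = 5000 µs, i.e. cc = 163.84 ticks), this gives tick 164 with WAKETIME=0 (about 5 µs late), tick 163 with WAKETIME=1 and tick 161 with WAKETIME=3.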

    With the above logic, it is in some cases (whenever TIMEOUT < WAKETIME) possible to end up in a state where the programmed wakeup time is either in the past or too near in the future. In this case the SYSCOUNTER will fail to wake up to handle the scheduled events. It appears that when it goes to sleep, the scheduled wakeup time must be at least 2 LFCLK periods in the future. 1 LFCLK period in the future sometimes works, sometimes not.

    The hardware bug lies in condition 3 above when checking if the SYSCOUNTER should stay active due to a near scheduled event. TIMEOUT there should instead be WAKETIME since this condition checks that if the scheduled event is in the near future, we should stay active since the scheduled wakeup time would already have occurred (or will occur too near in the future). So a proper workaround for this hardware bug would be to make sure TIMEOUT >= WAKETIME. Kenneth, you mention that TIMEOUT > WAKETIME (bigger, rather than bigger or equal), but can you please confirm that TIMEOUT >= WAKETIME should work fine as well? In any case, I hope this can be fixed in a future SoC revision.
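    As a rough illustration of condition 3 and the suspected mix-up, the check can be sketched as a single function where the margin is either TIMEOUT (what the hardware appears to do) or WAKETIME (what I believe was intended). This is a model of my observations with a hypothetical function name, not the actual hardware logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model only: condition 3. The SYSCOUNTER stays active if the earliest active
 * CC[n] lies within (margin_ticks + 1) * 32 us of the current SYSCOUNTER value.
 * Pass TIMEOUT as margin_ticks for the observed behaviour, or WAKETIME for the
 * behaviour this post argues was intended. */
static bool grtc_model_stays_active(uint64_t syscounter_us,
                                    uint64_t earliest_cc_us,
                                    uint32_t margin_ticks)
{
    return syscounter_us + ((uint64_t)margin_ticks + 1U) * 32U >= earliest_cc_us;
}
```

    Take TIMEOUT=0, WAKETIME=3, SYSCOUNTER = 4900 µs and CC = 5000 µs. With the TIMEOUT margin the check is 4932 >= 5000, which is false, so the SYSCOUNTER goes to sleep even though the wakeup at tick ceil(163.84) - 3 = 161 (about 4913 µs) is less than one LFCLK period away. With the WAKETIME margin the check is 5028 >= 5000, which is true, so it would have stayed active.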

    The TIMEOUT register documentation states "Timeout after all CPUs gone into sleep state to stop the SYSCOUNTER" but this seems totally wrong and therefore misleading. In my tests I always keep the CPU active by busy looping or similar, so if this was true, I should see no difference in GRTC behaviour if I modify the TIMEOUT value, but alas I do. According to the model above, I think a much more correct text would be "When the active condition has not been met after this many 32Ki cycles, the SYSCOUNTER goes to sleep".

    Note that with the above model and due to condition 2 above, changing TIMEOUT to a smaller value while the timeout cool-down is in progress can cause the SYSCOUNTER to fail to enter sleep, since the newly programmed time to go to sleep will be in the past. This has been verified to indeed be the case on the hardware.


    Now to some other errors or issues I found in the data sheet:

    The "Sample code for reading the SYSCOUNTER value" in the data sheet also seems buggy. The line "syscounterh = syscounterh_value - 1;" should probably be "syscounterh_value = syscounterh_value - 1;" and "(syscounterh_value << 32)" must instead be "((uint64_t)syscounterh_value << 32)" in order to not cause overflow in this subexpression, since syscounterh_value is a 32-bit number. With these fixes, this routine appears to work as intended. I thought there could be a problem if the GRTC was busy while reading the low 32 bits but became non-busy just before reading the high bits (the read would then return non-busy, but the low bits would contain an old value). However, it appears the peripheral remembers that the read of the low bits happened in the busy state, so the read of the high part returns busy, which solves that issue by forcing the whole pair to be read again. What I find strange, though, is that this algorithm does not correspond to the "Recommendation on reading SYSCOUNTER" section. That section does not seem to handle the overflow case. Also, in step 2 it reads the high register without previously reading the low register, which the data sheet in other places says is not allowed. So in my opinion that section should be rewritten so that it matches the example code.
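    The cast issue can be shown in isolation with a small host-side sketch. The function name is hypothetical, and the two arguments stand for values already read from SYSCOUNTERL and from the value field of SYSCOUNTERH; the busy and overflow handling of the data sheet sample is deliberately left out.

```c
#include <stdint.h>

/* Combine the two halves of the SYSCOUNTER into one 64-bit value.
 * Without the cast, "syscounterh_value << 32" shifts a 32-bit operand by its
 * full width, which is undefined behaviour in C. Widening first makes the
 * shift well defined. */
static uint64_t syscounter_combine(uint32_t syscounterl, uint32_t syscounterh_value)
{
    return ((uint64_t)syscounterh_value << 32) | syscounterl;
}
```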

    The data sheet says "The CC[n].CCEN.ACTIVE must be enabled in order to use the corresponding SYSCOUNTER compare and capture channel", but this seems only true for the compare part, i.e. when you want the EVENTS_COMPARE event. It seems ok to trigger the TASKS_CAPTURE tasks even with the corresponding CC disabled.

    The data sheet says "Writes to CC[n].CCADD are ignored when the SYSCOUNTER is in sleep state". This seems totally wrong. Adding using CC as reference works perfectly fine even in sleep state. When instead adding using SYSCOUNTER as reference in sleep state, the write is not ignored but it performs an addition using the latest known SYSCOUNTER, i.e. with the same value as a read of SYSCOUNTERL/SYSCOUNTERH would return in the lower 52 bits.

    The data sheet says "If the CC[n] overflows after writing to CC[n].CCADD.VALUE, then EVENTS_COMPARE[n] is generated immediately". This is also wrong. Instead the resulting CC value will be the addition modulo 2^52 and from here everything will be as usual. If this wrapped value happens to be <= the current SYSCOUNTER value however, then the event will of course trigger immediately.
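    The behaviour I observe for CCADD can be summarised with the following host-side sketch. The names are hypothetical and it models only the arithmetic described above, not the register interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRTC_CC_BITS 52
#define GRTC_CC_MASK ((1ULL << GRTC_CC_BITS) - 1ULL)

/* Observed behaviour: the new CC value is the sum modulo 2^52. */
static uint64_t grtc_model_cc_after_add(uint64_t base, uint32_t add_value)
{
    return (base + add_value) & GRTC_CC_MASK;
}

/* The compare event fires right away only if the resulting CC value is
 * not in the future relative to the current SYSCOUNTER value. */
static bool grtc_model_fires_immediately(uint64_t cc, uint64_t syscounter)
{
    return cc <= syscounter;
}
```

    So an addition that wraps past 2^52 simply yields a small CC value, and whether the event triggers immediately depends only on how that value compares to the SYSCOUNTER.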

    In general, wherever the data sheet says that something can only be done in active mode, it should make clear that if the user puts the GRTC into active mode from sleep by any allowed means, it does not get into active mode immediately; that can take up to 31 µs (or even more just after the GRTC is started). In my model above I include an additional wakeup requested state, which solves this documentation issue. For example, the data sheet says "the TASKS_CAPTURE[n] is functional only when the SYSCOUNTER is in active state. The GRTC can be forced into active state by setting any SYSCOUNTER[n].ACTIVE register.", so it appears that a code sequence of "SYSCOUNTER[n].ACTIVE = 1; TASKS_CAPTURE[m] = 1;" would work, but in reality it doesn't, because a wait until the SYSCOUNTER has become active is missing in the middle. The capture functionality in general seems pretty pointless to me since it only works in active mode. The old RTC peripheral could capture at any time, and the CPU could read its COUNTER in at most six 16 MHz cycles. It would be nice to have a function that instead captures the internal low-frequency counter converted to µs.

    When I use the LFXO clock source directly instead of the SystemLFCLK with System clock source = LFXO, I'm unsure whether I should still use the normal sequence for initialising the System LFCLK first, i.e. configure LFCLK.START with LFXO, trigger LFCLKSTART and then wait for the LFCLKSTARTED event, or whether this can be skipped and the GRTC will handle starting up the LFXO internally.

    It appears that CLKSEL for GRTC must be set to LFXO in order to be able to wake up from SYSTEM OFF. I tried to set CLKSEL to SystemLFCLK instead and then make sure the System LFCLK uses LFXO as clock source, but was surprised this didn't work (the system never woke up). I think this information is missing in the data sheet.

    The data sheet says "All GRTC registers are reset during wakeup from System OFF mode, but the clock source selection at GRTC is retained internally.", but I think it is important to additionally mention that the internal counter will not be cleared and reset but keep running when waking up from System OFF. That is, there is no need to trigger the START task again and the internal counter will maintain its value. The introduction on one hand says "It will continue to be updated in all power modes.", but when reading "All GRTC registers are reset during wakeup from System OFF mode" I was pretty sure that also meant the GRTC was stopped and the internal counter register cleared.

    When performing a soft reset like NVIC_SystemReset(), it appears that all registers are retained and the counter will keep running (assuming it was running before the reset). I don't see that information anywhere in the data sheet. According to the reset behaviour table all peripherals will be reset at a soft reset, except where explicitly stated otherwise. So please add this info if this is true.

    Some other smaller issues I found in the data sheet:
    There is a text "Width of the RTCOUNTERH, RTCOMPAREH and RTCOMPARESYNCH registers : 0..14", but there are no such documented registers. Maybe these are for the internal low-frequency counter that are not accessible?
    Under SHORTS, there is a short RTCOMPARE_CLEAR which is a "Shortcut between event RTCOMPARE and task CLEAR", but there is no event RTCOMPARE. However, there is a "RTCOMPARESYNC" event. Is this what was intended? In that case, it's a typo. Or is it the internal compare event? In any case, I have no idea what the purpose of this short is and it is not explained either. It seems very weird.

    Another thing I think should be documented is the huge latency of the START/STOP/CLEAR tasks, similar to the RTC peripheral documentation with its very precise timing diagrams, as well as how these tasks interact with the SYSCOUNTER (which appears to be very separate from the internal low-frequency counter). For example, I was a bit surprised that the SYSCOUNTER could keep running and counting as long as it is in active state, even if the STOP task has been triggered. I found that, assuming the internal counter is running, the CLEAR task seems to take between 1 and 2 LFCLK cycles before it takes effect, and that after this point in time the SYSCOUNTER must be put to sleep (if not already asleep) and then woken up in order to synchronise the new values. The START/STOP tasks seem to take effect at the next LFCLK tick, and if both are triggered during the same LFCLK period, only the task that results in a toggle of the state takes effect. For a proper reset of the GRTC peripheral, the following sequence seems to work fine (assuming the START task was not triggered less than 1 LFCLK cycle ago): disable the SYSCOUNTER, trigger STOP, trigger CLEAR, delay 62 µs. Then the desired configuration can be applied.

    I found a bug in the SDK here https://github.com/NordicSemiconductor/nrfx/blob/85c444ee0b76272d8a074b82845f04e92b3253c4/hal/nrf_grtc.h#L1595 and https://github.com/NordicSemiconductor/nrfx/blob/85c444ee0b76272d8a074b82845f04e92b3253c4/hal/nrf_grtc.h#L1615. Accessing a 64-bit integer in C does not require the compiler to access the individual 32-bit registers in a particular order, but the peripheral requires that the low part is read before the high part. See godbolt for an example where the higher part is read before the lower part.
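    One way to guarantee the read order is to perform two separate volatile 32-bit reads, low part first, and combine them afterwards. Below is a minimal sketch of that idea; the struct is a hypothetical stand-in for a low/high register pair so that the snippet can be compiled on a host, and it is not the nrfx fix.

```c
#include <stdint.h>

/* Hypothetical stand-in for a low/high register pair such as CC[n].CCL/CCH. */
struct grtc_cc_pair {
    volatile uint32_t ccl;
    volatile uint32_t cch;
};

/* Each volatile read is its own statement, so the compiler must perform the
 * low read before the high read and may not merge or reorder them. */
static uint64_t read_low_then_high(const struct grtc_cc_pair *regs)
{
    uint32_t low = regs->ccl;
    uint32_t high = regs->cch;

    return ((uint64_t)high << 32) | low;
}
```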

    One last question: is there any particular reason the default Zephyr configuration uses values as high as WAKETIME=4 and TIMEOUT=5? It seems like a waste of energy to wake up as much as 122 µs early and to keep the SYSCOUNTER active for an additional 152 µs. The values WAKETIME=1 and TIMEOUT=1 appear to work well for me at least.
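    For reference, the µs figures above come from converting LFCLK ticks at 32768 Hz and rounding down, as in this small sketch (the helper name is hypothetical):

```c
#include <stdint.h>

/* Convert a number of 32768 Hz LFCLK ticks to whole microseconds (rounded down). */
static uint32_t lfclk_ticks_to_us(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000ULL) / 32768ULL);
}
```

    With this, WAKETIME=4 corresponds to 122 µs and TIMEOUT=5 to 152 µs, while a single tick is about 30 µs.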

    I hope this feedback is otherwise useful.

  • Hi Emil,

    Much appreciated. It will take some time to look into this, but I have forwarded it and I will make sure the right stakeholders follow it up.

    Quick comments:

    > I think that was an important missing piece! Whenever I set TIMEOUT bigger than WAKETIME, I can no longer reproduce the issue no matter what I try.

    Thanks for confirming!

    > Kenneth, you mention that TIMEOUT > WAKETIME (bigger, rather than bigger or equal), but can you please confirm that TIMEOUT >= WAKETIME should work fine as well? In any case, I hope this can be fixed in a future SoC revision.

    This is something that is being looked into. The short answer for now is that we need to add a text in the documentation that says: "GRTC.TIMEOUT must be larger than GRTC.WAKETIME, or events may be lost." But how much greater TIMEOUT needs to be is something they need to investigate, maybe it's sufficient with +1.
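    Until the required margin is confirmed, the proposed documentation rule can be captured as a simple configuration check. This is only a sketch of the strict "larger than" wording quoted above, with a hypothetical function name; whether TIMEOUT equal to WAKETIME is also safe remains open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the pair satisfies "GRTC.TIMEOUT must be larger than
 * GRTC.WAKETIME". The strict comparison follows the quoted wording; the
 * minimum required difference is still under investigation. */
static bool grtc_timing_config_ok(uint32_t timeout, uint32_t waketime)
{
    return timeout > waketime;
}
```

    The values from the original example (TIMEOUT=0, WAKETIME=3) fail this check, while the Zephyr defaults mentioned earlier in the thread (TIMEOUT=5, WAKETIME=4) pass it.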

    > Now to some other errors or issues I found in the data sheet

    I have forwarded internally to ensure it will be looked into.

    Kenneth
