nRF54L15 GRTC unable to wake chip at low temperatures

Hi,

We're in the product validation stage of development with the nRF54L15. Part of the validation is temperature cycling, and we noticed that devices get "stuck" in sleep mode at temperatures below -20°C. The chips remain in the low-power state forever (even after warming back up), and it seems the GRTC counter passed the latest compare value without waking the chip.

I was able to reproduce this exact issue with an nRF54L15 DK and the broadcaster sample from nRF Connect SDK v3.3.0. Steps to reproduce:
  1. Build the bluetooth/broadcaster sample for nrf54l15dk/nrf54l15/cpuapp/ns and flash it to an nRF54L15 DK.
  2. Place the board in a freezer at its coldest setting. It typically drops below -20°C in about 10 minutes.
  3. Notice that Bluetooth advertising has stopped. (For debugging it may help to change line 44 to  err = bt_le_adv_start(BT_LE_ADV_NCONN_IDENTITY, ad, ARRAY_SIZE(ad), NULL, 0); so the Bluetooth address doesn't change.)
    1. It may also help to attach a PPK, which clearly shows that the advertising current spikes have stopped. I have plenty of screenshots of the current behavior if they would be useful.




I then started a debug session, and it seems the last GRTC compare value was passed without waking the chip. Interestingly, after a reset the board typically re-enters this "stuck" state in less than 10 seconds (as long as it is still cold).
(gdb) x/48wx 0x400e2520
0x400e2520:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2530:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2540:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2550:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2560:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2570:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2580:  0x6fdaef98 0x00000000      0x00000000      0x00000000
0x400e2590:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e25a0:  0x6fcd6042 0x00000000      0x00000000      0x00000000
0x400e25b0:  0x6fcbc956 0x00000000      0x00000000      0x00000000
0x400e25c0:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e25d0:  0x00000000 0x00000000      0x00000000      0x00000000
(gdb) x/16wx 0x400e2720
0x400e2720:  0x717467c5 0x00000000      0x00000000      0x00000000
0x400e2730:  0x71746fbc 0x00000000      0x00000000      0x00000000
0x400e2740:  0x7174781d 0x00000000      0x00000000      0x00000000
0x400e2750:  0x71747fc5 0x00000000      0x00000000      0x00000000
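
To put a number on "passed by", the gap between the two dumps can be computed directly. This is a minimal sketch that assumes the first dump holds the compare values and the second holds counter reads (as the addresses suggest, but not checked against the datasheet), that the upper words are zero as shown, and that the counter ticks at 1 MHz. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: the counter ticks at 1 MHz, so one tick is one microsecond. */
#define GRTC_TICKS_PER_SEC 1000000u

/* How far the counter is past a compare value (wraps modulo 2^32). */
static uint32_t ticks_past_compare(uint32_t counter, uint32_t compare)
{
    return counter - compare;
}

/* Whole seconds represented by a tick count, rounding down. */
static uint32_t ticks_to_whole_seconds(uint32_t ticks)
{
    return ticks / GRTC_TICKS_PER_SEC;
}
```

With the first counter read (0x717467c5) and the largest compare value (0x6fdaef98), the difference is 0x0199782d ticks, or roughly 26.8 seconds under the 1 MHz assumption. That figure includes however long the board sat before the debugger read the registers.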




Interestingly, the interrupt appears to be pending both in the NVIC and in the GRTC peripheral.
(gdb) x/wx 0xe000e11c
0xe000e11c:  0x00000028
(gdb)  x/wx 0xe000e21c
0xe000e21c:  0x00000028
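
For anyone reading the two NVIC words above, here is a small sketch of how they decode into IRQ numbers. The bank base addresses are the standard Arm NVIC set-enable (0xE000E100) and set-pending (0xE000E200) banks; the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define NVIC_ISER_BASE 0xE000E100u /* interrupt set-enable bank */
#define NVIC_ISPR_BASE 0xE000E200u /* interrupt set-pending bank */

/*
 * Write the IRQ number of every set bit in one 32-bit NVIC word to irqs[].
 * Each word in a bank covers 32 consecutive IRQs.
 * Returns the number of entries written (at most 32).
 */
static size_t nvic_decode(uint32_t reg_addr, uint32_t bank_base,
                          uint32_t value, unsigned irqs[32])
{
    uint32_t word = (reg_addr - bank_base) / 4u;
    size_t n = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        if (value & (1u << bit)) {
            irqs[n++] = word * 32u + bit;
        }
    }
    return n;
}
```

Both 0xe000e11c and 0xe000e21c are word 7 of their bank, and 0x28 has bits 3 and 5 set, so IRQ 227 and IRQ 229 are both enabled and pending. These should be GRTC interrupt lines if the IRQ numbering follows the peripheral ID (0xE2 = 226 for base address 0x400E2000), but treat that mapping as an assumption to check against the device header.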

(gdb) x/48wx 0x400e2300
0x400e2300:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2310:  0x00000040 0x00000040      0x00000040      0x00000040
0x400e2320:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2330:  0x00000500 0x00000500      0x00000500      0x00000100
0x400e2340:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2350:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2360:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2370:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2380:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2390:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e23a0:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e23b0:  0x00000000 0x00000000      0x00000000      0x00000000
(gdb) x/16wx 0x400e2100
0x400e2100:  0x00000000 0x00000000      0x00000000      0x00000000
0x400e2110:  0x00000000 0x00000000      0x00000001      0x00000000
0x400e2120:  0x00000001 0x00000001      0x00000000      0x00000000
0x400e2130:  0x00000000 0x00000000      0x00000000      0x00000000
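
As a cross-check, the non-zero words in the 0x400e2100 dump can be matched against the non-zero compare slots in the first dump. This sketch assumes a layout inferred from the dumps rather than taken from the datasheet: one 32-bit event word per channel starting at offset 0x100, and one 16-byte compare slot per channel starting at offset 0x520. The macro and function names are illustrative only.

```c
#include <stdint.h>

/* Assumed layout, inferred from the dumps above (not from the datasheet). */
#define GRTC_BASE      0x400E2000u
#define EVENTS_OFFSET  0x100u /* one 32-bit event word per channel */
#define COMPARE_OFFSET 0x520u /* one 16-byte compare slot per channel */

/* Channel index of an event word at the given address. */
static unsigned event_channel(uint32_t addr)
{
    return (addr - GRTC_BASE - EVENTS_OFFSET) / 4u;
}

/* Channel index of a compare slot at the given address. */
static unsigned compare_channel(uint32_t addr)
{
    return (addr - GRTC_BASE - COMPARE_OFFSET) / 16u;
}
```

Under that assumption, the event words set at 0x400e2118, 0x400e2120 and 0x400e2124 are channels 6, 8 and 9, which are the same channels as the compare slots at 0x400e2580, 0x400e25a0 and 0x400e25b0. The 0x40 and 0x100 words in the 0x400e2300 dump are bits 6 and 8, which lines up with two of those channels.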


We'd appreciate any help on this matter as this issue prevents the product from being viable.
  • Hi Duncan,

    Could you give me a few more details?

    1. What is the SoC revision that you are using?

    2. Do you use MPSL in the failing configuration?

    Could you try to test the same setup using a build where MPSL is enabled exactly as in the standard sample, without disabling or overriding the clock/MPSL-related configuration?

    -Priyanka


  • Hi Priyanka,

    1. Our product uses the BL54L15 rev 1, which I believe corresponds to nRF54L15 rev 1. The nRF54L15 devkits I was able to reproduce this issue on are rev 1 (QFAAB0) as well.

    2. I am using MPSL in the failing configuration. I've attached the autoconf.h file from the build directory, which shows CONFIG_MPSL=y and CONFIG_CLOCK_CONTROL_MPSL=y. I've made no changes to the bluetooth/broadcaster sample, so as far as I can tell nothing should be overriding the MPSL configs.
    1638.autoconf.h

  • The GRTC has both a low-frequency clock and a high-frequency clock; the latter only runs while the GRTC is in active mode. See  RE: GRTC Periodic interval + nrfx for my previous post. When you put the board into a freezer without the HFINT calibration that MPSL performs, the high-frequency clock will desynchronize if the GRTC runs in active mode continuously for some time. The synchronization mechanism, which otherwise runs on every LFCLK tick and keeps the high-frequency clock aligned with the low-frequency clock, only works if HFINT is not too far off. I guess this could theoretically mean that when the GRTC goes idle (i.e., turns off the high-frequency clock and programs the internal CC register of the low-frequency clock with the next wake-up time), it schedules a low-frequency tick that has already passed. But this requires that the GRTC has been active for more than a brief moment at a time, so that the desynchronization can actually make a difference, and that you then have an event scheduled not too far in the future.
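
The two conditions at the end (a long active stretch and a near event) can be illustrated with a toy model. This is only a sketch of the reasoning, not how the GRTC is implemented: it assumes the fast clock runs slow by a fixed fraction, so the fast counter lags real time, and it checks whether the wake-up tick derived from the lagging counter is already behind the low-frequency counter. All names and the slow_ppm parameter are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define LF_HZ 32768u /* nominal low-frequency clock rate */

/* Convert microseconds to whole low-frequency ticks, rounding down. */
static uint64_t us_to_lf_ticks(uint64_t us)
{
    return us * LF_HZ / 1000000u;
}

/*
 * Toy model only, not the GRTC implementation.
 * active_us: real time spent continuously in active mode
 * slow_ppm:  how slow the fast clock runs (hypothetical, <= 1000000)
 * delay_us:  distance to the next event as seen by the fast counter
 */
static bool wake_tick_already_passed(uint64_t active_us, uint32_t slow_ppm,
                                     uint64_t delay_us)
{
    uint64_t lag_us = active_us * slow_ppm / 1000000u;
    uint64_t fast_now_us = active_us - lag_us;    /* lagging counter */
    uint64_t target_us = fast_now_us + delay_us;  /* requested wake-up */

    /* The low-frequency counter is taken to follow real time. */
    return us_to_lf_ticks(target_us) <= us_to_lf_ticks(active_us);
}
```

With a made-up 2% error, 10 seconds of continuous active time leaves the fast counter 200 ms behind, so an event 100 ms away maps to a tick that has already passed, while an event 500 ms away does not. After only 1 ms of active time the lag is 20 µs and the same 100 ms event is safe.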

  • I've spent some time looking into this and I suspect your explanation of the root cause is correct. A couple of things to note, however:

    - Forcing the GRTC active prevents the freeze. If I explicitly call nrfx_grtc_active_request_set(true) in my ns build, the freeze at -20°C never happens. This seems to be consistent with Emil's explanation.

    - Forcing the HFXO to stay active also prevents the issue. This is also consistent with Emil's explanation.

    - Building for cpuapp/ns with the current SDK (v3.3.0) always shows this issue. It seems that the secure image variant (cpuapp) links against a version of MPSL that has the HFINT calibration workaround enabled (this is mentioned in the errata). However, the ns version of MPSL doesn't have this workaround. It's also not possible to apply the CLOCK_CONFIG_NRF_HFINT_CALIBRATION Kconfig options to an ns application, since the workaround relies on secure memory addresses.

    - I was able to patch the issue in ns builds by porting the anomaly 30 workaround to TFM. More specifically, I added a secure callable function that performs the clock calibration, and I call that every 60 seconds from the application. The solution is pretty ugly, but here it is for reference:

    In the nonsecure (application) image add:

    static void calibration_work_handler(struct k_work *work)
    {
        int rc;
        printk("starting HF clock\n");
        struct onoff_client clk_cli;
        sys_notify_init_spinwait(&clk_cli.notify);
    
        rc = onoff_request(z_nrf_clock_control_get_onoff(CLOCK_CONTROL_NRF_SUBSYS_HF), &clk_cli);
        if (rc < 0) {
            printk("Could not start HF clock, request failed: %d\n", rc);
            return;
        }
    
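        // Busy-wait until the clock request completes (-EAGAIN means still pending).
        int res = 0;
        do {
            rc = sys_notify_fetch_result(&clk_cli.notify, &res);
        } while (rc == -EAGAIN);
        if (rc < 0 || res < 0) {
            printk("HF clock did not start: rc=%d res=%d\n", rc, res);
            return;
        }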
    
        printk("starting secure call\n");
        psa_status_t status = psa_call(TFM_CLOCK_CALIB_HANDLE, PSA_IPC_CALL, NULL, 0, NULL, 0);
    
        printk("secure call returned: %d\n", status);
    
        rc = onoff_release(z_nrf_clock_control_get_onoff(CLOCK_CONTROL_NRF_SUBSYS_HF));
        if (rc < 0) {
            printk("Could not stop HF clock, release failed: %d\n", rc);
            return;
        }
    }
    
    static K_WORK_DEFINE(calibration_work, calibration_work_handler);
    
    static void calibration_timer_handler(struct k_timer *timer)
    {
        k_work_submit(&calibration_work);
    }
    
    static K_TIMER_DEFINE(calibration_timer, calibration_timer_handler, NULL);
    
    static int calibration_init(void)
    {
        k_timer_start(&calibration_timer, K_MSEC(1000), K_MSEC(60000));
        return 0;
    }
    SYS_INIT(calibration_init, APPLICATION, 0);


    And add this non-secure callable function to a TF-M partition:

    #include <stdint.h>
    #include "psa/service.h"
    #include "psa_manifest/tfm_secure_patch_partition.h"
    
    #define BIT(n) (1uL << (n))
    
    psa_status_t tfm_clock_calib_sfn(const psa_msg_t *msg)
    {
        // NOTE: the HFXO must be started prior to this patch, or it won't work
        const uint32_t higher_bits = *((volatile uint32_t *)0x50120820UL) & 0xFFFFFFC0;
        *((volatile uint32_t *)0x50120864UL) = 1 | BIT(31);
        *((volatile uint32_t *)0x50120848UL) = 1;
        uint32_t off_abs = 24;
    
        while (off_abs >= 24) {
            *((volatile uint32_t *)0x50120844UL) = 1;
            while (((*((volatile uint32_t *)0x50120840UL)) & (1 << 16)) != 0) {
            }
            const uint32_t current_cal = *((volatile uint32_t *)0x50120820UL) & 0x3F;
            const uint32_t cal_result = *((volatile uint32_t *)0x50120840UL) & 0x7FF;
            int32_t off = 1024 - cal_result;
    
            off_abs = (off < 0) ? -off : off;
    
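            if (off >= 24 && current_cal < 0x3F) {
                *((volatile uint32_t *)0x50120820UL) = higher_bits | (current_cal + 1);
            } else if (off <= -24 && current_cal > 0) {
                *((volatile uint32_t *)0x50120820UL) = higher_bits | (current_cal - 1);
            } else if (off_abs >= 24) {
                // Trim value is saturated and cannot move further; stop here
                // rather than spin forever in the secure image.
                break;
            }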
        }
    
        *((volatile uint32_t *)0x50120848UL) = 0;
        *((volatile uint32_t *)0x50120864UL) = 0;
    
        return PSA_SUCCESS; // TODO: return err if clock not active
    }
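
The trim decision inside the loop can also be pulled out into a pure function, which makes it testable on a host without touching any registers. This is a minimal sketch that reuses the constants from the function above (target count 1024, tolerance 24, 6-bit trim field); the function and macro names are hypothetical. It also reports "stop" when the trim value is saturated, a case in which a loop that keeps re-measuring would never terminate.

```c
#include <stdint.h>
#include <stdbool.h>

#define CAL_TARGET    1024  /* expected measurement count */
#define CAL_TOLERANCE 24    /* acceptable distance from the target */
#define TRIM_MAX      0x3Fu /* trim field is the low 6 bits */

/*
 * Adjust *trim by at most one step based on one measurement.
 * Returns true if the trim changed and another measurement is worthwhile,
 * false if the result is within tolerance or the trim cannot move further.
 */
static bool hfint_trim_step(uint32_t *trim, uint32_t cal_result)
{
    int32_t off = CAL_TARGET - (int32_t)cal_result;

    if (off >= CAL_TOLERANCE && *trim < TRIM_MAX) {
        (*trim)++;
        return true;
    }
    if (off <= -CAL_TOLERANCE && *trim > 0) {
        (*trim)--;
        return true;
    }
    return false;
}
```

For example, a measurement of 1000 with trim 10 moves the trim to 11 and asks for another pass, while a measurement of 1010 leaves the trim alone and stops.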



    I think there should really be a workaround implemented for ns builds because currently users may be fooled into thinking the NS version of MPSL implements this workaround when in fact they are vulnerable.

  • Thanks Duncan for your findings.
    This does suggest that the Errata 30 HFINT/GRTC workaround is not being applied correctly in the TF-M/non-secure build path. The fact that secure cpuapp works while cpuapp/ns fails, and that moving the calibration into TF-M fixes it, points to a gap in workaround coverage for NS builds.
    I will raise this internally with the relevant teams.

    -Priyanka

  • Hi Priyanka,

    I think it could also be worth updating the description of errata anomaly 30, which reads:


    Symptoms
    GRTC drifts in frequency at low temperature

    Consequences
    Applications with real-time requirements can behave unexpectedly.


    I had read this earlier while investigating the problem, but the description suggests the anomaly affects only timing accuracy, not the chip's ability to wake from sleep. In practice, this issue could cause devices to fail permanently if the workaround isn't implemented and they don't have a WDT enabled in sleep mode.

  • Sure. Will take these into consideration as well.

    -Priyanka
