NRF9160 excessive CPU wakeup latency from System ON IDLE

My application requires one relatively low-latency (<10 us) GPIO interrupt. I am using NCS 2.4.2 and running the code out of flash. The GPIOTE peripheral's interrupt has a bunch of scaffolding to deal with the multiplexed PORT interrupts, the Zephyr GPIO driver abstraction, etc. To avoid that, I've configured a GPIOTE channel to generate a PPI event which is used to trigger a dedicated interrupt on an EGU. This EGU's ISR is configured using IRQ_DIRECT_CONNECT and IRQ_ZERO_LATENCY and has a minimal amount of software overhead. Functionally this works as expected; however, the typical latency from GPIO edge to ISR entry is still too high. To illustrate this I've captured Saleae traces where Channel 0 corresponds to the input signal to the GPIO and Channel 1 is a different GPIO which is toggled from the ISR (toggling it is the only thing the ISR does).

It can be seen that the first latency within a burst is 15.5 us and subsequent ones are 15.0 us. The final one is 3.3 us, due to the fact that the CPU has not managed to go back to IDLE by the time the transition occurred.

That seems reasonable given that the SoC is in low power mode by default, but triggering the CONSTLAT task makes surprisingly little difference.

As can be seen above, this only shaved about 0.5 us off the IDLE latencies. This is a somewhat dubious saving given that the typical HFINT startup time is supposed to be 3.2 us.

To ensure that a HFCLK is indeed running while the CPU is in WFI I enabled a periodic TIMER to run in the background as well.

This did not affect the initial 15.0 us latency but did reduce the subsequent IDLE latencies by another 0.4 us.

Next I tried enabling the HFXO.

This decreased the initial latency but increased the subsequent ones, suggesting that the difference is primarily due to the state of the HFINT. I'm not too worried about this difference, though; I really just need to bring the typical latency much closer to the final one.

To further prove that the high latencies are associated with a wake from IDLE I captured a trace with the same firmware running but with the debugger connected.

As you can see, all latencies are now between 4.0 and 4.3 us. Since the final latency is now equivalent to the others, this corroborates the original theory that the CPU had simply not made it back to WFI by the time that interrupt occurred.

If I disconnect the debugger but simply comment out the WFI instruction from within arch_cpu_idle I get similar results. If I also re-disable the HFXO the latencies further decrease by about 0.5 us.

This suggests that waking from System ON IDLE incurs over 10 us of unavoidable latency, regardless of running in constant latency mode or the HFINT status. If I use the PPI to trigger a GPIOTE task to toggle the output GPIO I see a latency of only ~400 ns. This confirms that the latency relates to the CPU and not the peripherals. To me this seems unexpectedly high, though I don't see any documentation in the product spec which specifies the expected values. I also don't see any other relevant cases in DevZone for NRF9x, but NRF5x devices seem to be capable of lower latencies.
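
As a rough sanity check on the numbers above, the wake penalty can be estimated by subtracting the latency seen when the CPU was still awake (3.3 us) from the typical IDLE latency (15.0 us). The sketch below is only illustrative: the helper names are hypothetical, and the 64 MHz CPU clock used for the cycle conversion is an assumption about the nRF9160 application core rather than something measured here.

```c
#include <stdint.h>

/* Assumption: the nRF9160 application core runs at 64 MHz. */
#define CPU_CLOCK_MHZ 64u

/* Hypothetical helper: extra latency attributable to waking from IDLE,
 * i.e. IDLE latency minus the latency seen when the CPU was still awake. */
static inline uint32_t wake_penalty_ns(uint32_t idle_latency_ns,
                                       uint32_t awake_latency_ns)
{
    return idle_latency_ns - awake_latency_ns;
}

/* Hypothetical helper: convert nanoseconds to CPU cycles, rounding down. */
static inline uint32_t ns_to_cycles(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns * CPU_CLOCK_MHZ) / 1000u);
}
```

With the measurements above, `wake_penalty_ns(15000, 3300)` gives 11700 ns, which under the 64 MHz assumption is roughly 750 CPU cycles spent before the first ISR instruction runs.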

Is this expected or a known issue? Is there anything that can be done to avoid it (other than completely avoiding WFI)?

  • Hi,

    I'll start with the summary:

    The wake-up from sleep (0.5 mA-ish) is >10 us, as you have already found out.

    For reference, this is the firmware that I have been using for evaluation (currently configured with the IRQ handler in RAM, which unfortunately did not help much):

    323702.zip

    With the system in "lowpwr" mode, meaning NRF_POWER->TASKS_LOWPWR=1, the wake-up time is given in the product specification (PS) by the parameter "tWFE2CPU":

    https://infocenter.nordicsemi.com/topic/ps_nrf9160/supply_monitoring.html?cp=2_1_0_4_2_3#topic

    There is no parameter for constant latency wake-up in the nRF9160 PS, i.e. with NRF_POWER->TASKS_CONSTLAT=1, but the nRF5340 application core, which is also a Cortex-M33, has this specified as 10 us:

    https://infocenter.nordicsemi.com/topic/ps_nrf5340/chapters/reset/doc/reset.html?cp=4_0_0_3_9_11_0#unique_1577696816

    I am measuring 11.7 us when using direct IRQs and zero latency interrupts, i.e. a bare-metal implementation of GPIOTE:

    The top trace is the GPIO output (LED); the bottom trace is the GPIOTE input (a button in my case).

    I also tried running everything from RAM, but this only helps for the subsequent interrupt if the cache hits successfully:

    If there's another thread that has been active in between, you will still see ~12 us delay from wakeup. The main contributor here is the internal regulator settling time from sleep -> active state.

    If another thread is currently active, you will see much lower latency, as you also mention, because the system is then not in a low power state.

    Nick Ewalt said:
    As you can see, all latencies are now between 4.0-4.3 us. Since the final latency is now equivalent to the others this corroborates the original theory that the CPU had simply not made it back to WFI by the time that interrupt occurred.

    Latency is better with the debugger present because attaching an external debugger via the CoreSight debug access port makes the hardware automatically enter a high power mode (>1 mA), which prevents the system from entering a low power state.

    Kind regards,

    Håkon

  • Thank you for the prompt and detailed response! It seems like my observations are consistent with yours.

    If accessing flash is really what causes the additional 10 us delay after WFI then moving the arch_cpu_idle function to RAM would probably prevent the penalty associated with an occasional cache miss when coming out of WFI. Though having the ISR and all the functions it calls execute entirely from RAM is somewhat difficult in a real application.

    In our case we can live with disabling WFI during the period where we have the low-latency interrupt requirement (with CONFIG_ARM_ON_ENTER_CPU_IDLE_HOOK) and the associated 2 mA power penalty.
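
    A minimal sketch of that workaround, assuming CONFIG_ARM_ON_ENTER_CPU_IDLE_HOOK=y so that Zephyr calls z_arm_on_enter_cpu_idle() before sleeping and skips the WFI when the hook returns false. The flag and the two begin/end helpers are hypothetical names used only for illustration, not part of Zephyr:

```c
#include <stdbool.h>

/* Hypothetical flag: set while the low-latency interrupt requirement applies. */
static volatile bool low_latency_window;

/* Hypothetical helpers the application calls around the critical period. */
void low_latency_window_begin(void)
{
    low_latency_window = true;
}

void low_latency_window_end(void)
{
    low_latency_window = false;
}

/*
 * Hook invoked on the way into CPU idle when
 * CONFIG_ARM_ON_ENTER_CPU_IDLE_HOOK is enabled.
 * Returning false skips the WFI; returning true lets the CPU sleep.
 */
bool z_arm_on_enter_cpu_idle(void)
{
    return !low_latency_window;
}
```

    Outside the window the hook returns true and the idle thread sleeps as usual, so the extra current is only paid while the low-latency requirement is active.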

  • Hi,

    I'm always happy to help out. 

    Nick Ewalt said:
    If accessing flash is really what causes the additional 10 us delay after WFI then moving the arch_cpu_idle function to RAM would probably prevent the penalty associated with an occasional cache miss when coming out of WFI. Though having the ISR and all the functions it calls execute entirely from RAM is somewhat difficult in a real application.

    Flash is one contributor, but the regulator transition from the low power state to the active state is also a contributor.

    I agree that moving the vectors to RAM isn't the best option for production, I just wanted to explore all options here to understand the limitations.

    Nick Ewalt said:
    In our case we can live with disabling WFI during the period where we have the low-latency interrupt requirement (with CONFIG_ARM_ON_ENTER_CPU_IDLE_HOOK) and the associated 2 mA power penalty.

    I'm glad to hear that you are able to work around this behavior with the sleep enter/exit hooks, accepting the current consumption penalty that comes with keeping the system running during the low-latency detection period.

    Kind regards,

    Håkon
