Need to solve excessive thread-unblock latency on nRF9160 + Zephyr SDK

I am using the nRF9160 with the Nordic SDK 2.3.0, on the nRF9160 DK.

I have a 10 ms hardware timer interrupt that is meant to wake a thread every 10 ms. The issue is that the latency between the ISR and the thread waking up has excessive variance: it falls between 0.3 and 1.5 ms most of the time, and occasionally reaches 4 ms.

I am setting up the timer via nrfx, using IRQ_DIRECT_CONNECT. I have validated that the interrupt fires every 10 ms with negligible variance.
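
For readers trying to reproduce this, here is a minimal sketch of this kind of setup. It is not the actual project code: the TIMER1 instance, the semaphore, the function names, the stack size and the priorities are all assumptions, and the timer configuration through nrfx_timer_init() is omitted. One detail worth checking in a setup like this is the return value of the direct ISR, since Zephyr only makes a scheduling decision on exit from a direct ISR when the ISR returns nonzero.

```c
#include <zephyr/kernel.h>
#include <zephyr/irq.h>
#include <nrfx_timer.h>

/* Hypothetical semaphore used to hand the 10 ms event to the thread. */
K_SEM_DEFINE(tick_sem, 0, 1);

/* nrfx event callback; it runs in ISR context, called from
 * nrfx_timer_1_irq_handler(). It would be registered with
 * nrfx_timer_init(), which is omitted here.
 */
static void timer_event_handler(nrf_timer_event_t event_type, void *context)
{
	ARG_UNUSED(context);

	if (event_type == NRF_TIMER_EVENT_COMPARE0) {
		k_sem_give(&tick_sem);
	}
}

/* Direct ISR. Assumes CONFIG_NRFX_TIMER1=y so that the nrfx handler exists. */
ISR_DIRECT_DECLARE(timer1_direct_isr)
{
	nrfx_timer_1_irq_handler();
	ISR_DIRECT_PM();	/* power management bookkeeping after servicing the IRQ */
	return 1;		/* nonzero: ask the kernel to make a scheduling decision */
}

/* Hypothetical setup helper: priority 1 and flags 0 are assumptions.
 * IRQ_ZERO_LATENCY is deliberately not used, because zero-latency
 * interrupts must not call kernel APIs such as k_sem_give().
 */
void timer_irq_setup(void)
{
	IRQ_DIRECT_CONNECT(TIMER1_IRQn, 1, timer1_direct_isr, 0);
	irq_enable(TIMER1_IRQn);
}

/* The thread that should wake every 10 ms. */
static void worker(void *p1, void *p2, void *p3)
{
	ARG_UNUSED(p1);
	ARG_UNUSED(p2);
	ARG_UNUSED(p3);

	while (1) {
		k_sem_take(&tick_sem, K_FOREVER);
		/* Toggle the GPIO observed on the logic analyzer here. */
	}
}

/* Highest cooperative priority, so no other thread can delay the wake-up. */
K_THREAD_DEFINE(worker_tid, 1024, worker, NULL, NULL, NULL, K_PRIO_COOP(0), 0, 0);
```

If the direct ISR returns 0 instead, the thread is made ready but may not run until the next scheduling point, which would show up as variable wake-up latency.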

To create an isolated test condition, I have removed all the threads and known interrupt sources apart from the hardware timer and the kernel, and I am not initializing the modem. Here is a logic analyzer plot of the thread unblocking vs. time. The pulses should all be very close to 10 ms apart, but there is remarkable variance.

The latency is so bad that I think there must be something wrong with the board, but we have tried multiple boards.  I don't know if the problem is Zephyr itself, or if there is some unexpected IRQ that is blocking the kernel (I am using a configuration taken from the default configuration of the nrf9160), or if it's something else.  Please let me know if you have any clues or ideas, thank you.

  • It seems like you need a tracing solution like SEGGER SystemView to find out which other contexts are causing the variance in triggering your timer handler.

    Since all contexts are laid out on a common time axis, you will see whether this variance is caused by another high-priority context masking your timer handler.

  • It's not variance in the timer handler. It is variance in the task that the timer handler unblocks. We use the SEGGER tools and couldn't find any thread causing problems, but I think there are two possibilities. Maybe you can advise.

    1. We think we are disabling power-management features in Zephyr, but Zephyr is not confidence-inspiring. It could be that it is going into low-power modes despite our best efforts.

    2. There is an oscillating GPIO ISR. Again, we don't think we are enabling any of these, but Zephyr is so opaque.

    Can SystemView give information about either of these?

    thanks

    1. How are you disabling power management? By default, Zephyr enables tickless idle, which stops the periodic RTC tick while the system sleeps in the idle thread. If you can see how long the idle thread runs, that will probably give you a hint of how long your system is sleeping. If you have disabled power management, check in SystemView that the idle thread runs as little as possible.
    2. Which GPIO number is this? You can set a breakpoint in the GPIOTE interrupt handler and inspect the registers of this peripheral to find out which GPIO pin configuration is causing it. The documentation has suggestions on how to avoid spurious interrupts from GPIO or port events.
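
As a starting point for checking the first suggestion, a prj.conf fragment along these lines could be tried. It is only a sketch: the symbols are standard Zephyr Kconfig options, but whether each one is available and takes effect on this board and SDK version is an assumption that should be verified in menuconfig or in the generated .config file.

```
# Trace threads, ISRs and the idle thread with SEGGER SystemView over RTT
CONFIG_TRACING=y
CONFIG_SEGGER_SYSTEMVIEW=y
CONFIG_USE_SEGGER_RTT=y

# Rule out low-power states and tickless operation while testing
CONFIG_PM=n
CONFIG_TICKLESS_KERNEL=n
```

With the periodic tick enabled, the system timer interrupt should appear at a fixed rate in the trace, which makes it easier to tell an unexpected sleep apart from an unexpected interrupt.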