This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Hardware Timer interrupt delayed significantly by SoftDevice (Or even missed)

Hello,

I am having an issue where it appears that my hardware timer interrupts can occasionally be delayed by upwards of 20 ms or even be missed entirely. My code is based on Martin BL's example as seen below.

Original thread: https://devzone.nordicsemi.com/f/nordic-q-a/25074/why-is-not-my-timer-triggered-when-configured-through-ppi-and-spi

Direct link to MartinBL's code in this thread: https://devzone.nordicsemi.com/cfs-file/__key/communityserver-discussions-components-files/4/5037.main.c

Hardware: nRF52840 SoC

SDK: 15.0.0

Connected to PC with nRF52840 USB Dongle and nRF Connect

In short, I have two hardware timers as observed in the example source above (Timer CC values are different however):

TIMER_TIMER (Timer mode): Fires every 32 usec that triggers the SPI_START_TASK using PPI. This iterates through an ArrayList. This timer's interrupt callback id disabled.

TIMER_COUNTER (Counter mode): Counts the SPI_END_EVENT occurrences. This timer interrupt is enabled.

I have a SPI TX buffer of size 625 bytes. So I expect the TIMER_COUNTER interrupt to fire once for every 625 bytes sent.

It is imperative that my SPI writes occur every 32 usec since I am writing to a DAC. So to achieve this, I have a total buffer of length 1250 bytes that makes up 40 msec worth of DAC values where the last 20 msec worth of buffer space is deemed an overflow region. So the first 625 bytes are the intended transmit buffer region, and the remaining 625 bytes are the overflow region. This is done since I expect the SoftDevice to delay my TIMER_COUNTER interrupt callback to some degree. When the TIMER_COUNTER interrupt fires, I read the spi3.p_reg->TXD.PTR to see how far into the overflow region of the buffer it has gone, and then move the pointer back to the start of the buffer (plus an offset of however many bytes the TX.PTR went over 625) and update the TIMER_COUNTER CC value accordingly. This way I am not changing the interval of my 32 usec SPI writes.

Here is an example log snippet of everything working properly:

(10:58:57.251) bufferOver 1
(10:58:57.251) TXD.PTR = 0x20017828 and COUNTER CC0 = 624
(10:59:00.431) bufferOver 3
(10:59:00.431) TXD.PTR = 0x2001782c and COUNTER CC0 = 622
(10:59:06.450) bufferOver 3
(10:59:06.450) TXD.PTR = 0x2001782c and COUNTER CC0 = 622
(10:59:12.469) bufferOver 4
(10:59:12.469) TXD.PTR = 0x2001782c and COUNTER CC0 = 621
(10:59:17.931) bufferOver 3
(10:59:17.931) TXD.PTR = 0x2001782c and COUNTER CC0 = 622

The log statements above are printed from the TIMER_COUNTER interrupt handler every ~6 msec as to not flood the UART. The code will run fine continuously for 10 min to an hour (varies). The failure occurs when the following happens:

(11:54:45.606) TXD.PTR = 0x20017828 and COUNTER CC0 = 624
(11:54:51.588) bufferOver 1
(11:54:51.588) TXD.PTR = 0x20017828 and COUNTER CC0 = 624
(11:54:51.656) bufferOver 625
(11:54:51.656) TXD.PTR = 0x20017d10 and COUNTER CC0 = 0

In the log above, it appears hat for some reason the TIMER_COUNTER interrupt never fired when it counted to 624. It instead waited an additional 625 counts + 1 for interrupt latency, hence the bufferOver value of 625. This is causing the CC register to be set to a value of 0 effectively disabling the TIMER_COUNTER.

For context: 

bufferOver = (spi3.p_reg->TXD.PTR) - (uint32_t) p_waveformData_End;    //p_waveformData_End points to the halfway point of the TX buffer (625 bytes). It never changes

timer_counter.p_reg->CC[0] = number_of_transfers - bufferOver;    // number_of_transfers == 625 (Always)

So essentially, TXD.PTR made it all the way to the end of the entire buffer before the interrupt fired, indicating that either the SoftDevice blocked for ~20 msec (625 bytes of SPI data) or that the timer interrupt never fired once it counted to the CC value of 624.

So, is it possible for the SoftDevice (or some other driver) to disable interrupts at any point that could cause a missed timer interrupt? The only operations that are running during this time are the two hardware timers mentioned above, the two associated PPI channels, BLE with no traffic (0 PDU size), and an app_timer() that is blinking a LED.

Thanks for any help or insight! I have been trying to resolve this issue for days.

Parents
  • Hi,

    Higher priority interrupts will block lower priority interrupts and if you have an interrupt that runs for 20 ms or longer you will miss a timer interrupt (if it has lower priority). However, 20 ms is a relatively long time, so unless you are doing some serious processing or printing a lot of data on a slow UART line inside a high priority interrupt context I find it difficult to believe that this is what happens. 

    The Softdevice alone should never block your application for 20 ms straight. You can see what latency to expect from the Softdevice here

    But I don't understand why you need this whole mechanism? With your 32 us timer and PPI you can transfer your buffer to the DAC autonomously. The Softdevice will never block your timer from starting the SPI. 

    Have you looked through the Errata list?

    Best regards,
    Martin

  • Martin,

    I did a bit more digging over the weekend and found something interesting. In my TIMER_COUNTER callback I added the following code:

    // Reset the transmit buffer pointer offset to the start plus the number of bytes we went over
    txPointerUpdated = ((uint32_t) buffer) + bufferOver;
    spi3.p_reg->TXD.PTR = txPointerUpdated;
    
    // Make sure the TX pointer was actually updated
    if(spi3.p_reg->TXD.PTR != txPointerUpdated)
    {
        cprintf("TX.PTR should have been updated to 0x%x", txPointerUpdated);
        cprintf("TX.PTR is actually 0x%x", spi3.p_reg->TXD.PTR);
    }

    Sure enough, it appears that the TX.PTR is sometimes not getting updated as seen in the log below.

    (17:02:19.094) bufferOver 6
    (17:02:19.094) TXD.PTR = 0x2001782c and COUNTER CC0 = 1021
    (17:02:27.612) bufferOver 2
    (17:02:27.612) TXD.PTR = 0x20017826 and COUNTER CC0 = 1023
    (17:02:33.511) bufferOver 2
    (17:02:33.511) TXD.PTR = 0x20017826 and COUNTER CC0 = 1023
    (17:02:36.492) TX.PTR should have been updated to 0x20017822
    (17:02:36.492) TX.PTR is actually 0x2001803a
    (17:02:36.525) bufferOver 2046
    (17:02:36.525) TXD.PTR = 0x20018028 and COUNTER CC0 = 1

    Note that I increased the size of my buffer and overflow region to see if it made any difference before adding the TX pointer update check, which it didn't of course. Hence why the numbers are larger than in my original post.

    A couple things:

    1. I have read on the forums that the TXD.PTR pointer is double buffered. Is there any rule as to when this should/can be updated? Is it possible that writing to the TXD.PTR while the DMA controller is reading it will cause it to not be updated since the DMA controller is also incrementing it?

    2. Is there a way to check if the SPIM peripheral is currently busy?

    3. Is there a way to automatically reset the TX.PTR to the beginning of the ArrayList without CPU intervention? I ask this again because you were wandering why I implemented this mechanism.

    I also took a look at the latest errata (http://infocenter.nordicsemi.com/pdf/nRF52840_Rev_1_Errata_v1.1.pdf) and the only one I see of interest is "3.30 [198] nRF52840: SPIM3 transmit data might be corrupted". It talks about not accessing the data in the TX data buffer where TXD.PTR is pointing at the same time the DMA controller is fetching data. This is specific to the data itself, not the pointer to the data but I imagine the same restrictions apply.

    Any ideas, suggestions, or known workarounds for this?

    Thanks again for any help!

    Derek

  • Hi,

    1. I have read on the forums that the TXD.PTR pointer is double buffered. Is there any rule as to when this should/can be updated?
      Please refer to this documentation: "The .PTR and .MAXCNT registers are double-buffered. They can be updated and prepared for the next transmission immediately after having received the STARTED event."
    2. Is there a way to check if the SPIM peripheral is currently busy?
      Unfortunately the only way is to control it in software
    3. Is there a way to automatically reset the TX.PTR to the beginning of the ArrayList without CPU intervention? I ask this again because you were wandering why I implemented this mechanism.
      No. I didn't quite understand your reasoning at first, but I think I do now, and I think your strategy is good.
    4. Errata 198 shouldn't be relevant unless you are using SPIM instance #3. Are you? If so, can you try with instance 0,1, or 2?

    How often does it happen, and can you say something about the frequency of which it occurs?

    Is this easy to reproduce?

    Would I be able to run your code and reproduce it on my desk?

    Can you upload your code? We can make the case private if you prefer confidentiality. 

    -Martin

Reply
  • Hi,

    1. I have read on the forums that the TXD.PTR pointer is double buffered. Is there any rule as to when this should/can be updated?
      Please refer to this documentation: "The .PTR and .MAXCNT registers are double-buffered. They can be updated and prepared for the next transmission immediately after having received the STARTED event."
    2. Is there a way to check if the SPIM peripheral is currently busy?
      Unfortunately the only way is to control it in software
    3. Is there a way to automatically reset the TX.PTR to the beginning of the ArrayList without CPU intervention? I ask this again because you were wandering why I implemented this mechanism.
      No. I didn't quite understand your reasoning at first, but I think I do now, and I think your strategy is good.
    4. Errata 198 shouldn't be relevant unless you are using SPIM instance #3. Are you? If so, can you try with instance 0,1, or 2?

    How often does it happen, and can you say something about the frequency of which it occurs?

    Is this easy to reproduce?

    Would I be able to run your code and reproduce it on my desk?

    Can you upload your code? We can make the case private if you prefer confidentiality. 

    -Martin

Children
  • How often does it happen, and can you say something about the frequency of which it occurs?

    It appears to be random. It can happen within the first 10 minutes of it running or up to an hour later. It always occurs in less than 2 hours. I can perform the same operation when connected to my application via USB and I have absolutely no issue after 10 hours of continuous use, so it definitely seems related to the SoftDevice.

    Is this easy to reproduce?

    Yes, just allow the application to run and wait an hour. It usually occurs within an hour.

    Can you upload your code? We can make the case private if you prefer confidentiality. 

    I have actually created a private ticket per recommendation from my direct FAE contact that another engineer was assigned to. I will PM you the details.

    The code must remain confidential. I will continue my discussion in that thread in regards to how to upload my project.

    Thanks!

  • Hi,

    Thanks.

    I saw your other case. I think it is best that we proceed over there. 

    -Martin

  • Hello, I am running into the exact same bug with my NRF52840. I am using code based on this thread to read and write from SPI NAND via DMA using timer counter (timer2 in my case). I have tried it with SPIM 0-3 and all of them run into this issue. if BLE is off I can read/write 10s of thousands of files without an issue. When the BLE goes on, the device will miss an interrupt within the first 100 files, somewhat randomly. Was this issue ever resolved with ? I am happy to create a private case for this but figured others might experience this as well, have tired all sorts of ways to fix this to no avail. 

  • Hey,

    Take a look at my response to vittopascu in this ticket for how this was (mostly) resolved: https://devzone.nordicsemi.com/f/nordic-q-a/37731/nrf52840-spi-easydma-double-buffering

    It ended up not being a timer issue, but rather an issue with reading and updating TX.PTR. In short, you cannot update the TX.PTR while a SPI transaction is in progress despite what the datasheet says.

Related