This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Hardware Timer interrupt delayed significantly by SoftDevice (Or even missed)

Hello,

I am having an issue where it appears that my hardware timer interrupts can occasionally be delayed by upwards of 20 ms or even be missed entirely. My code is based on Martin BL's example as seen below.

Original thread: https://devzone.nordicsemi.com/f/nordic-q-a/25074/why-is-not-my-timer-triggered-when-configured-through-ppi-and-spi

Direct link to MartinBL's code in this thread: https://devzone.nordicsemi.com/cfs-file/__key/communityserver-discussions-components-files/4/5037.main.c

Hardware: nRF52840 SoC

SDK: 15.0.0

Connected to PC with nRF52840 USB Dongle and nRF Connect

In short, I have two hardware timers as observed in the example source above (Timer CC values are different however):

TIMER_TIMER (Timer mode): Fires every 32 usec that triggers the SPI_START_TASK using PPI. This iterates through an ArrayList. This timer's interrupt callback id disabled.

TIMER_COUNTER (Counter mode): Counts the SPI_END_EVENT occurrences. This timer interrupt is enabled.

I have a SPI TX buffer of size 625 bytes. So I expect the TIMER_COUNTER interrupt to fire once for every 625 bytes sent.

It is imperative that my SPI writes occur every 32 usec since I am writing to a DAC. So to achieve this, I have a total buffer of length 1250 bytes that makes up 40 msec worth of DAC values where the last 20 msec worth of buffer space is deemed an overflow region. So the first 625 bytes are the intended transmit buffer region, and the remaining 625 bytes are the overflow region. This is done since I expect the SoftDevice to delay my TIMER_COUNTER interrupt callback to some degree. When the TIMER_COUNTER interrupt fires, I read the spi3.p_reg->TXD.PTR to see how far into the overflow region of the buffer it has gone, and then move the pointer back to the start of the buffer (plus an offset of however many bytes the TX.PTR went over 625) and update the TIMER_COUNTER CC value accordingly. This way I am not changing the interval of my 32 usec SPI writes.

Here is an example log snippet of everything working properly:

(10:58:57.251) bufferOver 1
(10:58:57.251) TXD.PTR = 0x20017828 and COUNTER CC0 = 624
(10:59:00.431) bufferOver 3
(10:59:00.431) TXD.PTR = 0x2001782c and COUNTER CC0 = 622
(10:59:06.450) bufferOver 3
(10:59:06.450) TXD.PTR = 0x2001782c and COUNTER CC0 = 622
(10:59:12.469) bufferOver 4
(10:59:12.469) TXD.PTR = 0x2001782c and COUNTER CC0 = 621
(10:59:17.931) bufferOver 3
(10:59:17.931) TXD.PTR = 0x2001782c and COUNTER CC0 = 622

The log statements above are printed from the TIMER_COUNTER interrupt handler every ~6 msec as to not flood the UART. The code will run fine continuously for 10 min to an hour (varies). The failure occurs when the following happens:

(11:54:45.606) TXD.PTR = 0x20017828 and COUNTER CC0 = 624
(11:54:51.588) bufferOver 1
(11:54:51.588) TXD.PTR = 0x20017828 and COUNTER CC0 = 624
(11:54:51.656) bufferOver 625
(11:54:51.656) TXD.PTR = 0x20017d10 and COUNTER CC0 = 0

In the log above, it appears hat for some reason the TIMER_COUNTER interrupt never fired when it counted to 624. It instead waited an additional 625 counts + 1 for interrupt latency, hence the bufferOver value of 625. This is causing the CC register to be set to a value of 0 effectively disabling the TIMER_COUNTER.

For context:

bufferOver = (spi3.p_reg->TXD.PTR) - (uint32_t) p_waveformData_End; //p_waveformData_End points to the halfway point of the TX buffer (625 bytes). It never changes

timer_counter.p_reg->CC[0] = number_of_transfers - bufferOver; // number_of_transfers == 625 (Always)

So essentially, TXD.PTR made it all the way to the end of the entire buffer before the interrupt fired, indicating that either the SoftDevice blocked for ~20 msec (625 bytes of SPI data) or that the timer interrupt never fired once it counted to the CC value of 624.

So, is it possible for the SoftDevice (or some other driver) to disable interrupts at any point that could cause a missed timer interrupt? The only operations that are running during this time are the two hardware timers mentioned above, the two associated PPI channels, BLE with no traffic (0 PDU size), and an app_timer() that is blinking a LED.

Thanks for any help or insight! I have been trying to resolve this issue for days.

Top Replies

MartinBL over 6 years ago in reply to droberson +1

Hi, Thanks. I saw your other case. I think it is best that we proceed over there. -Martin

Parents

0 MartinBL over 6 years ago

Hi,

Higher priority interrupts will block lower priority interrupts and if you have an interrupt that runs for 20 ms or longer you will miss a timer interrupt (if it has lower priority). However, 20 ms is a relatively long time, so unless you are doing some serious processing or printing a lot of data on a slow UART line inside a high priority interrupt context I find it difficult to believe that this is what happens.

The Softdevice alone should never block your application for 20 ms straight. You can see what latency to expect from the Softdevice here.

But I don't understand why you need this whole mechanism? With your 32 us timer and PPI you can transfer your buffer to the DAC autonomously. The Softdevice will never block your timer from starting the SPI.

Have you looked through the Errata list?

Best regards,
Martin
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel

Reply

0 MartinBL over 6 years ago

Hi,

Higher priority interrupts will block lower priority interrupts and if you have an interrupt that runs for 20 ms or longer you will miss a timer interrupt (if it has lower priority). However, 20 ms is a relatively long time, so unless you are doing some serious processing or printing a lot of data on a slow UART line inside a high priority interrupt context I find it difficult to believe that this is what happens.

The Softdevice alone should never block your application for 20 ms straight. You can see what latency to expect from the Softdevice here.

But I don't understand why you need this whole mechanism? With your 32 us timer and PPI you can transfer your buffer to the DAC autonomously. The Softdevice will never block your timer from starting the SPI.

Have you looked through the Errata list?

Best regards,
Martin
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel

Children

0 droberson over 6 years ago in reply to MartinBL

Hey Martin!

My timer interrupts are set to priority 2 (Highest possible) for this reason. Normally, I am not printing any information to a debug UART during this time, but have added a few debug statements as highlighted in my original post in an attempt to debug why this is happening. Ie, it was happening prior to adding any debug statements.

Based on the BLE peripheral timing diagram, I agree in that the SoftDevice shouldn't block for more than a few hundred usec at a time, hence why I am stumped as to why this is happening.

In regards to the buffer "mechanism" that I am using, the purposes of this is because the DAC writes need to occur indefinitely until the application tells it to stop. Given my understanding of ArrayLists, PPI, timers, and the SPI driver, the TXD.PTR must be reset by the CPU once all data in the ArrayList has been transmitted. In other words, the SPI driver cannot loop through the ArrayList without CPU intervention. This was confirmed by another FAE in my post here: https://devzone.nordicsemi.com/f/nordic-q-a/37731/nrf52840-spi-easydma-double-buffering (Read the last 3 replies). If there is a way to indefinitely loop through an ArrayList without CPU intervention, that would be great! (As this is what I had hoped could be done originally). Or perhaps there is an easier way to achieve my desired behavior?

I will look into the Errata list. I am using a Rigado branded nRF52840, so I need to figure out which revision Nordic chipset it is.

Thanks!
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 droberson over 6 years ago in reply to MartinBL
Martin,

I did a bit more digging over the weekend and found something interesting. In my TIMER_COUNTER callback I added the following code:

// Reset the transmit buffer pointer offset to the start plus the number of bytes we went over txPointerUpdated = ((uint32_t) buffer) + bufferOver; spi3.p_reg->TXD.PTR = txPointerUpdated; // Make sure the TX pointer was actually updated if(spi3.p_reg->TXD.PTR != txPointerUpdated) { cprintf("TX.PTR should have been updated to 0x%x", txPointerUpdated); cprintf("TX.PTR is actually 0x%x", spi3.p_reg->TXD.PTR); }

Sure enough, it appears that the TX.PTR is sometimes not getting updated as seen in the log below.

(17:02:19.094) bufferOver 6
(17:02:19.094) TXD.PTR = 0x2001782c and COUNTER CC0 = 1021
(17:02:27.612) bufferOver 2
(17:02:27.612) TXD.PTR = 0x20017826 and COUNTER CC0 = 1023
(17:02:33.511) bufferOver 2
(17:02:33.511) TXD.PTR = 0x20017826 and COUNTER CC0 = 1023
(17:02:36.492) TX.PTR should have been updated to 0x20017822
(17:02:36.492) TX.PTR is actually 0x2001803a
(17:02:36.525) bufferOver 2046
(17:02:36.525) TXD.PTR = 0x20018028 and COUNTER CC0 = 1

Note that I increased the size of my buffer and overflow region to see if it made any difference before adding the TX pointer update check, which it didn't of course. Hence why the numbers are larger than in my original post.

A couple things:

1. I have read on the forums that the TXD.PTR pointer is double buffered. Is there any rule as to when this should/can be updated? Is it possible that writing to the TXD.PTR while the DMA controller is reading it will cause it to not be updated since the DMA controller is also incrementing it?

2. Is there a way to check if the SPIM peripheral is currently busy?

3. Is there a way to automatically reset the TX.PTR to the beginning of the ArrayList without CPU intervention? I ask this again because you were wandering why I implemented this mechanism.

I also took a look at the latest errata (http://infocenter.nordicsemi.com/pdf/nRF52840_Rev_1_Errata_v1.1.pdf) and the only one I see of interest is "3.30 [198] nRF52840: SPIM3 transmit data might be corrupted". It talks about not accessing the data in the TX data buffer where TXD.PTR is pointing at the same time the DMA controller is fetching data. This is specific to the data itself, not the pointer to the data but I imagine the same restrictions apply.

Any ideas, suggestions, or known workarounds for this?

Thanks again for any help!

Derek
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 MartinBL over 6 years ago in reply to droberson
Hi,

I have read on the forums that the TXD.PTR pointer is double buffered. Is there any rule as to when this should/can be updated?
Please refer to this documentation: "The .PTR and .MAXCNT registers are double-buffered. They can be updated and prepared for the next transmission immediately after having received the STARTED event."

Is there a way to check if the SPIM peripheral is currently busy?
Unfortunately the only way is to control it in software

Is there a way to automatically reset the TX.PTR to the beginning of the ArrayList without CPU intervention? I ask this again because you were wandering why I implemented this mechanism.
No. I didn't quite understand your reasoning at first, but I think I do now, and I think your strategy is good.

Errata 198 shouldn't be relevant unless you are using SPIM instance #3. Are you? If so, can you try with instance 0,1, or 2?

How often does it happen, and can you say something about the frequency of which it occurs?

Is this easy to reproduce?

Would I be able to run your code and reproduce it on my desk?

Can you upload your code? We can make the case private if you prefer confidentiality.

-Martin
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 droberson over 6 years ago in reply to MartinBL

How often does it happen, and can you say something about the frequency of which it occurs?

It appears to be random. It can happen within the first 10 minutes of it running or up to an hour later. It always occurs in less than 2 hours. I can perform the same operation when connected to my application via USB and I have absolutely no issue after 10 hours of continuous use, so it definitely seems related to the SoftDevice.

Is this easy to reproduce?

Yes, just allow the application to run and wait an hour. It usually occurs within an hour.

Can you upload your code? We can make the case private if you prefer confidentiality.

I have actually created a private ticket per recommendation from my direct FAE contact that another engineer was assigned to. I will PM you the details.

The code must remain confidential. I will continue my discussion in that thread in regards to how to upload my project.

Thanks!
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
+1 MartinBL over 6 years ago in reply to droberson

Hi,

Thanks.

I saw your other case. I think it is best that we proceed over there.

-Martin
Cancel
Vote Up +1 Vote Down

Sign in to reply

Verify Answer

Cancel