nRF52840 SPI EasyDMA Double Buffering

Hello,

I am trying to determine whether or not the SPI EasyDMA functionality of the nrf52840 supports double buffering in the sense that the DMA controller will switch buffers without intervention from the processor. In other words, can you start one DMA transfer and then immediately queue the next DMA transfer so that the following scenario happens:

Example:

255 Byte DMA buffer x 2

1. Start DMA transfer from buffer1 and immediately queue DMA transfer from buffer2.

2. Receive DMA transfer complete interrupt from buffer1. The DMA controller has already automatically switched to buffer2. Load buffer1 with new data and queue this transfer.

3. Receive DMA transfer complete interrupt from buffer2. The DMA controller has already automatically switched to buffer1. Load buffer2 with new data and queue this transfer.

4. etc...

I ask because I am using the SoftDevice while receiving ADC data over SPI sampled at 250 Hz, and I want to make sure that I do not miss a sample from my ADCs if the DMA interrupt is delayed by BLE activity. If my understanding of this is correct, is there an example I can reference?

SoftDevice: S140

SDK: 15.0.0

Thanks,

Derek

  • I believe you've understood it correctly. 

    In essence, it's just the RXD.PTR and TXD.PTR registers that are double buffered, which means that they can be updated immediately after EVENTS_STARTED has fired. You can use as many buffers as you like. 

    You should check out the EasyDMA array list feature, as it allows you to queue multiple buffers at once without having to manually set the pointer registers each time after an EVENTS_STARTED. 
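To make the "double buffered" behaviour concrete, here is a small host-side model in plain C. It is not nRF MDK register code; the spim_model_t type and the function names are made up for illustration. It assumes, as described above, that the SPIM latches TXD.PTR when a transfer starts, so the CPU can point the register at the other buffer as soon as EVENTS_STARTED fires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: TXD.PTR behaves like a shadow register that is
 * latched into the DMA engine when TASKS_START runs. */
typedef struct {
    const uint8_t *ptr;     /* value the CPU last wrote to TXD.PTR */
    const uint8_t *active;  /* buffer latched at the last START */
    int started;            /* models EVENTS_STARTED */
} spim_model_t;

/* TASKS_START: latch the current pointer and raise EVENTS_STARTED. */
static void model_start(spim_model_t *m) {
    m->active = m->ptr;
    m->started = 1;
}

/* EVENTS_STARTED handler: the latched buffer is already in flight, so the
 * pointer can be redirected to the other buffer for the next START. */
static void model_on_started(spim_model_t *m, const uint8_t *buf1,
                             const uint8_t *buf2) {
    assert(m->started);
    m->started = 0;
    m->ptr = (m->active == buf1) ? buf2 : buf1;
}
```

Each START consumes the buffer that was queued, and the handler immediately queues the other one, which is the ping-pong scheme from the original question.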

  • Thanks for the response!

    Let me preface this with what I am trying to do now that I have gotten my hands on my DAC hardware. The DAC requires 2-byte writes, with chip select toggled between transfers. I am trying to store an entire waveform in RAM and kick off a DMA SPI transfer that loops over this data indefinitely, or until stopped by the application, without intervention from the processor, because the DAC writes cannot be interrupted. At the same time, the SoftDevice will be "listening" for a stop command to halt the DAC updates. My concern is that the SoftDevice interrupts will corrupt the output waveform.

    After looking at this thread: https://devzone.nordicsemi.com/f/nordic-q-a/18638/easydma-array-list, it looks like this can be done using Array Lists, hardware timers, and PPI. However, does this method also toggle the chip select line between transfers of each buffer in the Array List?

    My idea was to store each 2-byte value into a very large ArrayList as shown below and loop over this using SPI DMA, hardware timers, and PPI with no processor usage. Is this possible? If not, how can I achieve this? Are there any examples for SPI?

    Thanks!

    #include <stdint.h>

    #define SIZEOF_WAVEFORM 256 // number of 2-byte DAC samples (example value)

    typedef struct ArrayList
    {
        uint8_t buffer[2]; // 2 DAC bytes
    } ArrayList_type;

    ArrayList_type MyArrayList[SIZEOF_WAVEFORM];
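A sketch of how such an array could be filled, assuming a hypothetical DAC that takes a 12-bit code right-aligned in a 16-bit word, sent MSB first. The triangle wave and the SIZEOF_WAVEFORM value of 64 are placeholders, and the bit layout should be checked against the DAC's datasheet.

```c
#include <assert.h>
#include <stdint.h>

#define SIZEOF_WAVEFORM 64  /* placeholder length, must be even here */

typedef struct ArrayList {
    uint8_t buffer[2];      /* 2 DAC bytes */
} ArrayList_type;

static ArrayList_type MyArrayList[SIZEOF_WAVEFORM];

/* Assumed DAC format: 12-bit code, right-aligned, MSB first. */
static void pack_dac_word(ArrayList_type *e, uint16_t code) {
    code &= 0x0FFF;
    e->buffer[0] = (uint8_t)(code >> 8);
    e->buffer[1] = (uint8_t)(code & 0xFF);
}

/* Fill one period of a triangle wave spanning codes 0..4095. */
static void fill_triangle(void) {
    const uint32_t half = SIZEOF_WAVEFORM / 2;
    assert(half > 0);
    for (uint32_t i = 0; i < SIZEOF_WAVEFORM; i++) {
        uint32_t k = (i < half) ? i : SIZEOF_WAVEFORM - i;
        pack_dac_word(&MyArrayList[i], (uint16_t)((k * 4095u) / half));
    }
}
```

With the array list enabled and TXD.MAXCNT set to 2, each SPIM transfer would send exactly one of these entries.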

    The SPIM0-2 peripherals do not have hardware chip select, but the newer SPIM3 and QSPI peripherals do.

    If you need the SPIM3/QSPI peripherals for other devices, I suggest you use SPIM0-2 and control the CS pin via PPI and GPIOTE. SPI slaves will usually have timing requirements for when the CS pin is pulled low and high, so I suggest that you control both the CS pin and the SPIM's TASKS_START with a TIMER and PPI. 

    You'll need to connect the TIMER's Compare0 event to a GPIOTE task that pulls CS low (for example TASKS_CLR[n]). Then connect the TIMER's Compare1 event to the SPIM's TASKS_START, and the SPIM's EVENTS_END event to the same GPIOTE channel's TASKS_SET[n], which pulls the CS line high between each transfer (a pin can only be assigned to one GPIOTE channel, so both edges should come from that channel). You will also need to fork the SPIM's EVENTS_END event to the TIMER's TASKS_CLEAR to restart the cycle. 
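The wiring described above can be written down as data, which makes it easy to check the EVENT --> TASK chain before touching any PPI registers. This is a host-side model only; the enum and type names are invented, and on the chip each row would correspond to one PPI channel's EEP/TEP registers, with the third row also using that channel's FORK.TEP.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EV_TIMER_COMPARE0, EV_TIMER_COMPARE1, EV_SPIM_END } event_t;
typedef enum { TASK_NONE, TASK_CS_LOW, TASK_SPIM_START,
               TASK_CS_HIGH, TASK_TIMER_CLEAR } task_t;

/* One PPI channel: event endpoint, task endpoint and optional fork task. */
typedef struct { event_t eep; task_t tep; task_t fork; } ppi_channel_t;

static const ppi_channel_t channels[] = {
    { EV_TIMER_COMPARE0, TASK_CS_LOW,     TASK_NONE },
    { EV_TIMER_COMPARE1, TASK_SPIM_START, TASK_NONE },
    { EV_SPIM_END,       TASK_CS_HIGH,    TASK_TIMER_CLEAR },
};

/* Collect the tasks triggered by one event; returns how many were written. */
static size_t ppi_dispatch(event_t ev, task_t out[], size_t max) {
    assert(out != NULL || max == 0);
    size_t n = 0;
    for (size_t i = 0; i < sizeof channels / sizeof channels[0]; i++) {
        if (channels[i].eep != ev) continue;
        if (n < max) out[n++] = channels[i].tep;
        if (channels[i].fork != TASK_NONE && n < max) out[n++] = channels[i].fork;
    }
    return n;
}
```

Walking the events in order (COMPARE0, COMPARE1, END) gives CS low, SPIM start, then CS high plus TIMER clear, which restarts the cycle.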

  • From what I have read on the Devzone and the datasheet, what you describe makes sense. Is there a working SPI driver example that implements this or similar functionality that I can reference? Specifically in regards to setting up the timers, tasks, forking, and PPI? I greatly appreciate your help.

    Thanks!

    See the Timer Example, PPI Example, GPIOTE Example, SPI Master Example, GPIOTE Driver description, SPI master Driver description, GPIOTE Driver and HAL API, PPI Driver and HAL API, and TIMER Driver and HAL API.

    I'd start by playing with the TIMER and GPIOTE via the HAL. Set up some GPIOTE tasks, like pin toggles, that are triggered by a TIMER's compare events. Use a logic analyzer to see how the GPIOs behave. This will teach you the basics of the PPI system (EVENT --> TASK) and how to set up the TIMER and GPIOTE. 

    The PPI system uses the register address of an EVENT and couples it to the register address of a TASK. All drivers or HALs should have a function for getting the address of an EVENT or TASK. Those addresses are also given in the Registers chapter of a peripheral's technical specification. 
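As a small illustration of "an endpoint is just a register address", here is the arithmetic for a few of the endpoints mentioned in this thread. The base addresses and offsets are values from the nRF52840 Product Specification register tables; please verify them against your copy before relying on them. In the SDK, HAL helpers such as nrf_spim_task_address_get and nrf_timer_event_address_get return these addresses for you.

```c
#include <assert.h>
#include <stdint.h>

/* nRF52840 peripheral base addresses (verify against the PS). */
#define NRF_TIMER0_BASE  0x40008000u
#define NRF_GPIOTE_BASE  0x40006000u
#define NRF_SPIM3_BASE   0x4002F000u

/* Register offsets within each peripheral. */
#define TIMER_EVENTS_COMPARE(n) (0x140u + 4u * (n))
#define GPIOTE_TASKS_SET(n)     (0x030u + 4u * (n))
#define GPIOTE_TASKS_CLR(n)     (0x060u + 4u * (n))
#define SPIM_TASKS_START        0x010u
#define SPIM_EVENTS_END         0x118u

/* A PPI event or task endpoint is the absolute address of the register. */
static uint32_t endpoint(uint32_t base, uint32_t offset) {
    assert((offset & 3u) == 0);  /* registers are word aligned */
    return base + offset;
}
```

These are the values that would be written into a PPI channel's EEP (event) and TEP (task) registers.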

    You can use the SPIM driver to initialize the SPIM peripheral and the HAL API to enable the EasyDMA list feature with a call to nrf_spim_tx_list_enable.

  • I have spent a bit of time playing around with these examples and have gotten DMA transmit working with SPIM3 and HW chip select. Thanks for pointing that out, I didn't realize SPIM3 had this feature. However, I have run into a small problem and hopefully I am missing something here.

    My DAC SPI writes need to occur at specific intervals without CPU involvement indefinitely. So what I have done is set up a periodic timer to trigger each SPI transfer using PPI. I have set up a second timer to count the number of transfers using the SPI end event. This works great in that my SPI transmit ArrayList is iterated through as expected solely using PPI and timers. The problem arises when you want to reset the SPI transfer pointer back to the top of the ArrayList. How can this be done without CPU involvement?
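The problem can be seen with a little pointer arithmetic. In array list mode the SPIM advances TXD.PTR by TXD.MAXCNT after every transfer, so the pointer walks through the array and never comes back on its own. The helper names below are hypothetical and model the behaviour on the host.

```c
#include <assert.h>
#include <stdint.h>

/* Value of TXD.PTR after `transfers` transfers in array list mode. */
static uint32_t list_ptr_after(uint32_t base, uint32_t maxcnt, uint32_t transfers) {
    return base + maxcnt * transfers;
}

/* Nonzero once the next transfer would read past the last array entry. */
static int list_overrun(uint32_t base, uint32_t maxcnt, uint32_t entries,
                        uint32_t transfers) {
    assert(maxcnt > 0);
    return list_ptr_after(base, maxcnt, transfers) >= base + maxcnt * entries;
}
```

With 100 two-byte entries, the pointer is still valid after 99 transfers but points past the array after the 100th, so something has to rewrite TXD.PTR before the next timer-triggered START.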

    Unless I am missing something, this was my idea for a possible solution. Since the SPIM buffer pointers are double buffered, I was going to set up a third timer to count N - 1 SPI transfers (one fewer than the number of entries) and interrupt the CPU. While the last transfer is ongoing, the third timer's callback will set the SPI pointer back to the top, i.e. TXD.PTR = (uint32_t)MyArrayList;. Once the next SPI transfer begins, it will start back at the top. My only concern with this is that the system could stall due to other interrupts and not reset the pointer soon enough, overrunning the buffer.

    Does this approach make sense and seem feasible? Is there an easier way that I am missing to reset the pointer back to the top of the buffer? Just to reiterate, it is imperative that there is no noticeable delay in DAC writes when looping over this buffer.

    Edit: Not sure why I wasn't able to find this article before, but it appears that the ArrayList pointer cannot be reset without CPU involvement after all: https://devzone.nordicsemi.com/f/nordic-q-a/23349/ringbuffer-spi-twi-tx-with-fixed-sample-rate . Hopefully my idea of using a third timer will work, unless you have any other tricks or ideas?

    Thanks!

    I think you have grasped the functionality of the SPIM peripherals and the EasyDMA array list. I'm afraid that you cannot be 100% sure that you can update the buffer pointer in time, since the CPU has to write to the register and you only have the time it takes to send the last two bytes to do it. 

    There is one thing I am curious about: whether TXD.PTR itself is incremented by EasyDMA or copied at the start of the SPI transaction. If it's the latter, TXD.PTR should still contain the pointer to the beginning of your buffer and you can immediately trigger another transfer to start the process again. To verify this, I suggest you halt the CPU when you have finished transferring your buffer and read the content of SPIM3's TXD.PTR register. If it contains the address of your buffer, then you should have 100% glitch-free communication with your DAC. 
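For a feel of how small that window is, this helper computes the on-the-wire time of the last entry. It ignores gaps between transfers and interrupt latency, and the clock rates in the usage note are only examples (SPIM3 can run faster than SPIM0-2).

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds needed to clock out `bytes` bytes at `sck_hz`. */
static uint32_t wire_time_ns(uint32_t bytes, uint32_t sck_hz) {
    assert(sck_hz > 0);
    return (uint32_t)((uint64_t)bytes * 8u * 1000000000u / sck_hz);
}
```

At 8 MHz, two bytes take 2000 ns; at 32 MHz, only 500 ns, which is easily exceeded by a single higher-priority SoftDevice interrupt.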

  • Just wanted to follow up on how I solved this issue. TXD.PTR is indeed incremented by the DMA controller. 

    What I did to get around the fact that you can't reset the pointer without the CPU was to add a 10 ms buffer "overhead" to my DAC ArrayList buffer. In other words, I duplicate my DAC waveform for an additional 10 ms beyond where I expect the transmit buffer to end. Once my interrupt fires to tell me that I have transmitted the entire "expected" buffer, I check TXD.PTR to see how far beyond the expected buffer it has gone into the extra 10 ms region. I then move the pointer back to the start of the buffer plus the offset (TXD.PTR - end of expected buffer). In the worst case, a single data point of the waveform is duplicated on the DAC. Testing with BLE enabled yields excellent results and I haven't had any issues yet.
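The rewind described above boils down to two small calculations, sketched here with hypothetical helper names. The 32 µs sample period in the usage note is only an example value; the overhead region is assumed to be a copy of the start of the waveform, so the same offset lands on the same phase in the first copy.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to append so that `overhead_us` of extra samples exist,
 * rounded up to whole samples. */
static uint32_t overhead_bytes(uint32_t overhead_us, uint32_t sample_period_us,
                               uint32_t bytes_per_sample) {
    assert(sample_period_us > 0);
    return ((overhead_us + sample_period_us - 1u) / sample_period_us) * bytes_per_sample;
}

/* Map TXD.PTR, which has run past the expected end into the duplicated
 * overhead region, back to the same position in the first copy. */
static uint32_t rewind_ptr(uint32_t base, uint32_t expected_len,
                           uint32_t overhead_len, uint32_t txd_ptr) {
    uint32_t expected_end = base + expected_len;
    assert(txd_ptr >= expected_end && txd_ptr < expected_end + overhead_len);
    return base + (txd_ptr - expected_end);
}
```

For 10 ms at a 32 µs sample period with 2-byte samples this gives 626 bytes of overhead. Note that reading TXD.PTR and writing it back is still not atomic with respect to the DMA, which is exactly what the next question in the thread asks about.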

    Thanks again for all of your help. Without it I wouldn't have resolved this issue.

  • Hello, I have a follow up question to this: How do you make sure that between the moment you read TXD.PTR and the moment you overwrite it the DMA did not update it?

    I was thinking that one could busy-wait inside the interrupt for EVENTS_STARTED and then change the pointer right after, but I wonder if there is a way to make sure that such a polling loop is not interrupted by a higher-priority interrupt.

    Thank you

  • Hello,

    That's a good point. I actually ran into it later and opened another ticket about it, which was private because it involved sharing code.

    Here is an excerpt from that ticket. Basically, you have to disconnect the PPI channel between the timer and SPI peripheral, wait for any pending SPI transaction to complete, read the TX pointer, update it, and reconnect the PPI channel. I did however see delays doing this causing a few missed samples, so this is not a complete solution but a step in the right direction. The delay issue was never resolved but our DAC waveforms were such a low frequency that it didn't matter for that project. Perhaps Nordic can chime in further here but my guess is they would want you to open another ticket. If you do get an answer on how to resolve the delay issue that I am sure you will observe during your tests, I would be grateful if you could let me know.

    Thanks!

    Update: Steps 2 and 3 (below) require the SPIM3 peripheral on the nRF52840, the only SPIM instance with hardware chip select.

    Excerpt:

    ------------------------------------------------------

    From a high level, here is my implementation which is what I want to discuss:

    1. Disconnect the PPI 32 usec timer from the SPI Start Task

    2. Determine if there is a SPI transaction currently in progress by checking if the CS (Chip select) pin is low. This is a hardware driven chip select hence why this works.

    3. If the Chip select pin is low, I set EVENTS_ENDTX to 0

    4. Wait until the EVENTS_ENDTX register is set to 1 by the DMA hardware

    5. Update TXD.PTR

    6. Reconnect the 32 usec timer to the SPI Start Task using PPI

    Now my concern is with step 2. I wasn't sure how else to determine whether there is a SPI transaction in progress before setting EVENTS_ENDTX to 0 and polling for it to return to 1. The reason for the check is that simply setting EVENTS_ENDTX to 0 and waiting for it to return to 1 will not work if no SPI transaction was already in progress: you will end up waiting forever for a 1 that never comes. And since the DMA controller only writes "1"s to these registers, I wasn't sure how else to check this.
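The six steps can be modelled on the host to check the control flow. fake_periph_t, hw_tick_fn and finish_transfer are hypothetical stand-ins for the real registers and DMA, not the nRF MDK. Like the original, this keeps the race discussed above: if the transfer ends between the CS check in step 2 and the clear in step 3, the loop in step 4 never exits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for the hardware state. */
typedef struct {
    bool ppi_enabled;               /* TIMER -> SPIM TASKS_START channel */
    bool cs_low;                    /* SPIM3 hardware CSN pin is low */
    volatile uint32_t events_endtx; /* set to 1 by the "hardware" */
    uint32_t txd_ptr;               /* TXD.PTR */
} fake_periph_t;

/* Called from the busy-wait so the simulated DMA can make progress. */
typedef void (*hw_tick_fn)(fake_periph_t *p);

static void update_tx_ptr(fake_periph_t *p, uint32_t new_ptr, hw_tick_fn tick) {
    assert(p && tick);
    p->ppi_enabled = false;             /* 1. disconnect timer from TASKS_START */
    if (p->cs_low) {                    /* 2. transfer in progress? */
        p->events_endtx = 0;            /* 3. clear ENDTX */
        while (p->events_endtx == 0) {  /* 4. wait for the DMA to set it */
            tick(p);
        }
    }
    p->txd_ptr = new_ptr;               /* 5. update TXD.PTR */
    p->ppi_enabled = true;              /* 6. reconnect the timer */
}

/* Simulated hardware: the in-flight transfer completes. */
static void finish_transfer(fake_periph_t *p) {
    p->cs_low = false;
    p->events_endtx = 1;
}
```

Any timer tick that arrives while the channel is disconnected (between steps 1 and 6) is lost, which matches the missed samples reported above.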

    Hello, I found a workaround for true endless-loop playback without skipped samples. It is somewhat limited to my specific application, but I figured it could maybe work for you too. Sorry for the long post.

    Let me describe the limitations of such approach:

    1) I will assume you must transmit 12 bit samples to a DAC in the form of 16bit words. We will call this word_length = 2. If you would transmit 24bit words we would have word_length = 3 and so on.

    2) Only works with sample rates of SPI_FREQUENCY / (8 * (word_length + 1)), for instance 83.333 kHz with SPI_FREQUENCY = 2 MHz and word_length = 2.

    3) Only works with DACs that can "ignore" bits that are transmitted after the nominal word length, before the chip select goes high again.

    4) requires 2 timer-counters

    5) requires (in my specific case) 4 PPI channels

    6) only works if the data you want to transfer is coming straight from another peripheral via DMA, like from an SPI flash.

    7) The maximum number of buffered samples (from now on buffer_length) is (255 - CMD) / (word_length + 1), rounded down, where CMD is the number of bytes that must be transmitted to the external flash to start a transfer (usually a 1-byte command and a 3-byte address). For example, we could have at most an 83-sample buffer with word_length = 2 and CMD = 4.

    8) It can only cyclically play signals whose length is an integer multiple of the chosen buffer size.
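Limitations 2 and 7 are simple integer arithmetic; here they are as two hypothetical helpers. Integer division truncates, so 83.333 kHz comes out as 83333 Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Limitation 2: one sample per (word_length + 1) bytes on the wire. */
static uint32_t sample_rate_hz(uint32_t spi_hz, uint32_t word_length) {
    return spi_hz / (8u * (word_length + 1u));
}

/* Limitation 7: CMD bytes plus the whole buffer must fit in one SPI1
 * transfer of at most max_transfer bytes (255 in the post). */
static uint32_t max_buffer_length(uint32_t max_transfer, uint32_t cmd,
                                  uint32_t word_length) {
    assert(max_transfer > cmd);
    return (max_transfer - cmd) / (word_length + 1u);
}
```

With 24-bit words (word_length = 3) the same formulas give 62500 Hz and at most 62 samples.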

    Here is how it works (the big picture):

    1) we represent each signal sample with an extra dummy byte after the nominal word length (24 bits if you want to transmit 16bit words).

    2) We will assume that word_length = 2, so we imagine that we already have a full buffer of 83 samples of 16 + 8 bits in RAM. We will see later how we bring new data into the buffer without skipping samples.

    3) We configure the output SPI0 to perform a DMA transfer of (buffer_length * (word_length + 1) - 1) bytes at a certain frequency (say 2 MHz), WITHOUT POST-INCREMENT (array list disabled), and we hold the transfer (configure it but do not start it). Notice that the size of the transfer is the buffer size in bytes minus one; more on this in point 8). When started, SPI0's DMA transfers the whole buffer byte by byte without distinction and, because the pointer is not incremented, the next transfer starts again from the beginning of the buffer without CPU intervention.

    4) We set up TIMER1 at 16 MHz in cyclic COMPARE0_CLEAR mode with exactly the same period as the SPI word transfer, in our case period = (16MHz / SPI0_FREQUENCY) * 8 * (word_length + 1) = 192 ticks. When TIMER1 is started, its cycle will be synchronous with the SPI transfers. The phase alignment must be found empirically (measure with a scope), but it is deterministic and repeatable.

    5) We set up two intermediate compares for TIMER1 and connect them through PPI to a GPIOTE pin that serves as chip select. While the timer is running, a chip select pulse is generated once every word_length + 1 bytes, synchronous with the SPI word transfer. The correct phase and length of this pulse must be found empirically, so that the chip select pulse occurs while the "dummy" byte of each word is being transmitted and chip select falls right before the first bit of the next transfer. It is tricky to set up, but it is deterministic and repeatable.

    6) We set up TIMER2 as a counter connected to the TIMER1 COMPARE0 event through PPI. This counter counts the number of samples that have been transmitted. We set COMPARE0 of TIMER2 to a counter value equal to 83 (buffer_length).

    7) We connect COMPARE0 of TIMER2 to SPI0 START_TASK and we also short it to the CLEAR task of TIMER2. A whole buffer transfer via SPI0-DMA will be triggered every period of TIMER2 (buffer_length) which is EXACTLY the time it takes for the SPI0 to transfer our data buffer.

    8) The last byte that we left off from the SPI0 transfer (point 3) acts as a clearance interval that allows the DMA to reach a full STOP before it is triggered again. Anyway the last byte was a dummy byte so it is ok not to transfer it. So now we see that this extra dummy byte serves a double purpose: it allows some "dead" time between consecutive output words so that we can pulse the chip-select and mark the end of a word and the beginning of the next one, and also allows the SPI0 DMA to stop before it is triggered again.
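The numbers in steps 1, 3 and 4 can be checked with a few lines of C. The helper names are hypothetical, and put_sample assumes the DAC wants the 16-bit word MSB first, which should be checked against the DAC datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the example above. */
#define SPI_HZ        2000000u
#define TIMER_HZ      16000000u
#define WORD_LENGTH   2u                  /* bytes per DAC word */
#define SLOT          (WORD_LENGTH + 1u)  /* word + dummy byte */
#define BUFFER_LENGTH 83u                 /* samples */

/* Step 4: TIMER1 ticks per slot; assumes TIMER_HZ is a multiple of SPI_HZ. */
static uint32_t timer1_period_ticks(void) {
    assert(TIMER_HZ % SPI_HZ == 0);
    return (TIMER_HZ / SPI_HZ) * 8u * SLOT;
}

/* Step 3: SPI0 sends every slot except the final dummy byte. */
static uint32_t spi0_transfer_bytes(void) {
    return BUFFER_LENGTH * SLOT - 1u;
}

/* Step 1: write sample i as a 16-bit word (assumed MSB first), followed by
 * the dummy byte during which the chip select pulse happens. */
static void put_sample(uint8_t *data, uint32_t i, uint16_t word) {
    assert(i < BUFFER_LENGTH);
    uint8_t *slot = data + i * SLOT;
    slot[0] = (uint8_t)(word >> 8);
    slot[1] = (uint8_t)word;
    slot[2] = 0x00;
}
```

One full buffer then lasts 83 × 192 = 15936 TIMER1 ticks, about 996 µs at 16 MHz, which is the ~1 ms window mentioned in step 13.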

    At this point we have endless playback of the 83-sample (buffer_length) buffer content without any CPU intervention. Now we will see how we can bring new data into that buffer without skipping a sample.

    9) We assume that most SPI flash memories can be read consecutively with a single read-write operation: first, CMD bytes containing the command and the data address are transmitted to the flash, and the data then immediately starts streaming out of the flash on every following byte transfer. See for example the W25Q16JV flash read protocol.

    10) When we allocate the buffer for our output data, we make sure to allocate CMD additional bytes on top of the buffer size needed for the signal. These extra bytes MUST be allocated BEFORE the beginning of the actual data. This can be achieved by allocating (buffer_length * (word_length + 1) + CMD) bytes and then configuring the SPI0 DMA transfer to start CMD bytes after the beginning of the buffer.

    11) We set up SPI1 at the maximum frequency that our flash and our system allow, say 8 MHz. We set up a read-write SPI1 DMA transfer WITHOUT POST-INCREMENT, starting at the beginning of the buffer, and we hold it. The size of this transfer is (buffer_length * (word_length + 1) + CMD) bytes, as opposed to the output SPI0, which only transfers the (buffer_length * (word_length + 1) - 1) bytes after the first CMD bytes of the allocated buffer.

    12) We write the necessary read command and the desired flash address into the first CMD bytes of the buffer (again, these extra bytes must be just before the "data space").

    13) We set up an additional COMPARE1 for TIMER2 at a counter value very close to the end of the buffer, for instance buffer_length - 2. We connect COMPARE1 of TIMER2 to SPI1 START_TASK through PPI. Now, every time the OUTPUT DMA has almost finished reading the buffer, a new chunk of data is transferred from the flash to the buffer via SPI1. By the time the OUTPUT DMA wraps around and reads the first buffer samples again, the new samples have already been written. By the time the INPUT DMA writes to the end of the buffer, the OUTPUT DMA has already wrapped around. So there are no read-write conflicts anywhere.

    The input SPI1 performs a read-write operation. Technically the entire OLD buffer content is transferred TO the flash while the NEW buffer content is being transferred FROM the flash. This is fine because the flash ignores all bytes transferred TO it except the first CMD bytes, which contain the read command and address. This strategy lets us send the read command and address TO the flash and read back the data FROM the flash in a single, continuous DMA transfer.

    Note that with this transfer the first CMD bytes are also overwritten (most likely with zeros) and have to be rewritten and updated manually. This is the only point where the CPU comes in; the rest is all done in hardware. This is not a problem, though, because the CPU has a very long time (83 samples at 83.333 kHz ≈ 1 ms) to perform this task within an interrupt.

    14) We enable TIMER2 COMPARE0 interrupts. The interrupt handler adds the buffer size in bytes (buffer_length * (word_length + 1)) to the flash read address, or resets it if the end of the signal has been reached, and writes the flash read command and updated address to the first CMD bytes of the buffer, ready for the next transfer, which is triggered by hardware (TIMER2 COMPARE1).

    Note that this operation (overwriting the CMD bytes) can be done safely right after the input SPI1 transfer has started and has transferred the first CMD bytes. This allows the maximum time for the interrupt to be handled.
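The CPU's part (steps 10, 12 and 14) can be sketched as follows. The buffer layout and the W25Q-series "Read Data" instruction (0x03 followed by a 24-bit address, MSB first) follow the description above; the function names are hypothetical, and on the device on_buffer_done would be called from the TIMER2 COMPARE0 interrupt handler.

```c
#include <assert.h>
#include <stdint.h>

#define CMD           4u                       /* instruction + 24-bit address */
#define SLOT          3u                       /* word_length + 1 */
#define BUFFER_LENGTH 83u
#define BUFFER_BYTES  (BUFFER_LENGTH * SLOT)
#define FLASH_READ    0x03u                    /* W25Q "Read Data" instruction */

/* Step 10: CMD bytes first, then the sample data. SPI1 reads into the
 * whole array; SPI0 transmits starting at buf + CMD. */
static uint8_t buf[CMD + BUFFER_BYTES];

/* Step 12: instruction byte and 24-bit address, MSB first. */
static void write_read_cmd(uint32_t flash_addr) {
    assert(flash_addr <= 0xFFFFFFu);
    buf[0] = FLASH_READ;
    buf[1] = (uint8_t)(flash_addr >> 16);
    buf[2] = (uint8_t)(flash_addr >> 8);
    buf[3] = (uint8_t)flash_addr;
}

/* Step 14: next chunk address, wrapping at the end of the signal
 * (limitation 8: signal_len is a whole number of buffers). */
static uint32_t next_flash_addr(uint32_t addr, uint32_t signal_start,
                                uint32_t signal_len) {
    assert(signal_len % BUFFER_BYTES == 0);
    uint32_t next = addr + BUFFER_BYTES;
    return (next >= signal_start + signal_len) ? signal_start : next;
}

/* Hypothetical interrupt body: advance the address, then restore the CMD
 * bytes that the previous SPI1 transfer overwrote. */
static uint32_t on_buffer_done(uint32_t addr, uint32_t signal_start,
                               uint32_t signal_len) {
    uint32_t next = next_flash_addr(addr, signal_start, signal_len);
    write_read_cmd(next);
    return next;
}
```

With CMD = 4 and 83 three-byte slots the buffer is 253 bytes, which stays within the 255-byte limit from limitation 7.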

    I have tested this for hours with several buffer lengths. With anything above 50 samples I get continuous playback without skipped samples.

  • Thanks for the update! I am working on a new project using DMA so I may be able to leverage some of your technique.

    Derek
