SPI double buffering on TX

I have a use case where I need to send a sequence of 2-byte bursts as SPI master. The data is generated on the fly, and the algorithm would be able to saturate the 8Mbit/s SPI on the nRF52832, so I'd like to optimize throughput and latency by interleaving the data generation with the communication.

If I use the naive approach (I don't need the incoming data, in fact, MISO is not wired at all) of:

      NRF_SPI0->TXD = data >> 8;
      while(!NRF_SPI0->EVENTS_READY);
      (void)NRF_SPI0->RXD;
      NRF_SPI0->EVENTS_READY = 0x0UL;

      NRF_SPI0->TXD = (uint8_t)data;
      while(!NRF_SPI0->EVENTS_READY);
      (void)NRF_SPI0->RXD;
      NRF_SPI0->EVENTS_READY = 0x0UL;

I am getting about 1.3us between the bytes and thus an overall transfer rate of ~3Mbit/s.
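As a rough sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes the 1.3us is the idle gap between bytes (not the start-to-start period) and that each byte occupies 1.0us on the wire at 8 MHz; `effective_mbps` and `naive_mbps` are hypothetical helper names, not part of any SDK.

```c
#include <assert.h>

/* Effective data rate in Mbit/s for `bits` payload bits sent once every
 * `period_us` microseconds (bits per microsecond equals Mbit/s). */
double effective_mbps(double bits, double period_us)
{
    assert(period_us > 0.0);
    return bits / period_us;
}

/* Naive loop: 8 bits take 1.0us on the wire at 8 MHz, followed by an
 * assumed 1.3us idle gap before the next byte starts. */
double naive_mbps(void)
{
    return effective_mbps(8.0, 1.0 + 1.3);
}
```

Under those assumptions the result is a little under 3.5 Mbit/s, which is in line with the ~3Mbit/s observed.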

Since my bursts are just 2B, following section 48.1.3 of the nRF52832 PS, I should be able to just dump both bytes into spi->TXD, but I couldn't get that to work reliably. Actually, I can, with an explicit delay, as in:

    [the loop] {
      uint16_t data = produce();
      NRF_SPI0->TXD = data >> 8;
      NRF_SPI0->TXD = (uint8_t)data;
      delay_us(2);
    }

This pretty much works: the SPI peripheral generates nice back-to-back 16-tick transfers with about 0.8us between the bursts. A 2-byte transfer starts about every 2.8us (a 2us wait plus ~0.8us to generate the next 16 bits), which works out to roughly 5.7Mbit/s.

The trouble is, I can't find a reliable way to wait before starting the next 16bit burst besides that 2us delay. With the explicit delay, I am wasting time that could have been used to produce() (since 0.8us is the best case and sometimes it takes longer).
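To make the cost of the fixed delay concrete, the sketch below compares the burst period when produce() runs only after the wait with the period when it fully overlaps the transfer. It assumes 2.0us of wire time per 16-bit burst at 8 MHz and the best-case 0.8us for produce() quoted above; `burst_period_us` and `burst_mbps` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Start-to-start period of one 16-bit burst.
 * Serialized: data generation and transfer time add up.
 * Overlapped: the longer of the two dominates. */
double burst_period_us(double wire_us, double produce_us, bool overlapped)
{
    assert(wire_us > 0.0 && produce_us >= 0.0);
    if (overlapped) {
        return wire_us > produce_us ? wire_us : produce_us;
    }
    return wire_us + produce_us;
}

/* Throughput in Mbit/s for one 16-bit burst per period. */
double burst_mbps(double period_us)
{
    assert(period_us > 0.0);
    return 16.0 / period_us;
}
```

With these inputs the serialized case gives 16 bits every 2.8us (about 5.7 Mbit/s), while perfect overlap would be limited only by the 2.0us wire time (8 Mbit/s). Real numbers will be lower whenever produce() takes longer than its best case.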

In an ideal case, I'd be able to do something like:

    [the loop] {
      uint16_t data = produce();
      wait_for_SPI_idle();
      NRF_SPI0->TXD = data >> 8;
      NRF_SPI0->TXD = (uint8_t)data;
    }

But no matter how I play with NRF_SPI0->EVENTS_READY, I can't construct a reliable "wait_for_SPI_idle()" or similar functionality.

Any idea?

  • No, but I am not using interrupts either. Do I need to enable the interrupts to get EVENTS_READY properly delivered?

    I doubt it, since the sequence of sendByte(); waitForReady(); sendByte(); waitForReady(); (as described in my first code snippet in the OP) works reliably and EVENTS_READY is delivered just fine each time.

    The only thing that doesn't work for me is to use the double-buffering, sending two bytes in a row (as suggested by PS, section 48.1.3), and only then checking for finished transaction...

  • Hello,

    In the chapter that you linked to (here), if you look at the diagram beneath the text block, you can see the events that are generated. As you can see, you will get a READY event for each byte. This means that you can queue one byte on each READY event. If you need to queue two at a time, you must wait for two.

     

    Something like this:

    produce();
    TXD = data >> 8;
    TXD = (uint8_t)data;
    
    while(true)
    {
        produce();
        while(!READY){};
        TXD = data >> 8;
        while(!READY){};
        TXD = (uint8_t)data;
    }

  • Ah, so I always need to be one byte ahead? I'll try that, though I'd expect to be able to let the pipeline drain. Also, don't I need to clear the READY flag between the bytes?

  • Sorry. I forgot that. Yes, you do have to clear the ready event. 

    Reading from the text after the figure:
    "...Therefore, it is important that you always clear the READY event, even if the RXD register and the data that is being received is not used."

     

    Best regards,

    Edvin

     

  • So this is the best code I could get working reliably:

          if (first) {
            // blindly fill the double buffered output
            NRF_SPI0->TXD = col >> 8;
            NRF_SPI0->TXD = (uint8_t)col;
            first = false;
          }  else {
            while(!NRF_SPI0->EVENTS_READY);
            NRF_SPI0->EVENTS_READY = 0;
            (void)NRF_SPI0->RXD;
    
            NRF_SPI0->TXD = col >> 8;
    
            while(!NRF_SPI0->EVENTS_READY);
            NRF_SPI0->EVENTS_READY = 0;
            (void)NRF_SPI0->RXD;
            NRF_SPI0->TXD = (uint8_t)col;        
          }

    I.e. I have to both reset the READY flag and read the RXD register, and do so in that particular order, including the TXD write. It still improves the throughput from 5.7Mbps to about 7Mbps.
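The per-byte step described above (wait for READY, clear it, read RXD, then write TXD) could be factored into a small helper. The following is only a sketch: `spi_regs_t` is a hypothetical stand-in containing just the three registers used in this thread so that the code compiles off-target, and `spi_queue_byte` is an illustrative name. On hardware you would pass the real peripheral from the vendor headers, and the behavior would still need to be verified on the device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SPI registers used in this thread.
 * It does NOT reproduce the real register layout of the peripheral. */
typedef struct {
    volatile uint32_t EVENTS_READY;
    volatile uint32_t RXD;
    volatile uint32_t TXD;
} spi_regs_t;

/* Wait for one READY event, acknowledge it, drain RXD, then queue the
 * next byte. The order mirrors the sequence reported as working above. */
void spi_queue_byte(spi_regs_t *spi, uint8_t byte)
{
    assert(spi != 0);
    while (!spi->EVENTS_READY) {
        /* busy-wait until the peripheral signals a completed byte */
    }
    spi->EVENTS_READY = 0;  /* clear the event first */
    (void)spi->RXD;         /* read and discard the received byte */
    spi->TXD = byte;        /* refill the TX buffer */
}
```

In the steady-state branch of the loop, this helper would be called once with `col >> 8` and once with `(uint8_t)col`, after the initial blind fill of both TXD bytes.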

    Interestingly enough, my application still works if I comment out the second EVENTS_READY clearing, which improves the performance further, up to 7.5-7.7Mbps, though I am not comfortable going there w/o understanding why that still works.

    I might investigate the EasyDMA approach later, in case it has low-enough overhead when used for back-to-back 2B-only transactions, though I am fine with 7Mbps already.
