
Slow SPI performance

I have a 25Q64 SPI flash chip connected to the SPI bus. Unfortunately it is too slow, even when using an IRQ routine via SPI0_TWI0_IRQHandler (everything is done as in the spi_master-pca10028 example). Typical timing diagram: [image not preserved]

Is this a problem in the SPI hardware implementation, or in interrupt routing via the SoftDevice? Can you suggest any ways to speed up SPI throughput? Do you have a correct double-buffering example that checks whether the TX buffer is ready for 1 or for 2 bytes? As a temporary solution I've used a trick with a delay:

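    for (int i = 0; i < datalen; i++) {
      MEM_SPI->TXD = data[i];  // write the next byte without checking EVENTS_READY
      nrf_delay_us(1);         // fixed wait, about one byte time at an assumed 8 Mb/s
    }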

It works better, but I'm not sure whether this approach is efficient and stable enough for production:

[Image: timing diagram with the delay loop, not preserved]

If this approach is OK, I'll write my own nrf_delay_us with fewer NOPs :)

  • Your expectations are a bit high for an interrupt-driven SPI interface on a 16 MHz chip, I think. Let's add it up. From that screenshot you're seeing about 35 µs for 4 transmissions; each one looks like it takes about 1 µs (so I'm assuming 8 Mb/s), so the gap between each one is about 10 µs. The SoftDevice is documented to add 3 µs of overhead to an open interrupt, which leaves 7 µs. 7 µs is about 110 clock cycles, or roughly 100 instructions at an average of about 0.9 instructions per cycle on the Cortex-M0. If you look at the spi_master code, it's easy to believe it takes about 100 instructions, although optimised for speed I'd expect it to be a little faster. Have you tried it with optimisation enabled?
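    To make that budget easy to re-check with other numbers, here is a small sketch of the same arithmetic. The inputs (about 35 µs for 4 bytes, about 1 µs per byte on the wire, 3 µs of SoftDevice interrupt overhead, a 16 MHz clock and roughly 0.9 instructions per cycle) are the estimates from this answer, not measured values, and the helper name `isr_budget` is hypothetical.

```c
#include <assert.h>

/* Time and CPU budget left for the SPI interrupt handler per byte. */
typedef struct {
    double gap_us;          /* idle time between consecutive bytes     */
    double handler_us;      /* gap minus SoftDevice interrupt overhead */
    double handler_cycles;  /* that time expressed in CPU clock cycles */
    double handler_instr;   /* approximate instruction count           */
} isr_budget_t;

/* Hypothetical helper: all inputs are estimates read off a scope capture. */
static isr_budget_t isr_budget(double total_us, int n_bytes, double byte_us,
                               double softdevice_us, double cpu_mhz,
                               double instr_per_cycle)
{
    assert(n_bytes > 1);                        /* need at least one gap */
    isr_budget_t b;
    /* n bytes on the wire leave n - 1 gaps between them */
    b.gap_us = (total_us - n_bytes * byte_us) / (n_bytes - 1);
    b.handler_us = b.gap_us - softdevice_us;
    b.handler_cycles = b.handler_us * cpu_mhz;  /* us * MHz = cycles */
    b.handler_instr = b.handler_cycles * instr_per_cycle;
    return b;
}
```

    With `isr_budget(35.0, 4, 1.0, 3.0, 16.0, 0.9)` the gap comes out at about 10.3 µs, leaving roughly 117 cycles, or about 105 instructions, for the handler, in line with the rounded figures above.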

    Is double-buffering properly implemented? Sort of. The code does write TXD twice at the start, but if you look at the interrupt code, it only ever writes one new byte (and reads one byte) before exiting. So after the first two quick bytes, it can only go as fast as the interrupt handler can fire, add one byte and return, and at 8 Mb/s another interrupt is almost certainly already waiting by then. One change you could make in the spi_master interrupt code is to keep looping while EVENTS_READY is set (clearing it each time) and there is still data to send or receive. At that SPI speed, one interrupt would then probably write most of the buffer, because the time taken just to check whether there is a byte to write and write it is already longer than the time the SPI interface takes to clock it out and ask for the next one.
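    The difference between the two handler styles can be shown with a small host-side simulation. This is a sketch under simplifying assumptions, not nRF51 driver code: `sim_spi_t` is a hypothetical stand-in for the SPI registers, the wire is modelled as instant (the 8 Mb/s case, where a byte is done before the handler can check for the next one), TXD/RXD are modelled as a two-byte loopback buffer, and the function names are made up for illustration. It counts how many interrupts each handler needs for the same transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPI model: bytes written to TXD are clocked out instantly
   and looped back into a two-deep RXD buffer. */
typedef struct {
    uint32_t EVENTS_READY;
    uint8_t  rx[2];
    int      rx_count;
    size_t   sent;
} sim_spi_t;

static void sim_write_txd(sim_spi_t *s, uint8_t b)
{
    assert(s->rx_count < 2);          /* real hardware would stall instead */
    s->rx[s->rx_count++] = b;
    s->sent++;
    s->EVENTS_READY = 1;
}

static uint8_t sim_read_rxd(sim_spi_t *s)
{
    uint8_t b = s->rx[0];
    s->rx[0] = s->rx[1];
    if (--s->rx_count > 0)
        s->EVENTS_READY = 1;          /* second buffered byte moves into RXD */
    return b;
}

typedef struct {
    const uint8_t *tx;
    uint8_t       *rx;
    size_t         n, tx_idx, rx_idx;
} xfer_t;

/* spi_master style: read one byte, write one byte, return. */
static void isr_one_byte(sim_spi_t *s, xfer_t *x)
{
    s->EVENTS_READY = 0;              /* acknowledge before reading RXD */
    x->rx[x->rx_idx++] = sim_read_rxd(s);
    if (x->tx_idx < x->n)
        sim_write_txd(s, x->tx[x->tx_idx++]);
}

/* Suggested change: keep going while READY is pending and data remains. */
static void isr_drain(sim_spi_t *s, xfer_t *x)
{
    while (s->EVENTS_READY && x->rx_idx < x->n) {
        s->EVENTS_READY = 0;
        x->rx[x->rx_idx++] = sim_read_rxd(s);
        if (x->tx_idx < x->n)
            sim_write_txd(s, x->tx[x->tx_idx++]);
    }
}

/* Prime both TXD slots, then "fire the interrupt" while READY is pending.
   Returns the number of interrupts the transfer needed. */
static int run_transfer(void (*isr)(sim_spi_t *, xfer_t *),
                        const uint8_t *tx, uint8_t *rx, size_t n)
{
    sim_spi_t s = {0};
    xfer_t x = { tx, rx, n, 0, 0 };
    while (x.tx_idx < n && x.tx_idx < 2)
        sim_write_txd(&s, tx[x.tx_idx++]);
    int interrupts = 0;
    while (s.EVENTS_READY && x.rx_idx < n) {
        isr(&s, &x);
        interrupts++;
    }
    assert(x.rx_idx == n && s.sent == n);
    return interrupts;
}
```

    For a 16-byte transfer, `isr_one_byte` needs 16 interrupts while `isr_drain` finishes in one. On real hardware the wire is not instant, so the drain loop will sometimes exit and wait for the next interrupt, but each interrupt still moves as many bytes as the SPI allows.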

    Your 'solution' isn't particularly stable and doesn't really follow the documentation. It's true that the nRF51 manual doesn't say you can't keep throwing bytes at TXD and have them clocked out while completely ignoring the RXD bytes and the EVENTS_READY flag, but neither does it say you can. The nRF51 doesn't seem to have the concept of overflow on the SPI interface and may indeed keep sending bytes, but that's not how the docs strongly suggest you use it. A fixed delay like that isn't very good either. If you really want to send bytes in a tight loop, then something like the following, which keeps checking EVENTS_READY to trigger the next byte and reads RXD each time, is more in line with the docs and should also give you maximum performance. I make it about 10 instructions per byte, so it should feed the SPI at close to full speed.

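    uint32_t i = 0;                         // assumes datalen >= 1
    MEM_SPI->EVENTS_READY = 0;
    MEM_SPI->TXD = data[i++];               // send the first byte; READY only fires after a byte goes out
    while( true )
    {
        if( MEM_SPI->EVENTS_READY )         // a byte has been clocked out and its reply is in RXD
        {
            MEM_SPI->EVENTS_READY = 0;      // acknowledge the event
            uint8_t dummy = MEM_SPI->RXD;   // RXD must be read before the next READY event
            (void)dummy;
            if( i >= datalen )
                break;                      // every byte has been sent and its reply read
            MEM_SPI->TXD = data[i++];
        }
    }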
    

    That sort of code is also what I was suggesting could go in the interrupt handler, to push as many bytes as possible during one interrupt; that gives you a hybrid with the benefits of both the interrupt-based and the tight-loop approaches.

    The basic problem here is that you're trying to use a chip with a 16 MHz clock to keep an SPI bus running flat out at 8 Mb/s. If you really want to do that, a tight loop is required, because you only get 16 clock cycles per byte to put the next one in the buffer, and that's not very many. You could change the spi_master code a little to get better performance, but if you really want to pump data out at that rate you need to just loop and write data; the interrupt-based solution doesn't really keep the buffers full at anything above about 1 Mb/s.
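    The cycles-per-byte figure follows from each byte taking 8 SPI clocks. A minimal sketch of that arithmetic, assuming the cost of the per-byte code path is known in CPU cycles (the function names are illustrative, not from any SDK):

```c
#include <assert.h>
#include <stdbool.h>

/* CPU cycles that pass while one byte (8 SPI clocks) is on the wire. */
static unsigned long cycles_per_byte(unsigned long cpu_hz, unsigned long spi_hz)
{
    assert(spi_hz > 0);
    return 8UL * cpu_hz / spi_hz;
}

/* True if a per-byte code path costing `path_cycles` can keep the bus busy. */
static bool keeps_up(unsigned long path_cycles, unsigned long cpu_hz,
                     unsigned long spi_hz)
{
    return path_cycles <= cycles_per_byte(cpu_hz, spi_hz);
}
```

    At 16 MHz this gives 16 cycles per byte at 8 MHz SPI and 128 at 1 MHz. For example, the roughly 110-cycle handler budget estimated earlier fits within the 128 cycles available at 1 MHz but is far over the 16 available at 8 MHz.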

  • I'm using interrupt-driven SPI for an SPI flash and an OLED display. Initially the performance was painfully bad, until I started handling multiple bytes per interrupt for as long as EVENTS_READY stays pending.

    Also, the nRF51 SPI TXD is double-buffered, so make sure you prime it with 2 bytes.

    I'm hoping that on a future nRF5x chip Nordic won't be quite so stingy with DMA, because something like SPI screams for DMA.

  • I would love DMA too; oddly enough I went through this today. I'm also writing OLED driver code (SSD1306) and mine runs at 8 MHz. Even with a loop that checks EVENTS_READY, clears it, reads the RXD byte (required to get another event), writes the next byte to TXD, and increments and checks the data pointer, that ends up being a loop of about 60 instructions, or 4 µs. Optimised in release mode it's about 3 µs, and I don't think the optimiser has done a great job, but even if I hand-coded it I could only do a little better. So the maximum effective SPI rate is about 3 MHz. At that speed the double-buffering doesn't even help; it costs more machine cycles to track whether you have 1 or 2 TXD slots remaining than it does to pretend there's only one. My code is a hybrid: it drops into interrupt mode when the loop finds EVENTS_READY isn't set, and I find I have to go down to 1 MHz SPI before that happens.
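    The hybrid scheme described here (poll in a loop, and hand over to the interrupt as soon as EVENTS_READY isn't set) can be sketched as follows. It is a host-side model, not nRF51 driver code: `sim_spi_t` and its helpers are hypothetical, a byte is modelled as finishing after a fixed number of READY polls, only one byte is kept in flight (since double-buffering doesn't pay off at these speeds), and setting `irq_enabled` stands in for enabling the SPI interrupt, whose handler would call `spi_pump()` again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPI model: a byte written to TXD is looped back to RXD and
   READY is raised after `wire_polls` checks of the flag. */
typedef struct {
    int     wire_polls;    /* polls per byte: larger = slower SPI clock   */
    int     remaining;     /* polls left for the byte in flight, -1 = idle */
    uint8_t rxd;
    bool    irq_enabled;   /* stands in for enabling the SPI interrupt    */
    size_t  bytes_on_wire;
} sim_spi_t;

static bool sim_ready(sim_spi_t *s)
{
    if (s->remaining > 0)
        s->remaining--;
    return s->remaining == 0;
}

static uint8_t sim_ack_and_read(sim_spi_t *s)   /* clear READY, read RXD */
{
    s->remaining = -1;
    return s->rxd;
}

static void sim_write_txd(sim_spi_t *s, uint8_t b)
{
    s->rxd = b;
    s->remaining = s->wire_polls;
    s->bytes_on_wire++;
}

typedef struct {
    const uint8_t *tx;
    size_t         n;
    size_t         done;   /* bytes sent and acknowledged */
} xfer_t;

/* Called from the start routine and from the interrupt handler.
   Returns true once the whole buffer has gone out. */
static bool spi_pump(sim_spi_t *s, xfer_t *x)
{
    while (x->done < x->n) {
        if (!sim_ready(s)) {
            s->irq_enabled = true;   /* wire slower than the loop: wait for the IRQ */
            return false;
        }
        (void)sim_ack_and_read(s);
        if (++x->done < x->n)
            sim_write_txd(s, x->tx[x->done]);
    }
    s->irq_enabled = false;
    return true;
}

static bool spi_start(sim_spi_t *s, xfer_t *x, const uint8_t *tx, size_t n)
{
    x->tx = tx;
    x->n = n;
    x->done = 0;
    if (n == 0)
        return true;
    sim_write_txd(s, tx[0]);
    return spi_pump(s, x);
}
```

    With `wire_polls = 1` (SPI faster than the loop) the whole buffer goes out from the first call and the interrupt is never enabled; with a slower wire the loop hands over to the interrupt after the first miss, which is what the post reports happening once the SPI clock drops to 1 MHz.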

  • I settled on 4 MHz SPI and get about 1.493 ms per SSD1306 OLED frame (~512 bytes).
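    Those figures can be turned into an effective throughput. A small sketch of the arithmetic, using the ~512-byte frame, 4 MHz SPI clock and 1.493 ms frame time reported here plus the nRF51's 16 MHz core clock; the frame size is approximate, so treat the results as rough. The helper `frame_stats` is hypothetical.

```c
#include <assert.h>

typedef struct {
    double wire_us;          /* time the bits alone need on the wire   */
    double us_per_byte;      /* measured average time per byte         */
    double effective_mbps;   /* achieved payload rate                  */
    double overhead_cycles;  /* CPU cycles per byte beyond wire time   */
} frame_stats_t;

static frame_stats_t frame_stats(unsigned bytes, double spi_mhz,
                                 double frame_us, double cpu_mhz)
{
    assert(bytes > 0 && spi_mhz > 0.0 && frame_us > 0.0);
    frame_stats_t f;
    f.wire_us = bytes * 8.0 / spi_mhz;             /* bits / Mb/s = us */
    f.us_per_byte = frame_us / bytes;
    f.effective_mbps = bytes * 8.0 / frame_us;
    /* average idle bus time per byte, converted to CPU cycles */
    f.overhead_cycles = (frame_us - f.wire_us) / bytes * cpu_mhz;
    return f;
}
```

    `frame_stats(512, 4.0, 1493.0, 16.0)` gives 1024 µs of pure wire time, about 2.9 µs per byte and about 2.74 Mb/s effective: on average roughly 15 CPU cycles of idle bus per byte on top of the 32 cycles the byte itself takes.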

    All you need to handle the double buffering is separate tx/rx indexes and priming the TX buffer at the start.

    In my SPI interrupt handler I have a TX-only fast path and a slower RX/TX path (not used for the SSD1306). Here is the fast path:

        /* tx only fast path */
        while (hw->EVENTS_READY && rxIdx < xfer->nbytes)
        {
            hw->EVENTS_READY = 0; // ack event 
            data = hw->RXD; // dummy receive 
            rxIdx++; // always incr 
            if (txIdx < xfer->nbytes) // more to send? 
                hw->TXD = txBuffer[txIdx++];
        }
    