SPIM: corrupted data by incoming SPIS data?

Hi,

We've run into an issue using the SPI master (SPIM). We have the nRF5340 connected to our FPGA using 2 SPI buses. The nRF is SPI master on one bus and SPI slave on the other. For testing, we're sending quite a few transactions (1800/s receives on the slave, 400/s sends on the master). Both buses only have MOSI, no MISO (so both are one-way). We use interrupts only to notify the main code; all SPIM transactions are started outside interrupts.

We always send/receive 7-byte frames. Since we're testing, we also send known content (a header + cmd + sequence number + trailer). What we see is that sometimes (on the order of 1 in 100,000 transfers) the SPI master 'forgets' to send the last clock cycle of a transaction. So instead of sending 56 clock cycles (= 7 bytes * 8), it only sends 55. Other times it adds 7 cycles.

The top 3 (red/brown/white) form the SPI master's bus. The bottom 2 (CS is missing there because we needed a pin on the logic analyser for another signal) show the incoming SPI transaction from the FPGA. The logic analyser marks all the rising edges on the red clock where it samples. The last byte only has 7 bits, so its edges are not marked.

Zooming in on the top-left part:

So the signal should only change on the falling edges of the clock and be sampled (by the FPGA/logic analyser) on the rising edges. We expect the data to be C9 12 at the start, but it's not. The small brown column at 19 µs is weird! It's only half a clock cycle long, and the signal changes on the rising edge of the clock?!

I suspect that's where the missing last clock cycle/bit gets 'eaten'. The 2nd byte is indeed shifted 1 bit (12 -> 24).
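
A one-bit slip like this is easy to test for in captured frames. As a minimal sketch (the helper names `shift_left_one_bit` and `frame_slipped_one_bit` are hypothetical, and MSB-first transmission is assumed), shifting the expected bytes left by one bit and comparing them with what was received shows whether a frame lost exactly one clock:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: shift an MSB-first byte stream left by one bit.
 * The MSB of each following byte moves into the LSB of the current one;
 * the very last bit is filled with 0 because no later data exists. */
void shift_left_one_bit(const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t carry = (i + 1 < len) ? (uint8_t)(in[i + 1] >> 7) : 0u;
        out[i] = (uint8_t)((uint8_t)(in[i] << 1) | carry);
    }
}

/* Returns true when `received` equals `expected` shifted left by one bit.
 * The lowest bit of the last byte is ignored because it was never clocked. */
bool frame_slipped_one_bit(const uint8_t *expected, const uint8_t *received,
                           size_t len)
{
    uint8_t shifted[16];

    if (len == 0 || len > sizeof(shifted)) {
        return false;
    }
    shift_left_one_bit(expected, shifted, len);
    if (memcmp(shifted, received, len - 1) != 0) {
        return false;
    }
    return (shifted[len - 1] & 0xFEu) == (received[len - 1] & 0xFEu);
}
```

With an expected byte 0x12 followed by a byte whose top bit is clear, the shifted result is 0x24, which matches what the capture shows for the 2nd byte.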

We have several different captures that show this behaviour.

We upgraded to nrfx 3.3 (latest). This seems to lower the frequency of the issue, but it still occurs. Disabling BLE did not change anything.

I suspect that sometimes, an incoming SPI slave transaction corrupts the data/state of the SPI master? But this is speculation.

Since the DevZone support was really helpful in solving our other issues, I hope you can repeat that again.

  • Hi,

    I have a few initial questions to understand more about the issue:

    • Do you see the missing clock cycle and corrupt data only on the logic analyzer, or also from the FPGA?
    • Which SPI mode have you configured on the nRF and the logic analyzer?
    • Which sampling rate are you using on the logic analyzer?

    The above is to try to establish if the issue is on the nRF side or if it could be something with the test setup.

    If the problem is on the nRF side, I wonder whether you can still reproduce it with constant latency mode enabled. If you run this from the app core, you can test that by making this call early in your application:

    NRF_POWER_NS->TASKS_CONSTLAT = 1; 

    "I suspect that sometimes, an incoming SPI slave transaction corrupts the data/state of the SPI master? But this is speculation."

    It does not immediately seem likely, but do you have testing to back this up? Are you able to reproduce the issue if not using SPIS?

  • Hello Einar,

    Thanks for the quick response.

    - The data issue was detected on the FPGA and then analysed on the logic analyser.

    - The mode is CPHA 0, CPOL 0.

    - The analyser is sampling at 25 MS/s, so it has a time resolution of 40 ns.
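
    To put the sample rate in context: the odd pulse in the capture is only about half a clock cycle wide, so what matters is how many analyser samples fit into half an SPI clock period. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and the 8 MHz SCK used in the note below is a hypothetical value, since the actual SCK frequency is not stated here.

    ```c
    /* Time between two analyser samples, in nanoseconds. */
    double sample_period_ns(double sample_rate_hz)
    {
        return 1e9 / sample_rate_hz;
    }

    /* Number of analyser samples that fall within half an SPI clock period. */
    double samples_per_half_clock(double sample_rate_hz, double sck_hz)
    {
        return sample_rate_hz / (2.0 * sck_hz);
    }
    ```

    At 25 MS/s this gives 40 ns between samples. With a hypothetical 8 MHz SCK, half a clock period would be covered by fewer than two samples, which is why the sample rate matters when judging a half-cycle pulse.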

    I will run the test overnight with the modifications you mentioned. Currently it's happening roughly every 150,000 transfers.

    I'll update this post with the results.

  • It is worth trying to simply move the SPIM data buffers for the two links onto separate RAM AHB slaves, as bus stalls are a significant issue with high-speed simultaneous SPIM transfers, and indeed with any concurrent high-speed bus activity using DMA. Simply put, force the buffers into two different memory areas connected by different AHB buses; there are 8 RAM AHB slaves on the nRF5340 application core. Bus masters (RADIO, SPI, etc.) can operate simultaneously without stalls only when using different AHB slaves.
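
    To check that two buffers really end up behind different RAM AHB slaves, their addresses can be mapped to a slave index. The sketch below assumes the application-core RAM starts at 0x20000000 and is split evenly into 8 slaves of 64 kB each (512 kB in total); these constants and the function names are assumptions for illustration and should be checked against the nRF5340 product specification.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed layout: verify against the nRF5340 product specification. */
    #define APP_RAM_BASE        0x20000000u
    #define RAM_AHB_SLAVE_SIZE  0x00010000u  /* 64 kB per slave (assumption) */
    #define RAM_AHB_SLAVE_COUNT 8u

    /* Returns the RAM AHB slave index (0..7) for an address,
     * or -1 if the address is outside the assumed RAM range. */
    int ram_ahb_slave_index(uint32_t addr)
    {
        if (addr < APP_RAM_BASE) {
            return -1;
        }
        uint32_t idx = (addr - APP_RAM_BASE) / RAM_AHB_SLAVE_SIZE;
        return (idx < RAM_AHB_SLAVE_COUNT) ? (int)idx : -1;
    }

    /* True if both addresses are in RAM and map to different slaves. */
    bool buffers_on_different_slaves(uint32_t addr_a, uint32_t addr_b)
    {
        int a = ram_ahb_slave_index(addr_a);
        int b = ram_ahb_slave_index(addr_b);
        return a >= 0 && b >= 0 && a != b;
    }
    ```

    On target, the addresses would come from casting the buffer pointers, e.g. `(uint32_t)(uintptr_t)tx_buf`, where `tx_buf` is a placeholder name. A buffer that straddles a 64 kB boundary would need both its first and last byte checked.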

    I discussed this recently here: external-flash-reads-randomly-failing-at-higher-clock-rates

  • Thanks for the advice!

    I have tried having the SPIM and SPIS use 2 separate AHB slave regions, but unfortunately, the issue still occurred.

  • I ran some tests with the changes you advised:

    - I tried without BLE; the issue still occurred.

    - I swapped SPIM3/SPIS2 with SPIM2/SPIS3; no effect, the issue still occurred.

    - I tried the memory regions mentioned; no change.

    - I ran the test with CONSTLAT enabled. The test ran for around 500K transfers, then the issue occurred again. I only tried this once because the test takes quite some time, so a cautious conclusion would be that this does seem to help a bit.
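
    As a rough sanity check on that single run: if failures were independent and still occurred at the earlier rate of about 1 in 150,000 transfers, the chance of getting through n transfers without a failure would be (1 - p)^n. The sketch below evaluates this; the independence assumption and the function name are for illustration only, not something measured.

    ```c
    /* Probability of seeing no failure in `transfers` independent transfers
     * when each one fails with probability `p_fail`. Uses exponentiation by
     * squaring so that no math library is needed. */
    double prob_no_failure(double p_fail, unsigned long transfers)
    {
        double base = 1.0 - p_fail;
        double result = 1.0;

        while (transfers > 0) {
            if (transfers & 1ul) {
                result *= base;
            }
            base *= base;
            transfers >>= 1;
        }
        return result;
    }
    ```

    With p = 1/150,000 and n = 500,000 this comes out at roughly 3.6 %, so a clean stretch of that length would be unlikely at the old rate but not impossible, and more runs would be needed to be sure.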

    One more test I can still do is to leave out the SPIS. That would mean changing the FPGA image and the setup, since we test for timeouts by checking whether a reply has been received over the SPIS. I could substitute this with a GPIO interrupt/poll to see if there was an error and just send on a timer.
