SPIM: corrupted data by incoming SPIS data?

Hi,

We've run into an issue using the SPI master (SPIM). We have an nRF5340 connected to our FPGA via two SPI buses. The nRF is SPI master on one bus and SPI slave on the other. For testing, we're sending quite a lot of transactions (1800/s receives on the slave, 400/s sends on the master). Both buses only have MOSI, no MISO (so both are one-way). We use interrupts only to notify the main code; all SPIM transactions are started outside interrupts.

We always send/receive 7-byte frames. Since we're testing, we also send known content (a header + cmd + sequence number + trailer). What we see is that sometimes (in the order of 1 in 100,000 transactions) the SPI master 'forgets' to send the last clock cycle of a transaction. So instead of sending 56 clock cycles (= 7 bytes * 8), it only sends 55. Other times it adds 7 cycles.

The top 3 traces (red/brown/white) form the SPI master's bus. The bottom 2 show the incoming SPI transaction from the FPGA (CS is missing there because we needed that logic analyser pin for another signal). The logic analyser marks every rising edge of the red clock where it samples. The last byte only has 7 bits, so its bits are not marked.

Zooming in on the top-left part:

So the signal should only change on the falling edges of the clock and be sampled (by the FPGA/logic analyser) on the rising edges. We expect the data to start with C9 12, but it doesn't. The small brown column at 19 usec is weird! It's only half a clock cycle long, and the signal changes during the rising edge of the clock?!

I suspect that's where the missing last clock cycle/bit gets 'eaten'. The 2nd byte is indeed shifted 1 bit (12 -> 24).
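To illustrate why a single lost bit shifts everything after it, here is a minimal host-side sketch. It removes one bit from an MSB-first byte stream and re-frames the remainder into bytes. The function names, the third byte (0x00) and the position of the dropped bit are assumptions for illustration only; the capture does not show exactly where the bit is lost.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one bit from an MSB-first byte stream. */
static int get_bit(const uint8_t *buf, size_t bit_index) {
    return (buf[bit_index / 8] >> (7 - (bit_index % 8))) & 1;
}

/* Copy 'in' to 'out' while skipping the bit at 'drop_index'.
 * The remaining bits move up by one position; the final bit of
 * 'out' is padded with 0. Hypothetical helper, not an nrfx API. */
void drop_bit(const uint8_t *in, size_t n_bytes, size_t drop_index,
              uint8_t *out) {
    size_t total_bits = n_bytes * 8;
    size_t out_pos = 0;

    memset(out, 0, n_bytes);
    for (size_t i = 0; i < total_bits; i++) {
        if (i == drop_index) {
            continue; /* this is the bit that gets 'eaten' */
        }
        if (get_bit(in, i)) {
            out[out_pos / 8] |= (uint8_t)(1u << (7 - (out_pos % 8)));
        }
        out_pos++;
    }
}
```

With the input C9 12 00 and the first bit of the second byte dropped (bit index 8), the second byte reads back as 24, which matches the 12 -> 24 shift described above.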

We have several different captures that show this behaviour.

We upgraded to NRFX 3.3 (the latest). This seems to lower the frequency of the issue, but it still occurs. Disabling BLE did not change anything.

I suspect that sometimes, an incoming SPI slave transaction corrupts the data/state of the SPI master? But this is speculation.

Since the DevZone support was really helpful in solving our other issues, I hope you can repeat that again.

0 Einar Thorsrud over 1 year ago
Hi,

I have a few initial questions to understand more about the issue:

Do you see the missing clock cycle and corrupt data only on the logic analyzer, or also from the FPGA?

Which SPI mode have you configured on the nRF and the logic analyzer?

Which sampling rate are you using on the logic analyzer?

The above is to try to establish whether the issue is on the nRF side or whether it could be something with the test setup.

If the problem is on the nRF side, are you able to reproduce it when testing with constant latency mode? If you run this from the app core, you can test that by making this call early in your application:

NRF_POWER_NS->TASKS_CONSTLAT = 1;

"I suspect that sometimes, an incoming SPI slave transaction corrupts the data/state of the SPI master? But this is speculation."

It does not immediately seem likely, but do you have testing to back this up? Are you able to reproduce the issue when not using SPIS?

0 Bas van den Berg over 1 year ago in reply to Einar Thorsrud

Hello Einar,

Thanks for the quick response.

- The data issue was detected on the FPGA and then analysed on the logic analyser.

- the mode is CPHA 0, CPOL 0

- The analyser is sampling at 25 MS/s, so it has an accuracy of around 40 ns.

I will run the test overnight with the modifications you mentioned. Currently it's happening roughly every 150.000 transfers.

I'll update this post with the results.

0 Bas van den Berg over 1 year ago in reply to hmolesworth

You're right, this capture did have the correct number of clocks, so that is apparently another possible case. I don't see how any analog issue could cause bits to shift at all. Bits may become incorrect, but you saw how the signal looked: it looked fine. This can only be caused by some digital effect, so looking at the analogue probes makes no sense to me. Also, the signal is 500 kHz, so this is not some GHz effect. We measured between the FPGA and the Nordic pins; our measuring point is roughly halfway.
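As a rough check on how well the analyser can resolve a half-cycle glitch, the arithmetic can be sketched as below. It uses the 25 MS/s sampling rate and the 500 kHz SPI clock mentioned in this thread; the function names are made up for illustration.

```c
#include <stdint.h>

/* Time between two logic analyser samples, in nanoseconds. */
uint32_t sample_period_ns(uint32_t sample_rate_hz) {
    return 1000000000u / sample_rate_hz;
}

/* Number of analyser samples that fall within half an SPI clock period. */
uint32_t samples_per_half_clock(uint32_t sample_rate_hz,
                                uint32_t spi_clock_hz) {
    return sample_rate_hz / (2u * spi_clock_hz);
}
```

For 25 MS/s and 500 kHz this gives a 40 ns sample period and 25 samples per half clock cycle, so a pulse that is half a clock cycle long is well within what the analyser can resolve.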

0 hmolesworth over 1 year ago in reply to Bas van den Berg

I would always look at the analogue signal in detail first. I see jitter on both the clock edge and the data edge on the waveforms posted. Yes, it may be irrelevant, but I have seen jitter be a problem on many designs, as the SPI ports are often capable of very high clock rates and can see jitter as multiple edges.

Might I suggest enabling MISO on the nRF5340 and simply setting it to the same pin as MOSI. This provides a loopback mode (yes, it works even if undocumented), and the incoming data can be used to detect misaligned data, as the Rx should be identical to the Tx; an easy way to log data error events.

0 Bas van den Berg over 1 year ago in reply to hmolesworth

I used the same setup, now with the MISO pin set to the same pin as MOSI. In the receive IRQ I check whether the TX and RX buffers are the same. After roughly a few minutes, the issue occurs. I did 3 runs: in runs 1 and 3 the issue was detected by both the Logic Analyser and the Nordic. In run 2, only the Logic Analyser detected the issue. Below, TX is the buffer we sent and RX is the buffer we received from the MISO pin.

run1:

TX 7 [C5 12 00 02 CD F5 35 ]
RX 7 [C5 02 00 02 CD F5 35 ]

run 2: (Logic analyser only)

run 3:

TX 7 [C5 12 55 35 20 48 E0 ]
RX 7 [C5 12 55 B5 20 48 E0 ]
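The comparison done in the receive IRQ could look roughly like the sketch below. It reports the first byte that differs between the two buffers and which bits differ. The type and function names are hypothetical and not part of nrfx.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of comparing a sent frame with its loopback copy. */
typedef struct {
    int     index;    /* index of first differing byte, -1 if identical */
    uint8_t xor_mask; /* bits that differ in that byte */
} frame_diff_t;

/* Find the first byte where the loopback RX differs from TX. */
frame_diff_t first_mismatch(const uint8_t *tx, const uint8_t *rx,
                            size_t len) {
    frame_diff_t diff = { -1, 0 };

    for (size_t i = 0; i < len; i++) {
        if (tx[i] != rx[i]) {
            diff.index = (int)i;
            diff.xor_mask = (uint8_t)(tx[i] ^ rx[i]);
            break;
        }
    }
    return diff;
}
```

Applied to the buffers above, run 1 differs at byte index 1 with mask 0x10 (12 vs 02), and run 3 differs at byte index 3 with mask 0x80 (35 vs B5). In both runs exactly one bit differs.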

Hopefully this helps!

0 hmolesworth over 1 year ago in reply to Bas van den Berg

"In run 2, only the Logic Analyser detected the issue" This would normally indicate that it is a hardware timing issue outside the nRF5340, ie H0H1 or E0E1 is not correctly set depending on which pins are being used. For the Tx, are the E0E1-compatible SPI pins being used? Use a breakpoint to examine the SPI pin settings at the start of transmission to prove this.

In runs 1 and 3 this looks more like the AHB bus contention issue; perhaps repeat this test but examine the map file to ensure the Master Tx, Rx and SPI Slave Rx buffers are all in separate AHB memory regions; the code has to be edited to force that.

Edit: Description slightly confusing for E0E1: "P0.08 - P0.12 Drive configuration E0E1 is available and must be used for TRACE. For 32 Mbps high-speed SPI using SPIM4, drive configuration H0H1 must be used.". Maybe stick to H0H1 but make sure the specific pins indeed support H0H1 as some do not, ie some pins are standard drive S0S1 only.

0 Bas van den Berg over 1 year ago in reply to hmolesworth

How does the STALLSTAT field work? It seems it's always one, before our init even.

I have run the same tests with the following buffer locations:

spim.rx 0x20020000

spim.tx 0x20060000

spis.rx 0x20070000

spis.tx 0x20074000 - shares same AHB master as spis_rx, but not used (no miso)

Both issues (detected by Logic Analyser + Nordic and on Logic Analyser only) still occur. The frequency of an issue occurring does not appear to be changed...
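A quick way to double-check which buffers share a RAM region is sketched below. It assumes that each AHB RAM region covers 64 KB and that RAM starts at 0x20000000; this is consistent with 0x20070000 and 0x20074000 sharing a region as noted above, but both constants are assumptions and should be verified against the product specification. The helper names are made up.

```c
#include <stdint.h>

#define RAM_BASE        0x20000000u /* assumed start of application-core RAM */
#define RAM_REGION_SIZE 0x10000u    /* assumed 64 KB per AHB RAM region */

/* Index of the (assumed) 64 KB RAM region that contains 'addr'. */
uint32_t ram_region_index(uint32_t addr) {
    return (addr - RAM_BASE) / RAM_REGION_SIZE;
}

/* Non-zero if both addresses fall in the same assumed RAM region. */
int same_ram_region(uint32_t a, uint32_t b) {
    return ram_region_index(a) == ram_region_index(b);
}
```

Under these assumptions, the buffers listed above land in regions 2 (spim.rx), 6 (spim.tx) and 7 (spis.rx and spis.tx).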

What might be important is that we run our code from RAM (in different regions than those used by buffers), not from Flash.

0 hmolesworth over 1 year ago in reply to Bas van den Berg

RAM timing behaves differently from FLASH due to zero wait states which impacts bus master traffic, bit of a rabbit hole. Best way to check is simply try executing from FLASH and see if it makes a difference. With RAM there will perhaps be less time for lower-priority bus masters like SPIM and SPIS. STALLSTAT is just an indication; I would clear STALLSTAT inside every interrupt if set and increment a counter to see how often this happens; maybe the counter will match the corrupted bytes counter. "Stall status for EasyDMA RAM accesses. The fields in this register are set to STALL by hardware whenever a stall occurres and can be cleared (set to NOSTALL) by the CPU"
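The stall counter suggested above could be sketched as follows. The register is passed in as a pointer so the logic can be tried on a host with a plain variable; on the nRF it would point at the SPIM's STALLSTAT register. That a cleared register reads 0 (NOSTALL) is an assumption here, and the function names are hypothetical.

```c
#include <stdint.h>

/* Number of times a stall was seen since start-up. */
static uint32_t stall_events;

/* Call from the SPIM interrupt handler: if any stall flag is set,
 * count it and clear the register (assumed: writing 0 = NOSTALL). */
void check_stall(volatile uint32_t *stallstat) {
    if (*stallstat != 0u) {
        stall_events++;
        *stallstat = 0u;
    }
}

/* Read the counter, e.g. to compare with the corrupted-frame count. */
uint32_t stall_event_count(void) {
    return stall_events;
}
```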

SPIM/SPIS AHB bus priority order is 0-1-2-3; what happens if you switch priority order, ie swap (say) SPIM0/SPIS1 to SPIM1/SPIS0?

documentation/ka001368/1-0

0 hmolesworth over 1 year ago in reply to Bas van den Berg

Did you make any progress on this? It would be helpful to know if these suggestions are worth following up ..

0 Bas van den Berg over 1 year ago in reply to hmolesworth

Sorry, it sometimes takes a bit longer; I am the entire team, so all tasks land on my shoulders.

The STALL bit doesn't seem to go to 0, even after clearing it.

I tried changing the SPIM/SPIS priority, but that didn't change anything in the behaviour.

Switching to running from Flash is not an option, since we need to run from RAM. If the issue doesn't occur then, it's not really usable for us. I'm currently simplifying our application and connecting two Nordic boards instead of a Nordic board and our FPGA. Hopefully you'll be able to reproduce the issue on your side then as well. This will probably take a week or so.

0 hmolesworth over 1 year ago in reply to Bas van den Berg

Turns out STALLSTAT is only available on SPIM4, pity. "SPIM0/1/2/3 Not supported: ... stalling mechanism during AHB bus contention"