SPIM: corrupted data by incoming SPIS data?

Hi,

We've run into an issue using the SPI master (SPIM). We have the nRF5340 connected to our FPGA using 2 SPI buses. The nRF is SPI master on one bus and SPI slave on the other. For testing, we're sending quite a lot of transactions (1800/s receives on the slave, 400/s sends on the master). Both buses only have MOSI, no MISO (so both are one-way). We use interrupts only to notify the main code; all SPIM transactions are started outside interrupts.

We always send/receive 7-byte frames. Since we're testing, we also send known content (a header + cmd + sequence number + trailer). What we see is that sometimes (on the order of 1:100,000) the SPI master 'forgets' to send the last clock cycle of a transaction. So instead of sending 56 clock cycles (= 7 bytes * 8), it only sends 55. Other times it adds 7 cycles.

The top 3 (red/brown/white) form the SPI master's bus. The bottom 2 (CS is missing there because we needed the pin on the logic analyser for another signal) show the incoming SPI transaction from the FPGA. The logic analyser marks all the rising edges on the red clock where it samples. The last byte only has 7 bits, so they are not marked.

Zooming in on the top-left part:

So the signal should only change on the falling edges of the clock and be sampled (by the FPGA/logic analyser) on the rising edges. We expect the data to be C9 12 at the start, but it's not. The small brown column at 19 usec is weird! It's only half a clock cycle long and the signal changes during the rising edge of the clock?!?!

I suspect that's where the missing last clock cycle/bit gets 'eaten'. The 2nd byte is indeed shifted 1 bit (12 -> 24).
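To illustrate the kind of shift described above, here is a minimal sketch in C. It assumes MSB-first transmission and a third byte of 00 (both assumptions for illustration), and `drop_bit` is a hypothetical helper, not part of nrfx. It removes one bit from a byte stream, which is roughly what a receiver sees if one clock edge is lost, so every later byte moves left by one bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read bit i of a buffer, counting MSB-first from the first byte. */
static int get_bit(const uint8_t *buf, size_t i)
{
    return (buf[i / 8] >> (7 - (i % 8))) & 1;
}

/* Hypothetical helper: copy `in` to `out` while skipping the bit at index
 * `drop` (MSB-first). All later bits move up by one position and the final
 * bit of `out` is padded with 0. */
void drop_bit(const uint8_t *in, size_t len, size_t drop, uint8_t *out)
{
    assert(in != NULL && out != NULL && drop < len * 8);
    memset(out, 0, len);
    size_t o = 0; /* next bit position to write in `out` */
    for (size_t i = 0; i < len * 8; i++) {
        if (i == drop) {
            continue; /* this is the bit that gets 'eaten' */
        }
        if (get_bit(in, i)) {
            out[o / 8] |= (uint8_t)(0x80u >> (o % 8));
        }
        o++;
    }
}
```

With the input C9 12 00 and the last bit of the first byte dropped (index 7), the output is C8 24 00, so the second byte shows the same 12 -> 24 shift as the capture. Which bit was actually lost in the capture is not known; the index here is only an example.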

We have several different captures that show this behaviour.

We upgraded to nrfx 3.3 (latest). This seems to lower the frequency of this happening, but it still occurs. Disabling BLE did not change anything.

I suspect that sometimes, an incoming SPI slave transaction corrupts the data/state of the SPI master? But this is speculation.

Since the DevZone support was really helpful in solving our other issues, I hope you can repeat that :)

  • Hi,

    I have a few initial questions to understand more about the issue:

    • Do you see the missing clock cycle and corrupt data only on the logic analyzer, or also from the FPGA?
    • Which SPI mode have you configured on the nRF and the logic analyzer?
    • Which sampling rate are you using on the logic analyzer?

    The above is to try to establish whether the issue is on the nRF side or whether it could be something with the test setup.

    If the problem is on the nRF side, I wonder if you are still able to reproduce it with constant latency mode enabled? If you run this from the app core, you can test that by making this call early in your application:

    NRF_POWER_NS->TASKS_CONSTLAT = 1; 

    "I suspect that sometimes, an incoming SPI slave transaction corrupts the data/state of the SPI master? But this is speculation."

    It does not immediately seem likely, but do you have testing to back this up? Are you able to reproduce the issue when not using SPIS?

  • Hello Einar,

    Thanks for the quick response.

    - The data issue was detected on the FPGA and then analysed on the logic analyser.

    - The mode is CPHA 0, CPOL 0.

    - The analyser is sampling at 25 MS/s, so it has an accuracy of around 40 ns.

    I will run the test overnight with the modifications you mentioned. Currently it's happening roughly every 150,000 transfers.

    I'll update this post with the results.
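As a rough check on the sampling figures in this reply, the arithmetic can be sketched in C as below. The function names are hypothetical, and the 500 kHz SPI clock is the figure mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Time between two logic-analyser samples, in nanoseconds. */
uint32_t sample_period_ns(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1000000000u / sample_rate_hz;
}

/* Number of logic-analyser samples that fall within one SPI clock period. */
uint32_t samples_per_clock(uint32_t sample_rate_hz, uint32_t spi_clock_hz)
{
    assert(spi_clock_hz > 0);
    return sample_rate_hz / spi_clock_hz;
}
```

At 25 MS/s this gives a 40 ns sample period and 50 samples per 500 kHz clock period, so a pulse lasting half a clock cycle would still span about 25 samples.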

  • You're right, this capture did have the correct number of clocks. So that's also a possible case then. I don't see how any analogue issue could cause bits to shift at all. Bits may become incorrect etc., but you saw how the signal looked: it looked fine. This can only be caused by some digital effect. Looking at the analogue probes makes no sense to me. Also, the signal is 500 kHz, so it is not some GHz effect or something. We did measure between the FPGA and the Nordic pins; our measurement point is roughly halfway.

  • I would always look at the analogue signal in detail first. I see jitter on both the clock edge and the data edge on the waveforms posted, and yes, it may be irrelevant, but I have seen jitter be a problem on many designs, as the SPI ports are often capable of very high clock rates and see jitter as multiple edges.

    Might I suggest enabling MISO on the nRF5340 and simply setting it to the same pin as MOSI. This provides a loopback mode (yes, it works even if undocumented), and the incoming data can be used to detect misaligned data, as the Rx should be identical to the Tx; an easy way to log data error events.

  • I used the same setup, now with the MISO pin set to the same pin as MOSI. In the receive IRQ I check whether the TX and RX buffers are the same. After roughly a few minutes, the issue occurs. I did 3 runs: in runs 1 and 3 the issue was detected by both the logic analyser and the Nordic. In run 2, only the logic analyser detected the issue. TX is the buffer we sent and RX is the buffer we received from the MISO pin.

    run1:

    TX 7 [C5 12 00 02 CD F5 35 ]
    RX 7 [C5 02 00 02 CD F5 35 ]

    run 2: (Logic analyser only)

    run 3:

    TX 7 [C5 12 55 35 20 48 E0 ]
    RX 7 [C5 12 55 B5 20 48 E0 ]

    Hopefully this helps!
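The TX/RX comparison described in this reply could look something like the sketch below. This is not the poster's actual code: `frame_diff` is a hypothetical helper written in plain C so it can be tried on a PC, and it additionally reports which bits differ in the first mismatching byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compare a transmitted frame with the looped-back
 * frame. Returns the number of differing bytes. For the first mismatch it
 * stores the byte index in *first_idx and the XOR of the two bytes in
 * *first_xor (set bits in the XOR mark the bits that differ). The outputs
 * are left untouched when the frames are identical. */
size_t frame_diff(const uint8_t *tx, const uint8_t *rx, size_t len,
                  size_t *first_idx, uint8_t *first_xor)
{
    assert(tx != NULL && rx != NULL);
    assert(first_idx != NULL && first_xor != NULL);
    size_t mismatches = 0;
    for (size_t i = 0; i < len; i++) {
        if (tx[i] != rx[i]) {
            if (mismatches == 0) {
                *first_idx = i;
                *first_xor = (uint8_t)(tx[i] ^ rx[i]);
            }
            mismatches++;
        }
    }
    return mismatches;
}
```

Applied to the two logged frames, run 1 differs only in byte 1 with XOR mask 0x10 (12 vs 02) and run 3 only in byte 3 with XOR mask 0x80 (35 vs B5), i.e. one differing bit per frame in these two logs.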

  • "In run 2, only the Logic Analyser detected the issue" This would normally indicate that it is a hardware timing issue outside the nRF5340, i.e. H0H1 or E0E1 is not correctly set, depending on which pins are being used. For the Tx, are the E0E1-compatible SPI pins being used? Use a breakpoint to examine the SPI pin settings at the start of transmission to prove this.

    In runs 1 and 3 this looks more like the AHB bus contention issue; perhaps repeat this test but examine the map file to ensure the Master Tx, Rx and SPI Slave Rx buffers are all in separate AHB memory regions; the code has to be edited to force that.

    Edit: The description is slightly confusing for E0E1: "P0.08 - P0.12 Drive configuration E0E1 is available and must be used for TRACE. For 32 Mbps high-speed SPI using SPIM4, drive configuration H0H1 must be used.". Maybe stick to H0H1, but make sure the specific pins do indeed support H0H1, as some do not, i.e. some pins are standard drive S0S1 only.

  • How does the STALLSTAT field work? It seems it's always one, even before our init.

    I have run the same tests with the following buffer locations:

    spim.rx 0x20020000

    spim.tx 0x20060000

    spis.rx 0x20070000

    spis.tx 0x20074000  - shares same AHB master as spis_rx, but not used (no miso)

    Both issues (detected by Logic Analyser + Nordic and on Logic Analyser only) still occur. The frequency of an issue occurring does not appear to be changed...

    What might be important is that we run our code from RAM (in different regions than those used by buffers), not from Flash.
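To double-check which buffers share a RAM region, a small helper like the one below can be used. It is only a sketch: it ASSUMES the application-core RAM starts at 0x20000000, is 512 KiB in total, and is split into 64 KiB AHB regions, which is consistent with the remark above that 0x20070000 and 0x20074000 share a region but should be verified against the product specification. `ram_region_index` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_BASE        0x20000000u /* ASSUMPTION: start of app-core RAM */
#define RAM_SIZE        0x00080000u /* ASSUMPTION: 512 KiB in total */
#define RAM_REGION_SIZE 0x00010000u /* ASSUMPTION: 64 KiB per AHB region */

/* Hypothetical helper: index of the RAM region containing `addr`,
 * under the assumptions stated in the defines above. */
unsigned ram_region_index(uint32_t addr)
{
    assert(addr >= RAM_BASE && addr < RAM_BASE + RAM_SIZE);
    return (unsigned)((addr - RAM_BASE) / RAM_REGION_SIZE);
}
```

Under these assumptions the four buffer addresses listed above map to regions 2, 6, 7 and 7, so only the two SPIS buffers share a region.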
