nRF52840: external flash reads randomly failing at higher clock rates when Bluetooth is running

We are working with an nRF52840 on a custom board.

We have an AT25SF321B external flash chip connected over QSPI, which is rated for a maximum frequency of 108 MHz.

We have noticed that if we set sck-frequency for the part above 8-10 MHz, it works when Bluetooth is not enabled, but when Bluetooth is enabled we get unflagged read failures, i.e. flash_read returns 0 but some of the bytes in the buffer are incorrect.

The read failure rate is significant. We tested by writing 32 x 4 KB pages of data with Bluetooth off, confirming that they read back correctly, then turning Bluetooth on and reading the pages again and again in a loop.

Out of 10 reads of the full 128 KB, 3 or 4 would contain read failures.
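A minimal sketch of such a read-verify loop, written in plain C so it can run on a host. It is not the firmware used in the test above: `read_fn`, `pattern_byte`, `verify_pass`, `sim_read_ok` and `sim_read_corrupt` are hypothetical names, and the two `sim_read_*` functions stand in for a wrapper around `flash_read` on the real target. The point is that the check has to compare the buffer contents, because the return code alone does not flag the failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEST_PAGE_SIZE  4096u   /* 4 KB pages, as in the test above */
#define TEST_PAGE_COUNT 32u     /* 32 pages = 128 KB */

/* Hypothetical read hook; on target this would wrap flash_read(). */
typedef int (*read_fn)(uint32_t offset, void *buf, size_t len);

/* Deterministic test pattern, so the expected data can be regenerated. */
uint8_t pattern_byte(uint32_t page, uint32_t index)
{
    return (uint8_t)(page * 31u + index * 7u);
}

/* Simulated flash that always returns the written pattern. */
int sim_read_ok(uint32_t offset, void *buf, size_t len)
{
    uint8_t *out = buf;
    for (size_t i = 0; i < len; i++) {
        uint32_t addr = offset + (uint32_t)i;
        out[i] = pattern_byte(addr / TEST_PAGE_SIZE, addr % TEST_PAGE_SIZE);
    }
    return 0;
}

/* Simulated flash that silently flips one bit in page 5 but still returns 0. */
int sim_read_corrupt(uint32_t offset, void *buf, size_t len)
{
    const uint32_t bad_addr = 5u * TEST_PAGE_SIZE + 100u;
    sim_read_ok(offset, buf, len);
    if (bad_addr >= offset && bad_addr < offset + len) {
        ((uint8_t *)buf)[bad_addr - offset] ^= 0x01u;
    }
    return 0;
}

/* One pass over all pages. Returns the number of pages that contain at
 * least one wrong byte, or -1 if the read hook itself reports an error. */
int verify_pass(read_fn rd)
{
    static uint8_t expected[TEST_PAGE_SIZE];
    static uint8_t actual[TEST_PAGE_SIZE];
    int bad_pages = 0;

    for (uint32_t page = 0; page < TEST_PAGE_COUNT; page++) {
        for (uint32_t i = 0; i < TEST_PAGE_SIZE; i++) {
            expected[i] = pattern_byte(page, i);
        }
        if (rd(page * TEST_PAGE_SIZE, actual, TEST_PAGE_SIZE) != 0) {
            return -1;
        }
        if (memcmp(expected, actual, TEST_PAGE_SIZE) != 0) {
            bad_pages++;
        }
    }
    return bad_pages;
}
```

Calling `verify_pass` repeatedly with Bluetooth on and counting the passes that return a non-zero value gives the kind of failure rate quoted above.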

Our workaround is to run the part with an sck-frequency of 8 MHz, but it would be good to know what is causing the issue, especially as we have many peripherals working at the same time: Bluetooth, external flash, accelerometer, GPS, etc.

I've included the (hopefully) important parts of the dts below.

We can also see this issue on an nRF52840 DK board with the external flash replaced with the AT25SF321B.

```

&qspi {
    status = "okay";
    pinctrl-0 = <&qspi_periphs_default>;
    pinctrl-1 = <&qspi_periphs_sleep>;
    pinctrl-names = "default", "sleep";

    at25sf321b: at25sf321b@0 {
        compatible = "nordic,qspi-nor";
        reg = <0>;

        size = <DT_SIZE_M(32)>;              // 32 mega bit total size
        jedec-id = [ 1F 87 01  ];
        sfdp-bfp = [
            e5 20 f1 ff  ff ff ff 01  44 eb 08 6b  08 3b 80 bb
            ee ff ff ff  ff ff 00 ff  ff ff 00 ff  0c 20 0f 52
            10 d8 00 ff  32 3a b1 00  84 e6 14 c2  00 01 00 80
            ff ff ff ff  f7 b3 d5 5c  00 06 61 ff  88 10 00 00
            00 00 00 00  00 00 00 00  00 00 00 00  ff ff ff ff  ];
       
        sck-frequency = <DT_FREQ_M(8)>;    // 108 MHz max freq, but 03h opcode max freq = 55 MHz, and the actual clock seems to max out at ~30 MHz
       

        has-dpd;                             // Deep power down = 0xB9 in datasheet
        t-enter-dpd = < 20000 >;
        t-exit-dpd = < 20000 >;

        partitions {
            compatible = "fixed-partitions";
            #address-cells = <1>;
            #size-cells = <1>;

            slot1_partition: partition@0 {
                label = "image-1";
                reg = <0x0000 0xe0000>;
            };
            external_partition: partition@e0000 {
                reg = <0xe0000 0x320000>;
            };
        };
    };
};
```

pinctrl:
```
    qspi_periphs_default: qspi_periphs_default {
        group1 {
            psels = <NRF_PSEL(QSPI_SCK, 1, 4)>,
                    <NRF_PSEL(QSPI_CSN, 1, 6)>,
                    <NRF_PSEL(QSPI_IO0, 1, 7)>,
                    <NRF_PSEL(QSPI_IO1, 1, 5)>,
                    <NRF_PSEL(QSPI_IO2, 1, 3)>,
                    <NRF_PSEL(QSPI_IO3, 1, 1)>;
        };
    };

    qspi_periphs_sleep: qspi_periphs_sleep {
        group1 {
            psels = <NRF_PSEL(QSPI_SCK, 1, 4)>,
                    <NRF_PSEL(QSPI_CSN, 1, 6)>,
                    <NRF_PSEL(QSPI_IO0, 1, 7)>,
                    <NRF_PSEL(QSPI_IO1, 1, 5)>,
                    <NRF_PSEL(QSPI_IO2, 1, 3)>,
                    <NRF_PSEL(QSPI_IO3, 1, 1)>;
            low-power-enable;
        };
    };
```


  • "some of the bytes in the buffer will be incorrect": not quite the answer you may be seeking, but start by isolating the "buffer" into a RAM AHB Slave which has no other data stored in it (no variables, no stack, no heap, nothing). There are 8 RAM AHB Slaves on the nRF5340 Application core, similar to the nRF52840 which has 8 RAM AHB Slaves. If this fixes the issue, identify data which can safely be moved into the same AHB Slave, typically data which is never directly accessed by DMA.

    This fix addresses the problem of Bus Master DMA stall: at high SPI transfer rates, a stall caused by a higher-priority Bus Master (such as the RADIO) will lead to corrupted or lost SPIM data, because there is nowhere to put the data while the lower-priority master is stalled. So avoid the stall: Bus Masters (RADIO, SPI, etc.) can operate simultaneously without stalls only when using different AHB Slaves.

    // Each bus master is connected to all the slave devices using an interconnection matrix. The bus masters are
    // assigned priorities, which are used to resolve access when two (or more) bus masters request access to
    // the same slave device. When that occurs, the following rules apply:
    // - If two (or more) bus masters request access to the same slave device, the master with the highest
    // priority is granted the access first.
    // - Bus masters with lower priority are stalled until the higher priority master has completed its transaction.
    // To avoid AHB bus contention when using multiple bus masters, follow these guidelines:
    // - Avoid situations where more than one bus master is accessing the same slave.
    // - If more than one bus master is accessing the same slave, make sure that the bus bandwidth is not exhausted.

    Edit: Note that by reducing SPIM burst sizes, it might be possible to use RADIO timing events so that SPIM transfers only run when no RADIO bus activity is taking place, assuming such BLE activity is the culprit.

  • Hi,

    In addition to the recommendation from @hmolesworth, can you try to implement the suggested workaround for this errata? [244] QSPI: External flash and QSPI returns erroneous data when the SoftDevice is running

    When using nRF Connect SDK, the corresponding approach would be to use the clock control on-off manager, as shown in this sample.

    Best regards,
    Jørgen

  • The problem seems to be that the nRF52840 would work out the wait times based on a 108 MHz flash speed, but the clock would max out at 32 MHz, so the flash was being driven much slower than the wait times assumed. We ended up running the flash at 8 MHz, where it works, but that is slower than the maximum of either the flash or the nRF52840.
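
To make the 32 MHz ceiling concrete: on the nRF52840 the QSPI SCK is derived from a 32 MHz base clock through the 4-bit SCKFREQ field, giving 32 MHz / (SCKFREQ + 1). The sketch below computes the clock that results from a requested frequency. It is an illustration only, not the Zephyr driver's code: the function names are hypothetical, and it assumes the rounding policy "fastest SCK that does not exceed the request".

```c
#include <stdint.h>

#define QSPI_BASE_HZ 32000000u /* QSPI base clock on the nRF52840 */
#define QSPI_MAX_DIV 16u       /* SCKFREQ is 4 bits, so divider is 1..16 */

/* Smallest divider whose SCK does not exceed requested_hz (assumed policy).
 * Requests below 2 MHz cannot be met and are clamped to the largest divider. */
uint32_t qspi_sck_divider(uint32_t requested_hz)
{
    if (requested_hz >= QSPI_BASE_HZ) {
        return 1u; /* cannot go faster than the base clock */
    }
    if (requested_hz == 0u) {
        return QSPI_MAX_DIV;
    }
    /* Ceiling division: round the divider up so SCK rounds down. */
    uint32_t div = (QSPI_BASE_HZ + requested_hz - 1u) / requested_hz;
    if (div > QSPI_MAX_DIV) {
        div = QSPI_MAX_DIV;
    }
    return div;
}

/* SCK frequency that would actually be generated for a request. */
uint32_t qspi_sck_actual_hz(uint32_t requested_hz)
{
    return QSPI_BASE_HZ / qspi_sck_divider(requested_hz);
}
```

Under these assumptions a 108 MHz request ends up at 32 MHz, and both 8 MHz and 10 MHz requests end up at 8 MHz, so any timing derived from the requested value rather than the actual one will not match what is on the bus.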
