We are doing DFU over BLE with the secondary MCUboot partition on an external flash. Sometimes Zephyr cannot access the flash after booting, which leaves the device effectively bricked: no DFU is possible, although the app still runs fine. The nRF has to be reflashed to resolve this. Even stranger, a power cycle does not help, which we cannot explain yet. It almost seems as if mcumgr stores the status of the external flash on the internal flash.
The following experiments were done to mitigate the initial problem of Zephyr not "mounting" the flash device.
Some key data:
- NCS v3.1.1 using sysbuild
- nRF52832
- `prj.conf` and `mcuboot.conf` have the same clock, SPI and SPI_NOR config
- Custom board with device tree, no overlays
- SFDP parameters are read at runtime, so that we stay flexible about which flash part is fitted during PCB assembly
Some experiments:
- using blocking SPI and no multithreading in `mcuboot.conf` => no improvement
- `CONFIG_BOOT_BANNER=n` and `CONFIG_BOOT_DELAY=100` => no improvement
- early boot delays using `SYS_INIT(additional_boot_delay, PRE_KERNEL_x, 0)` and `k_busy_wait()` => no improvement
- `CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=y` to let Zephyr wait until Xtal is stable => no improvement
- enable logging in `prj.conf` and `mcuboot.conf`, RTT Viewer **NOT** connected => mostly fails
- enable logging in `prj.conf` and `mcuboot.conf`, RTT Viewer connected => OK
- `CONFIG_ASSERT=y` in `prj.conf` and `mcuboot.conf` => OK
My assumption is a timing issue inside Zephyr's SPI and/or flash drivers.
Relevant devicetree:
```
/ {
	// ...
	chosen {
		zephyr,sram = &sram0;
		zephyr,flash = &flash0;
		zephyr,code-partition = &slot0_partition;
		nordic,pm-ext-flash = &extFlash;
	};
};

&spi2 {
	compatible = "nordic,nrf-spim";
	status = "okay";
	anomaly-58-workaround;
	cs-gpios = <&gpio0 15 GPIO_ACTIVE_LOW>;
	pinctrl-0 = <&spi2_default>;
	pinctrl-names = "default";

	extFlash: extFlash@0 {
		compatible = "jedec,spi-nor";
		reg = <0>;
		spi-max-frequency = <1000000>;
		wp-gpios = <&gpio0 10 GPIO_ACTIVE_LOW>;
		quad-enable-requirements = "NONE";
		// getting SFDP parameters at runtime (CONFIG_SPI_NOR_SFDP_RUNTIME=y, see zephyr/drivers/flash/Kconfig.nor)
	};
};

&pinctrl {
	spi2_default: spi2_default {
		group1 {
			psels = <NRF_PSEL(SPIM_SCK, 0, 11)>,
				<NRF_PSEL(SPIM_MOSI, 0, 12)>,
				<NRF_PSEL(SPIM_MISO, 0, 13)>;
		};
	};
};
```
Log when failing:
```
... MCUboot ...
I: Jumping to the first image slot
[00:00:00.201,629] <err> spi_nrfx_spim: Timeout waiting for transfer complete
[00:00:00.201,660] <err> spi_nor: SFDP read failed: -116
*** Booting nRF Connect SDK v3.1.1-e2a97fe2578a ***
*** Using Zephyr OS v4.1.99-ff8f0c579eeb ***
[00:00:00.228,454] USER_APP STUFF ...
```
Logic analyser data when failing: see the attached `logic analyser.zip`.
Comments to the logic analyser data:
- At MCU init the nCS line toggles many times before going high. The MISO line also shows some glitches on the first read. I don't know whether they are driven by the MCU (not yet initialised) or by the flash chip, or whether the line simply floats because nCS is not asserted and both MCU and flash are hi-Z. Either way, these do not really concern me, because the flash is the only device on the bus.
- During app boot the SFDP header read succeeds, but the SFDP data read does not: it times out, as seen in the log and in the analyser data. Sometimes a few bytes are read before the analyser's buffer runs out; in that case one can see about 200 ms of idle lines (nCS still asserted), followed by a few more bytes being read before nCS goes high.
Are such cases known to you? I have read several DevZone entries, but none really describes this; some cover it only partially and also mention timing issues with SPIM.