dfu issues after migrating from ncs2.9.2 to ncs3.0.2

hello Nordic

we are working with nrf52840/nrf52832 with ncs (currently on v3.0.2)

i have noticed that when i try to dfu a merged artifact of my version with the new mcuboot of ncs3.0.2 the device does not work after the dfu at all, it seems bricked no matter if i try to reset it after or even try to re flash the same version of merged file with the same mcuboot of ncs3.0.2 it does not respond .. only after flashing a lower version it returns to work (this part is very strange i think)

in previous version with ncs2.9.2 there was no issues with the dfu ..

under app/
sysbuild.conf
SB_CONFIG_BOOTLOADER_MCUBOOT=y
SB_CONFIG_BOOT_SIGNATURE_TYPE_RSA=y
SB_CONFIG_PARTITION_MANAGER=y
SB_CONFIG_PM_OVERRIDE_EXTERNAL_DRIVER_CHECK=y

sysbuild_swap.conf
SB_CONFIG_MCUBOOT_MODE_SWAP_SCRATCH=y

sysbuild_no_swap.conf
SB_CONFIG_MCUBOOT_MODE_SWAP_SCRATCH=n
SB_CONFIG_BOOT_SIGNATURE_TYPE_ECDSA_P256=y

prj.conf under app/sysbuild/my_app/mcuboot/
CONFIG_FLASH=y
CONFIG_FLASH_xxxx_API=x
CONFIG_MAIN_STACK_SIZE=xxxx

1. any ideas what can be the cause and how can this be fixed ?

2. any idea how to debug the mcuboot using rtt ? i don't want to flash a version via cursor or vscode because then dfu will not work for some other signature reason 

best regards

Ziv

  • hi Vidar

    sorry for the delay in response i waited for the HW check and some test that i run

    Does your HW allow you to power cycle the board so that both the SPI flash and nRF get reset?

    the design is only based on nRF52840 reset control, then no it resets only the nRF, not the SPI flash.

    i can also add the following findings:

    • dfu from device with mcuboot v3.0.2 to an app with v3.0.2 fails

    (but after i flash a merged version of v2.9.2 i i did see the app comes up with the new app version 3.0.2 and after power off power on it came up with the v2.9.2 version (this is strange) however i did not manage to relet this scenario so i am not sure how valid this result is)

    • dfu from device with mcuboot v2.9.2 to an app with v3.0.2 pass
    • dfu from device with mcuboot v3.0.2 to an app with v2.9.2 fails

    pm_static between app v2.9.2 and app v3.0.2 did not change so same pm static

    i added logs to mcuboot and flashed this version (i had to flash it with --pinreset for some reason else it crashed at start) and after dfu +nrfjprog --reset i get this logs 

    00> [00:00:00.250,396] <inf> AUGU_GPIO_POWER: augu_power_resource_init: boost_on_node
    00> [00:00:00.250,793] <inf> AUGU_GPIO_POWER: augu_power_resource_init: hw_id_on_node
    00> [00:00:00.251,220] <inf> AUGU_GPIO_POWER: augu_power_resource_init: sensors_on_node
    00> [00:00:00.251,617] <inf> AUGU_GPIO_POWER: augu_power_resource_init: flash_on_node
    00> [00:00:00.252,044] <inf> AUGU_FLASH_W25N: master bus found: spi@40023000
    00> [00:00:00.252,410] <inf> AUGU_FLASH_W25N: turning flash on.[00:00:03.136,077] <err> os: ***** USAGE FAULT *****
    00> [00:00:03.136,383] <err> os:   Illegal use of the EPSR
    00> [00:00:03.136,688] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x00000000
    00> [00:00:03.137,115] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x00000000
    00> [00:00:03.137,542] <err> os:  xpsr:  0x00000000
    00> [00:00:03.137,847] <err> os: Faulting instruction address (r15/pc): 0x00000000
    00> [00:00:03.138,244] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0
    00> [00:00:03.138,641] <err> os: Current thread: 0x20000b20 (unknown)
    00> [00:00:03.139,007] [1;31m<err> os: Halting system

    another strange things that after dfu fails with mcuboot v3.0.2 i hae to flash the device twice with an older version for it to start working again, after first flash of the merged hex with --chiperase trying to reset or power off and power on the device just don't do it .. i wonder why it gets sttuck like this and why the second flashing of the same merged he as the first flash fixed it.

    hope to read you soon

    best regards

    Ziv

  • Hi Ziv,

    I guess the usage fault is likely what prevents the bootloader from booting the app, and not that the bootloader is failing to access the flash, but it is strange that the exception stack frame is all zeroes. Do you have any calls memset() or memcpy() in AUGU_FLASH_W25N or AUGU_GPIO_POWER that could potentially be zeroing the stack? 

    Best regards,

    Vidar

  • any calls memset() or memcpy()

    no such calls in those modules.

    i tried with another version with logs and added stack sentinal  got this after reset after dfu stopped and i re flashed the merged hex:

    00> [1779891635] <inf> mcuboot_util: Image index: 0, Swap type: none
    00> [1779891635] <inf> mcuboot_util: Image index: 0, Swap type: none
    00> [1779891635] <inf> mcuboot_util: Image index: 0, Swap type: none
    00> [1779891635] <inf> SMP_SERVICE: dfu has started. configuring connection watchdog to 10 minutes
    00> [1779891635] <inf> COMM_MNG: ble_connection_config updated!
    00> [1779891635] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7980000, size 655360, blocks_to_erase=5
    00> [1779891635] <inf> mcumgr_img_grp: Erased 0xa0000 bytes of image slot
    00> [1779891635] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a40000, size 131072, blocks_to_erase=1
    00> [1779891635] <inf> mcumgr_img_grp: Erased 0x20000 bytes of image slot trailer
    00> [1779891636] <inf> COMM_MNG: le_param_updated: LE conn param updated: interval 0x0006 latency 0 timeout 400
    00> [1779891636] <inf> COMM_MNG: RSSI is -47
    00> [00:00:00.250,427] <inf> AUGU_GPIO_POWER: augu_power_resource_init: boost_on_node
    00> [00:00:00.250,823] <inf> AUGU_GPIO_POWER: augu_power_resource_init: hw_id_on_node
    00> [00:00:00.251,220] <inf> AUGU_GPIO_POWER: augu_power_resource_init: sensors_on_node
    00> [00:00:00.251,617] <inf> AUGU_GPIO_POWER: augu_power_resource_init: flash_on_node
    00> [00:00:00.252,014] <inf> AUGU_FLASH_W25N: master bus found: spi@40023000
    00> [00:00:00.252,380] <inf> AUGU_FLASH_W25N: turning flash on...
    00> [00:00:00.252,746] <inf> AUGU_FLASH_W25N: flash_w25n_spi_init done
    00> [00:00:00.253,051] <inf> AUGU_FLASH_W25N: spi init ok
    00> [00:00:00.253,356] <inf> AUGU_FLASH_W25N: power on ok, sleeping 50ms
    00> [00:00:00.303,771] <inf> AUGU_FLASH_W25N: reading jedec id (1)
    00> [00:00:00.304,138] <inf> AUGU_FLASH_W25N: reading jedec id (2)
    00> [00:00:00.304,534] <inf> AUGU_FLASH_W25N: flash jedec device id 0xefba21
    00> [00:00:00.306,915] <inf> AUGU_FLASH_W25N: bad block management lut markdown found
    00> [00:00:00.307,281] <inf> AUGU_FLASH_W25N: flash_w25n init done
    00> *** Booting My Application v2.1.0-dev-ae1ee57f3906 ***
    00> *** Using nRF Connect SDK v3.0.2-89ba1294ac9b ***
    00> *** Using Zephyr OS v4.0.99-f791c49f492c ***
    00> [00:00:00.308,227] <inf> mcuboot: Starting bootloader
    00> [00:00:00.312,896] <inf> mcuboot: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    00> [00:00:00.313,385] <inf> mcuboot: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    00> [00:00:00.313,812] <inf> mcuboot: Boot source: primary slot
    00> [00:00:00.319,122] <inf> mcuboot: Image index: 0, Swap type: test
    00> [00:00:03.057,678] <err> os: r0/a1:  0x00000002  r1/a2:  0xf0f0f0f0  r2/a3:  0x200073c8
    00> [00:00:03.058,105] <err> os: r3/a4:  0x00000002 r12/ip:  0x00000001 r14/lr:  0x00008149
    00> [00:00:03.058,532] <err> os:  xpsr:  0x01000000
    00> [00:00:03.058,807] <err> os: Faulting instruction address (r15/pc): 0x00007f7e
    00> [00:00:03.059,204] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
    00> [00:00:03.059,600] <err> os: Current thread: 0x200019a0 (unknown)
    00> [00:00:03.059,936] <err> os: Halting system

    don't know if it helps to understand more

    i also tried increasing the stack for mcuboot CONFIG_MAIN_STACK_SIZE=10240 (it was set to 2k before)

    and i got this after dfu:

     [1779893114] <inf> mcuboot_util: Image index: 0, Swap type: none
    00> [1779893114] <inf> mcuboot_util: Image index: 0, Swap type: none
    00> [1779893114] <inf> mcuboot_util: Image index: 0, Swap type: none
    00> [1779893114] <inf> SMP_SERVICE: dfu has started. configuring connection watchdog to 10 minutes
    00> [1779893114] <inf> COMM_MNG: ble_connection_config updated!
    00> [1779893114] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7980000, size 655360, blocks_to_erase=5
    00> [1779893114] <inf> mcumgr_img_grp: Erased 0xa0000 bytes of image slot
    00> [1779893114] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a40000, size 131072, blocks_to_erase=1
    00> [1779893114] <inf> mcumgr_img_grp: Erased 0x20000 bytes of image slot trailer
    00> [1779893115] <inf> COMM_MNG: le_param_updated: LE conn param updated: interval 0x0006 latency 0 timeout 400
    00> [1779893115] <inf> COMM_MNG: RSSI is -45
    00> [00:00:00.250,396] <inf> AUGU_GPIO_POWER: augu_power_resource_init: boost_on_node
    00> [00:00:00.250,793] <inf> AUGU_GPIO_POWER: augu_power_resource_init: hw_id_on_node
    00> [00:00:00.251,190] <inf> AUGU_GPIO_POWER: augu_power_resource_init: sensors_on_node
    00> [00:00:00.251,617] <inf> AUGU_GPIO_POWER: augu_power_resource_init: flash_on_node
    00> [00:00:00.252,014] <inf> AUGU_FLASH_W25N: master bus found: spi@40023000
    00> [00:00:00.252,380] <inf> AUGU_FLASH_W25N: turning flash on...
    00> [00:00:00.252,716] <inf> AUGU_FLASH_W25N: flash_w25n_spi_init done
    00> [00:00:00.253,051] <inf> AUGU_FLASH_W25N: spi init ok
    00> [00:00:00.253,356] <inf> AUGU_FLASH_W25N: power on ok, sleeping 50ms
    00> [00:00:00.303,771] <inf> AUGU_FLASH_W25N: reading jedec id (1)
    00> [00:00:00.304,138] <inf> AUGU_FLASH_W25N: reading jedec id (2)
    00> [00:00:00.304,534] <inf> AUGU_FLASH_W25N: flash jedec device id 0xefba21
    00> [00:00:00.306,945] <inf> AUGU_FLASH_W25N: bad block management lut markdown found
    00> [00:00:00.307,342] <inf> AUGU_FLASH_W25N: flash_w25n init done
    00> *** Booting My Application v2.1.0-dev-ae1ee57f3906 ***
    00> *** Using nRF Connect SDK v3.0.2-89ba1294ac9b ***
    00> *** Using Zephyr OS v4.0.99-f791c49f492c ***
    00> [00:00:00.308,288] <inf> mcuboot: Starting bootloader
    00> [00:00:00.313,262] <inf> mcuboot: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    00> [00:00:00.313,751] <inf> mcuboot: Scratch: magic=bad, swap_type=0x1, copy_done=0x2, image_ok=0x2
    00> [00:00:00.314,208] <inf> mcuboot: Boot source: primary slot
    00> [00:00:00.319,488] <inf> mcuboot: Image index: 0, Swap type: test
    00> [00:00:03.216,064] <inf> mcuboot: Starting swap using scratch algorithm.
    00> [00:00:03.216,491] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a60000, size 131072, blocks_to_erase=1
    00> [00:00:03.309,570] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a60000, size 131072, blocks_to_erase=1
    00> [00:00:03.858,398] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a40000, size 131072, blocks_to_erase=1
    00> [00:00:03.860,290] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a00000, size 131072, blocks_to_erase=1
    00> [00:00:08.480,316] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a60000, size 131072, blocks_to_erase=1
    00> [00:00:09.029,144] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x79e0000, size 131072, blocks_to_erase=1
    00> [00:00:13.648,193] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a60000, size 131072, blocks_to_erase=1
    00> [00:00:14.197,021] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x79c0000, size 131072, blocks_to_erase=1
    00> [00:00:18.817,993] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a60000, size 131072, blocks_to_erase=1
    00> [00:00:19.366,821] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x79a0000, size 131072, blocks_to_erase=1
    00> [00:00:23.986,083] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7a60000, size 131072, blocks_to_erase=1
    00> [00:00:24.534,912] <inf> AUGU_FLASH_W25N: flash erase to addr: 0x7980000, size 131072, blocks_to_erase=1
    00> [00:00:30.364,501] <inf> mcuboot: Bootloader chainload address offset: 0xf000
    00> [00:00:30.364,898] <inf> mcuboot: Image version: v2.2.3
    00> [00:00:30.365,203] <inf> mcuboot: Jumping to the first image slot
    00> [00:00:30.365,539] <err> os: SPSEL in thread mode does not indicate PSP
    00> [00:00:30.365,539] <err> os: ***** HARD FAULT *****
    00> [00:00:30.365,539] <err> os:   Fault escalation (see below)
    00> [00:00:30.365,539] <err> os: ***** USAGE FAULT *****
    00> [00:00:30.365,539] <err> os:   Illegal use of the EPSR
    00> [00:00:30.365,539] <err> os: r0/a1:  0x20009bc8  r1/a2:  0x00002d75  r2/a3:  0x0000a2ef
    00> [00:00:30.365,539] <err> os: r3/a4:  0x00002d61 r12/ip:  0x00002d61 r14/lr:  0x00002d61
    00> [00:00:30.365,539] <err> os:  xpsr:  0x00000000
    00> [00:00:30.365,539] <err> os: Faulting instruction address (r15/pc): 0x00002d61
    00> [00:00:30.365,539] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0
    00> [00:00:30.365,539] <err> os: Current thread: 0x200019a0 (unknown)
    00> [00:00:30.365,539] [1;31m<err> os: Halting system

    i also understand that mcuboot on ncs3.0.2 is a bit bigger and uses some other encryption that maybe require more stack .. maybe that is related somehow ? 

  • These crash logs are strange and difficult to make sense of:

    and it is not really possible to pinpoint what the problem is based on these alone.

    As a next step, I suggest you compare/diff the generated .config file from your new and old MCUBoot build located in <build dir>/mcuboot/zephyr/ to see if anything stands out. You can also build the bootloader with CONFIG_EXTRA_EXCEPTION_INFO=y to have the handler print out the actual SP and and exec return values. 

    CONFIG_MAIN_STACK_SIZE should be set to 10k by default. Not sure why it was only 2k, but that is too little for the bootloader.

Related