MCUBoot crash when using multi-image + swap + serial recovery

Hello,

I am encountering a crash in MCUBoot when using the following configuration:

  • MCUBoot as bootloader (SB_CONFIG_BOOTLOADER_MCUBOOT)
  • TF-M configuration (build for the /ns board target)
  • Multi-image update (SB_CONFIG_MCUBOOT_UPDATEABLE_IMAGES=2)
  • Swap mode (CONFIG_BOOT_SWAP_USING_MOVE)
  • Serial Recovery in USB ACM (CONFIG_BOOT_SERIAL_CDC_ACM)
  • AuTerm sends a firmware image or lists images.

This issue seems to match an already-identified bug in upstream MCUBoot: https://github.com/mcu-tools/mcuboot/issues/2336

The version of MCUBoot included in the current nRF Connect SDK v3.1.1 does not contain this fix, and the crash (a null pointer access) occurs on our side.

Questions:

  • When does Nordic plan to integrate this upstream MCUBoot fix into the nRF Connect SDK?
  • Is there a known workaround recommended by Nordic until the fix is officially integrated?

Additional issue on the nRF5340: sometimes the device freezes at startup when CONFIG_BOOT_SERIAL_WAIT_FOR_DFU is set. This behavior looks very similar to an older issue:
"MCUBoot does not load application when using CONFIG_BOOT_SERIAL_WAIT_FOR_DFU".

Is this freeze a known issue?

Please find below the GDB backtrace captured when MCUBoot freezes:

nrf_usbd_common_disable () at ./src/nrf/zephyr/drivers/usb/common/nrf_usbd_common/nrf_usbd_common.c:1225
1225                    while (!NRF_USBD->EVENTS_ENDEPIN[0]) {
(gdb) bt
#0  nrf_usbd_common_disable ()
    at ./src/nrf/zephyr/drivers/usb/common/nrf_usbd_common/nrf_usbd_common.c:1225
#1  0x00006efe in usb_dc_detach ()
    at ./src/nrf/zephyr/drivers/usb/device/usb_dc_nrfx.c:1322
#2  0x00003b96 in usb_disable ()
    at ./src/nrf/zephyr/subsys/usb/device/usb_device.c:1358
#3  0x0000065c in do_boot (rsp=0x2000cf9c <z_main_stack+10172>)
    at ./src/nrf/bootloader/mcuboot/boot/zephyr/main.c:200
#4  main ()
    at ./src/nrf/bootloader/mcuboot/boot/zephyr/main.c:682
(gdb) 

Unplugging the USB cable does not let the boot continue once the device is frozen.

If the USB cable is not connected during boot, the issue does not reproduce.
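
For illustration, the backtrace above shows the bootloader spinning in `while (!NRF_USBD->EVENTS_ENDEPIN[0])`, a loop whose only exit is the event itself. The sketch below shows what a bounded version of such a wait could look like. It is not the driver's code and not a fix endorsed by Nordic; `wait_for_event` and `max_polls` are hypothetical names, and whether giving up on the event is safe with respect to the underlying hardware workaround is an open question.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: poll an event register until it reads non-zero or
 * the poll budget is exhausted.
 * Returns true if the event was observed, false on timeout.
 */
bool wait_for_event(const volatile uint32_t *event_reg, uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (*event_reg != 0) {
            return true;
        }
    }
    return false;
}
```

With a helper like this, the caller can log or skip the step when the event never arrives instead of blocking the boot forever.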

Best Regards,

Rémi Moessner

  • Hello Vidar,

    Thanks for your feedback.

    I don't use multi-image for the network core, but for the nRF7 Wi-Fi image. When is the 3.2.0 release forecast?

    Best Regards,

    Rémi

  • Hello Rémi,

    I see. In that case it's not a problem to select one of the swap algorithms. v3.2.0 is scheduled to be tagged mid-December (this is subject to change), so you should expect to see a release candidate (v3.2.0-RC1) tagged soon.

    The issue with the "hang" in the bootloader appears to be caused by the workaround added for this erratum: https://docs.nordicsemi.com/bundle/errata_nRF5340_Rev1/page/ERR/nRF5340/Rev1/latest/anomaly_340_167.html. A temporary solution could be to uncomment this code and accept the risk of getting the spurious event that will lead to a secure fault exception. I'm still working on understanding why this workaround may be failing in your case. Would you say it's hard to reproduce with your code?

    Best regards,

    Vidar 

    EDIT: there do not appear to be any reported issues with this workaround. I checked with the developer. Have you made any code changes to the bootloader that we should be aware of if we need to try to replicate this issue on our end?

  • Hello Vidar

    In that case it's not a problem to select one of the swap algorithms.

    OK. I've created a patch in MCUBoot and for the moment it works well. I've also detected another issue if
    CONFIG_BOOT_SERIAL_IMG_GRP_IMAGE_STATE and CONFIG_BOOT_SERIAL_IMG_GRP_HASH are set and I flash slot 0: boot_set_pending_multi is called and marks the SECONDARY slot as pending although it should not. I patched this. Are you aware of such an issue?

    Would you say it's hard to reproduce with your code

    On my hardware, I cannot say for certain how often it occurs, but it appears only when the USB cable is connected and there is no serial communication.

    Regards,

    Rémi

  • Hello Rémi, 

    I've also detected another issue if
    CONFIG_BOOT_SERIAL_IMG_GRP_IMAGE_STATE and CONFIG_BOOT_SERIAL_IMG_GRP_HASH are set and I flash slot 0: boot_set_pending_multi is called and marks the SECONDARY slot as pending although it should not. I patched this. Are you aware of such an issue?

    Is the problem that auterm is trying to mark the secondary slot as pending when the image was uploaded directly to the primary slot? Did you patch something around this code:  https://github.com/nrfconnect/sdk-mcuboot/commit/fac2cabe98d81b4416052990a22e9698d39267a3 ?

    On my hardware, I cannot say for certain how often it occurs, but it appears only when the USB cable is connected and there is no serial communication.

    Thanks for confirming. Could you also say if you have made any changes to the bootloader in addition to the patch mentioned above? The developer was asking whether any of the clocks were stopped prior to this call as this could explain the issue.

    Regards,

    Vidar

  • Hello Vidar,

    Sorry, I didn't see your questions.

    Is the problem that auterm is trying to mark the secondary slot as pending when the image was uploaded directly to the primary slot? Did you patch something around this code:  https://github.com/nrfconnect/sdk-mcuboot/commit/fac2cabe98d81b4416052990a22e9698d39267a3 ?

    The problem is that the boot_serial code ignores the fact that the PRIMARY slot has been written and marks the SECONDARY slot as containing an update (which is not the case, as we wrote slot 0).

    My patch in boot_serial.c for this is:

    @@ -648,7 +652,10 @@ bs_set(char *buf, int len)
                     if (rc == 0 && memcmp(hash, img_hash.value, sizeof(hash)) == 0) {
                         /* Hash matches, set this slot for test or confirmation */
                         found = true;
    -                    goto set_image_state;
    +                    if  (slot != BOOT_PRIMARY_SLOT)
    +                        goto set_image_state;
    +                    else
    +                        goto out;
                     }
                 }
             }
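
The decision this patch introduces can be sketched in isolation as follows. This is a simplified model for illustration, not MCUboot code: `decide_set_action`, the `set_action` enum and `HASH_LEN` are hypothetical names, and the real `bs_set()` iterates over slots and reads the hash from the image TLVs rather than taking it as an argument.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hash length for this sketch (a SHA-256 digest). */
#define HASH_LEN 32

/* Hypothetical outcome of an image-state "set" request. */
enum set_action {
    ACTION_NOT_FOUND,     /* no slot holds an image with the requested hash */
    ACTION_NOTHING_TO_DO, /* image already sits in the primary slot */
    ACTION_SET_PENDING    /* image is in the secondary slot: mark it pending */
};

/*
 * Compare the hash sent by the host against the hash found in each slot.
 * The primary slot is checked first, mirroring the patched control flow:
 * a match there must not cause the secondary slot to be marked pending.
 */
enum set_action decide_set_action(const uint8_t *requested,
                                  const uint8_t *primary_hash,
                                  const uint8_t *secondary_hash)
{
    if (memcmp(requested, primary_hash, HASH_LEN) == 0) {
        return ACTION_NOTHING_TO_DO;
    }
    if (memcmp(requested, secondary_hash, HASH_LEN) == 0) {
        return ACTION_SET_PENDING;
    }
    return ACTION_NOT_FOUND;
}
```

In this model, an image uploaded directly to slot 0 yields `ACTION_NOTHING_TO_DO`, which corresponds to the `goto out` branch of the patch.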

    Could you also say if you have made any changes to the bootloader in addition to the patch mentioned above? The developer was asking whether any of the clocks were stopped prior to this call as this could explain the issue.

    No clock modification, just my own modifications for the two issues I detected. Please find attached the full patch I used (it also includes the null pointer patch).

    boot_serial_nullptr_exception.patch

    Regards,

    Rémi
