FOTA with mcuboot & external flash slot-2, scratch algorithm

Hi,

This is more of a bug-report than really a help question, as it can be dealt with.

The device I am developing uses an external flash to store MCUBoot's slot-2, so that FOTA update functionality with LwM2M does not cut the internal flash available for user program code in half. This mechanism is verified to work: my signed application image correctly appears in the external flash chip.

If I intentionally corrupt this image before transmission, verification fails and the bootloader attempts no swap. This makes it highly likely that transmission, storage and reading of the external flash, as well as image verification, do exactly what they need to do: a swap only happens with a correct image, meaning the data is read from external flash.

I swap slot-1 (on nRF internal flash) and slot-2 (on SPI external flash) using the scratch algorithm.

Everything works fine with NCS v3.0.2, but breaks when upgrading to v3.1.0 or v3.1.1.

MCUBoot as provided by NCS v3.0.2 is revision: ae1ee57f (works)
MCUBoot as provided by NCS v3.1.1 is revision: 9b4ae4cb (broken)

I have verified the problem really is with MCUBoot, as checking out (and doing west update & all that) NCS v3.1.1 but with MCUBoot at revision ae1ee57f (as was shipped with NCS v3.0.2) yields me a perfectly working FOTA mechanism.

I'm currently in the process of comparing both revisions but haven't found a smoking gun yet. I will reply below if I find something myself.

I also need to find a way to pin a west module (MCUBoot) to a specific version in ncs/manifest/west.yml or similar (my own files), even though ncs/nrf/west.yml (which comes out of a git repo, and which west update then uses as the starting point for checking out the other repos) lists another version. If I find an elegant solution for this workaround I will reply below as well.

  • To be more specific on how it fails:

    After the device has received the signed application image in slot-2 (external SPI flash), the device reboots. MCUBoot runs, reads and verifies the image in the external flash (signature checks out), and starts the scratch algorithm. The swap of slot-1 and slot-2 is then carried out.

    By making memory dumps of my nRF's flash I found out data gets corrupted during the scratch operation.

    The majority gets written just fine, but at flash offsets 0x20000, 0x40000, 0x60000 and 0x80000 there remain chunks of data that were not written, meaning they are left as 0xFF. So erasing the flash works, but writing after the erase seems to fail.

    The weird part, however, is that the length of the 0xFF chunk is not the same at each position:

    The sizes of the unwritten regions at the offsets mentioned above are 0x8000, 0x6000, 0x4000 and 0x2000, respectively. There is a pattern to it that I have not yet been able to explain.

    These numbers & symptoms suggest some kind of alignment issue or block-size problem. I have not found the reason yet.
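To make the offsets and sizes above easy to re-check after each test run, the flash dump can be scanned for long runs of 0xFF. This is only a minimal sketch: the function names, the dump file path and the `min_len` threshold are hypothetical and not part of MCUBoot or NCS.

```python
from typing import Iterator, Tuple

ERASED = 0xFF  # value of an erased (never written) flash byte


def find_erased_runs(dump: bytes, min_len: int = 0x1000) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) for each run of 0xFF bytes at least min_len long."""
    start = None
    for i, byte in enumerate(dump):
        if byte == ERASED:
            if start is None:
                start = i  # a new run of erased bytes begins here
        else:
            if start is not None and i - start >= min_len:
                yield start, i - start
            start = None
    # A run that extends to the very end of the dump.
    if start is not None and len(dump) - start >= min_len:
        yield start, len(dump) - start


def report(path: str, min_len: int = 0x1000) -> None:
    """Print every long erased run found in the dump file at `path` (hypothetical path)."""
    with open(path, "rb") as f:
        data = f.read()
    for offset, length in find_erased_runs(data, min_len):
        print(f"0x{offset:06X}: 0x{length:X} bytes left as 0xFF")
```

Calling `report("flash_dump.bin")` on a dump of the internal flash lists each erased region. Unused flash after the end of the image will also show up, so only runs inside the image area are of interest.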

    The chunks remaining as 0xFF then make the signature check fail, after which MCUBoot refuses to boot from slot-1. However, it doesn't attempt to roll back the situation either, bricking my device.

    But to emphasize once more: this happens only with MCUBoot as provided by NCS v3.1.1, which is revision 9b4ae4cb in Nordic's repo of MCUBoot. The MCUBoot as provided with NCS v3.0.2 (revision ae1ee57f) works fine.

  • Hi,

    Thank you for the detailed report. I have not seen or experienced the partial writes you mentioned for the swap process, but I am not sure how much test coverage there is for swap with scratch. As a test, it could be worth trying swap *without* scratch, which is the default option.

    And since you have narrowed this down to the bootloader, I think this can be a good opportunity to use "git bisect" to find the offending commit.

    Best regards,

    Vidar
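For the "git bisect" suggested above, each bisect step needs a pass/fail verdict. A minimal sketch of such a check follows, assuming that after a successful swap the signed image appears byte for byte at the start of slot-1 in the flash dump. The function names, file paths and `slot_offset` are hypothetical. The return values follow the `git bisect run` convention: 0 marks the commit good, 1 marks it bad, and 125 skips it.

```python
from typing import Optional


def first_mismatch(dump: bytes, image: bytes, slot_offset: int) -> Optional[int]:
    """Return the flash offset of the first byte that differs from the image, or None."""
    region = dump[slot_offset:slot_offset + len(image)]
    for i, (got, expected) in enumerate(zip(region, image)):
        if got != expected:
            return slot_offset + i
    if len(region) < len(image):
        # The dump ends before the image does; report where it stops.
        return slot_offset + len(region)
    return None


def bisect_exit_code(dump: bytes, image: bytes, slot_offset: int) -> int:
    """0 = good (slot matches the image), 1 = bad (corruption found)."""
    return 0 if first_mismatch(dump, image, slot_offset) is None else 1


def check_files(dump_path: str, image_path: str, slot_offset: int) -> int:
    """Compare two files; return 125 (skip this commit) if either is missing."""
    try:
        with open(dump_path, "rb") as f:
            dump = f.read()
        with open(image_path, "rb") as f:
            image = f.read()
    except FileNotFoundError:
        return 125
    return bisect_exit_code(dump, image, slot_offset)
```

A wrapper script would build and flash the bootloader at the current commit, trigger the update, dump the flash, and then exit with the value returned by `check_files`.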

  • Hey Vidar,

    Thanks for the hint. I will look into the different types of swap now, and do some more testing & verification.

    As for pinning the bootloader to a specific version, what mechanism would you advise?

    Right now my project looks more or less like this:

    myproject/ncs/manifest/west.yml:

    manifest:
      projects:
        - name: sdk-nrf
          path: nrf
          remote: https://github.com/nrfconnect/
          revision: v3.1.1
          import:
            #path-prefix: ncs/
            path-blocklist:
              - test/*
              - modules/lib/gui/*
            name-blocklist:
              - azure-sdk-for-c
              - canopennode
              - cirrus
              - fatfs
              - gui
              - hal_wurthelektronik
              - hostap
              - littlefs
              - matter
              - memfault-firmware-sdk
              - openthread
              - hal_tdk

    So I pull NCS into ncs/nrf, from where west pulls all the other repos NCS consists of by parsing the myproject/ncs/nrf/west.yml file. It is in this file that we find:

        - name: mcuboot
          repo-path: sdk-mcuboot
          revision: ncs-v3.1.1
          path: bootloader/mcuboot

    Can I override those values from my own myproject/ncs/manifest/west.yml file somehow?

    I can do it by applying a patch file prior to building, so I have a solution, but if there's a nicer way to do it I would prefer to use that.

  • Hey,

    To work with another revision, you can add "mcuboot" to the name-blocklist to not pull in the mcuboot project specified by the NCS manifest, then instead define a new project named "mcuboot" with the revision you want. E.g., 

    manifest:
      projects:
        - name: sdk-nrf
          path: nrf
          remote: https://github.com/nrfconnect/
          revision: v3.1.1
          import:
            #path-prefix: ncs/
            path-blocklist:
              - test/*
              - modules/lib/gui/*
            name-blocklist:
              - azure-sdk-for-c
              - canopennode
              - cirrus
              - fatfs
              - gui
              - hal_wurthelektronik
              - hostap
              - littlefs
              - matter
              - memfault-firmware-sdk
              - openthread
              - hal_tdk
              - mcuboot
        - name: mcuboot
          remote: <may also point to your own fork>
          repo-path: sdk-mcuboot
          revision: <commit hash or tag>
          path: bootloader/mcuboot

    https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/dev_model_and_contributions/managing_code.html#forking_a_repository_of_the_nrf_connect_sdk 

  • Tnx for help Vidar!

    Changing the method from scratch to offset did not fix the issue. It made it worse, as it failed to boot at all.

    I am, however, keeping the partition layout for SB_CONFIG_MCUBOOT_MODE_SWAP_USING_OFFSET, which frees up some program memory by dropping the scratch partition. I will then use it all with mcuboot revision ae1ee57f, pinned using the method you suggested above.
