FOTA with mcuboot & external flash slot-2, scratch algorithm

Hi,

This is more of a bug report than a help question, as I have a workaround.

The device I am developing uses an external flash for storing MCUBoot's slot-2, so that FOTA update functionality with LwM2M does not cut the internal flash available for user program code in half. This mechanism is verified to work: my signed application image correctly appears in the external flash chip.

If I intentionally corrupt this image before transmission, verification fails and the bootloader attempts no swap. This makes it highly likely that transmission, storage & reading of the external flash, as well as image verification, do exactly what they need to do: there is only a swap with a correct image, meaning data is read from external flash.

I swap slot-1 (on nRF internal flash) and slot-2 (on SPI external flash) using the scratch algorithm.

Everything works fine with NCS v3.0.2, but breaks when upgrading to v3.1.0 or v3.1.1.

MCUBoot as provided by NCS v3.0.2 is revision: ae1ee57f (works)
MCUBoot as provided by NCS v3.1.1 is revision: 9b4ae4cb (broken)

I have verified the problem really is with MCUBoot: checking out NCS v3.1.1 (and doing west update & all that) but with MCUBoot at revision ae1ee57f (as shipped with NCS v3.0.2) gives me a perfectly working FOTA mechanism.

I'm currently in the process of comparing both revisions but haven't found a smoking gun yet. I will reply below if I find something myself.

I also need to find a way to pin a west module (MCUBoot) to a specific version in ncs/manifest/west.yml or something similar (my own files), even though ncs/nrf/west.yml (which comes out of a git repo, and which west update then uses as the starting point for checking out other repos) lists another version. If I find an elegant solution for this workaround I will reply below as well.

  • To be more specific on how it fails:

    After the device has received the signed application image in slot-2 (external SPI flash), it reboots. MCUBoot is executed, reads & verifies the image in the external flash (signature checks out), and the scratch algorithm is initiated. The swap of slot-1 & slot-2 is carried out.

    By making memory dumps of my nRF's flash I found out data gets corrupted during the scratch operation.

    The majority gets written just fine, but at flash offsets 0x20000, 0x40000, 0x60000 and 0x80000 there remain chunks of data that are not written. "Not written" means they are left as 0xFF. So erasing the flash works, but the write after the erase seems to fail.

    The weird part, however, is that the length of this 0xFF chunk is not the same at each position:

    The sizes of the unwritten regions at the locations mentioned above are 0x8000, 0x6000, 0x4000 and 0x2000. There is a pattern to it that I have not yet been able to explain.

    These numbers & symptoms suggest some kind of alignment issue or block-size problem. I have not found the reason yet.

    The chunks remaining as 0xFF then make the signature check fail, after which MCUBoot refuses to boot from slot-1. However, it doesn't attempt to roll back the situation either, bricking my device.

    But to emphasize once more: this happens only with MCUBoot as provided by NCS v3.1.1, which is revision 9b4ae4cb in Nordic's repo of MCUBoot. The MCUBoot as provided with NCS v3.0.2 (revision ae1ee57f) works fine.
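
For anyone who wants to check their own flash dump for the same symptom, here is a minimal sketch that lists long runs of 0xFF together with their offset and length. It is not tied to MCUBoot in any way. The function names, the 0x1000 minimum run length and the synthetic dump are assumptions made up for illustration; the synthetic dump only mimics the offsets and sizes reported above (paired in the order given) so the scan has something to find.

```python
ERASED = 0xFF  # value of an erased (unwritten) flash byte


def find_erased_runs(data: bytes, min_len: int = 0x1000):
    """Return (offset, length) for every run of 0xFF bytes of at least min_len."""
    runs = []
    start = None  # offset where the current 0xFF run began, if any
    for i, b in enumerate(data):
        if b == ERASED:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # A run may extend to the very end of the dump
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs


def make_example_dump() -> bytes:
    """Build a synthetic dump (hypothetical data) with the gaps described above."""
    dump = bytearray(b"\xA5" * 0xA0000)  # 0xA5 stands in for written image data
    gaps = [(0x20000, 0x8000), (0x40000, 0x6000),
            (0x60000, 0x4000), (0x80000, 0x2000)]
    for offset, length in gaps:
        dump[offset:offset + length] = b"\xFF" * length
    return bytes(dump)


if __name__ == "__main__":
    for offset, length in find_erased_runs(make_example_dump()):
        end = offset + length
        print(f"offset 0x{offset:06X}  length 0x{length:05X}  end 0x{end:06X}")
```

To run it on a real dump, read the file in binary mode and pass the bytes to find_erased_runs(). A valid image can contain legitimate 0xFF padding, so short runs are filtered out by min_len, and comparing the result against a dump taken with the working MCUBoot revision is probably the safer check.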
