MCUBoot + MCUmgr: Image in the secondary slot is not valid!

Hi!

I am working on a custom board and would like to use an external flash for BLE OTA updates. The firmware is based on NCS v1.7.1 with MCUBoot as the bootloader. I would like to use the MX25R64 external flash, connected via SPI (not QSPI), for the secondary bootloader slot.

I started with the SMP Server sample and got it working on an nRF52840 DK using QSPI (see NCS 1.7 ota use external flash). I then followed the discussion ncs-external-flash-ota-change-qspi-to-spi to use SPI instead of QSPI. When I use the Android 'Device Manager' app in combination with the provided sample application 1207.hello_world_spi_nor_ext_flash.zip, everything works fine. However, the update fails if I use the MCUmgr CLI for the image upload: in that case I get the error message 'Image in the secondary slot is not valid!'. The MCUmgr CLI runs on a Raspberry Pi with an nRF52840 Dongle as the BLE HCI interface.

I also tried NCS v1.8.0 and got the same error message.


Debugging MCUBoot with Ozone shows that the image signature/hash is incorrect when the MCUmgr CLI is used for the image upload. Further investigation shows that the integrity check performed in the bootutil_img_validate function fails. In this case the variable hash is invalid, but the content of the variable buf is correct. When I use the Device Manager app to upload the image, the signature is valid and the update is successful.

    /*
     * Traverse through all of the TLVs, performing any checks we know
     * and are able to do.
     */
    while (true) {
        rc = bootutil_tlv_iter_next(&it, &off, &len, &type);
        if (rc < 0) {
            goto out;
        } else if (rc > 0) {
            break;
        }

        if (type == IMAGE_TLV_SHA256) {
            /*
             * Verify the SHA256 image hash.  This must always be
             * present.
             */
            if (len != sizeof(hash)) {
                rc = -1;
                goto out;
            }
            rc = LOAD_IMAGE_DATA(hdr, fap, off, buf, sizeof(hash));
            if (rc) {
                goto out;
            }

            FIH_CALL(boot_fih_memequal, fih_rc, hash, buf, sizeof(hash));
            if (fih_not_eq(fih_rc, FIH_SUCCESS)) {
                goto out;
            }

            sha256_valid = 1;
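
In the snippet above, hash is the digest MCUBoot computes over the slot contents, while buf is the digest stored in the image's SHA256 TLV. The same comparison can be reproduced on a PC against app_update.bin or against a read-back of the secondary slot. The following is only a minimal sketch: the function name check_image_hash is made up for illustration, and the header layout and constants are taken from my reading of MCUBoot's image format (32-byte header, hash over header + image + protected TLVs, TLV info magic 0x6907, SHA256 TLV type 0x10), so please verify them against the MCUBoot version in your NCS tree.

```python
import hashlib
import struct
from typing import Optional, Tuple

# Constants as I understand them from MCUBoot's image format (assumption:
# check against bootutil/image.h in your tree).
IMAGE_MAGIC = 0x96F3B83D
TLV_INFO_MAGIC = 0x6907      # unprotected TLV area
IMAGE_TLV_SHA256 = 0x10


def check_image_hash(image: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (computed_hash, stored_hash) for an MCUBoot image blob.

    `image` may be a signed update file or a raw dump of a slot; bytes
    after the TLV area (padding, trailer) are ignored.
    """
    magic, _load_addr, hdr_size, prot_tlv_size, img_size, _flags = \
        struct.unpack_from("<IIHHII", image, 0)
    if magic != IMAGE_MAGIC:
        raise ValueError("not an MCUBoot image (bad header magic)")

    # The hash covers header, image body and protected TLVs (if any).
    hashed_len = hdr_size + img_size + prot_tlv_size
    if len(image) < hashed_len + 4:
        raise ValueError("blob is shorter than the image it describes")
    computed = hashlib.sha256(image[:hashed_len]).digest()

    # The unprotected TLV area follows directly after the hashed region.
    info_magic, tlv_total = struct.unpack_from("<HH", image, hashed_len)
    if info_magic != TLV_INFO_MAGIC:
        raise ValueError("TLV info header not found")

    off = hashed_len + 4
    end = hashed_len + tlv_total  # tlv_total includes the 4-byte info header
    stored = None
    while off + 4 <= end:
        tlv_type, tlv_len = struct.unpack_from("<HH", image, off)
        off += 4
        if tlv_type == IMAGE_TLV_SHA256:
            stored = image[off:off + tlv_len]
            break
        off += tlv_len
    return computed, stored

# Example use (file name as in this thread):
#   computed, stored = check_image_hash(open("app_update.bin", "rb").read())
#   print(computed.hex(), stored.hex() if stored else None)
```

If computed and stored match for app_update.bin but not for a dump of the secondary slot after an upload, the image body was altered somewhere between the host and the external flash, while the TLV itself arrived intact.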



Can you give me any advice to fix the error?


Best regards,

Thomas

  • Hi,

     

    Thank you for providing your test firmware.

    I tried to replicate the issue you are seeing using these commands on the PC side:

     

    sudo /path/to/mcumgr --conntype ble --connstring ctlr_name=hci0,peer_name='Nordic_LBS' image upload zephyr/app_update.bin

    Here's the "image list" after successful upload:

    sudo /path/to/mcumgr --conntype ble --connstring ctlr_name=hci0,peer_name='Nordic_LBS' image list
    Images:
     image=0 slot=0
        version: 0.0.0
        bootable: true
        flags: active confirmed
        hash: 970ee41e6186bebf631aab7aebbc7c7ae0d6b027ac8149f782104c06884673b7
     image=0 slot=1
        version: 0.0.0
        bootable: true
        flags: 
        hash: 24bc32212e7cc352aafc82a5143dd3f79bd9819869c548dacaf41d2e2eddfe11
    Split status: N/A (0)
    

    Now test the second image and see that it is set to "pending" successfully:

    sudo /path/to/mcumgr --conntype ble --connstring ctlr_name=hci0,peer_name='Nordic_LBS' image test 24bc32212e7cc352aafc82a5143dd3f79bd9819869c548dacaf41d2e2eddfe11
    Images:
     image=0 slot=0
        version: 0.0.0
        bootable: true
        flags: active confirmed
        hash: 970ee41e6186bebf631aab7aebbc7c7ae0d6b027ac8149f782104c06884673b7
     image=0 slot=1
        version: 0.0.0
        bootable: true
        flags: pending
        hash: 24bc32212e7cc352aafc82a5143dd3f79bd9819869c548dacaf41d2e2eddfe11
    Split status: N/A (0)
    

    And here's the output after reset where you can see that the timestamp has changed (image tested OK):

    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    I: Starting bootloader
    I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    I: Boot source: none
    I: Swap type: none
    I: Bootloader chainload address offset: 0xc000
    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    Starting Bluetooth Peripheral LBS example
    build time: Jan 19 2022 09:25:17
    I: 2 Sectors of 4096 bytes
    I: alloc wra: 0, fd0
    I: data wra: 0, 1c
    I: SoftDevice Controller build revision: 
    I: 3f 47 70 8e 81 95 4e 86 |?Gp...N.
    I: 9d d3 a2 95 88 f6 30 0a |......0.
    I: 7f 53 49 fd             |.SI.    
    I: No ID address. App must call settings_load()
    Bluetooth initialized
    Advertising successfully started
    Connected
    I: Swap type: none
    I: Swap type: test
    I: Swap type: test
    Disconnected (reason 19)
    Connected
    Disconnected (reason 19)
    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    I: Starting bootloader
    I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
    I: Boot source: none
    I: Swap type: test
    I: Bootloader chainload address offset: 0xc000
    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    Starting Bluetooth Peripheral LBS example
    build time: Jan 19 2022 09:33:22
    I: 2 Sectors of 4096 bytes
    I: alloc wra: 0, fd0
    I: data wra: 0, 1c
    I: SoftDevice Controller build revision: 
    I: 3f 47 70 8e 81 95 4e 86 |?Gp...N.
    I: 9d d3 a2 95 88 f6 30 0a |......0.
    I: 7f 53 49 fd             |.SI.    
    I: No ID address. App must call settings_load()
    Bluetooth initialized
    Advertising successfully started
    

     

    Debugging MCUBoot with Ozone shows that the image signature/hash is incorrect when the MCUmgr CLI is used for the image upload. Further investigation shows that the integrity check performed in the bootutil_img_validate function fails. In this case the variable hash is invalid, but the content of the variable buf is correct. When I use the Device Manager app to upload the image, the signature is valid and the update is successful.

    If the hash is incorrect, then there's a problem with the integrity of the image. Since you are working with an nRF52840 DK, could you try to erase both the internal flash and the QSPI flash?

    nrfjprog -e
    nrfjprog --reset
    nrfjprog --qspieraseall

    Then flash your sample.

     

    Kind regards,

    Håkon

  • Many thanks for the fast response!
    I followed your steps but still get the same error message.

    As a first step, I erased the flash and QSPI flash

    nrfjprog -e
    nrfjprog --reset
    nrfjprog --qspieraseall


    Then I built and flashed the sample using NCS v1.7.1
    west build -p always -b nrf52840dk_nrf52840
    west flash


    I rebuilt the sample application, copied the update binary to a Raspberry Pi and uploaded the image using MCUmgr
    ubuntu@ubuntu:~$ sudo go/bin/mcumgr -c ble_lbs image upload app_update.bin
     200.50 KiB / 200.50 KiB [======================================================================] 100.00% 9.76 KiB/s 20s
    Done

    List the images

    ubuntu@ubuntu:~$ sudo go/bin/mcumgr -c ble_lbs image list
    Images:
     image=0 slot=0
        version: 0.0.0
        bootable: true
        flags: active confirmed
        hash: 30372ead86b3fe06019e4c90d02c2b680bd97195ddc91f20cc5ec68b7516a4a6
     image=0 slot=1
        version: 0.0.0
        bootable: true
        flags:
        hash: 56385894ea602db72850094e158c58afb4925607790ce3d79d64b1024a2a3d3d
    Split status: N/A (0)

    Triggered an image test and reset the device using nrfjprog

    ubuntu@ubuntu:~$ sudo go/bin/mcumgr -c ble_lbs image test 56385894ea602db72850094e158c58afb4925607790ce3d79d64b1024a2a3d3d
    Images:
     image=0 slot=0
        version: 0.0.0
        bootable: true
        flags: active confirmed
        hash: 30372ead86b3fe06019e4c90d02c2b680bd97195ddc91f20cc5ec68b7516a4a6
     image=0 slot=1
        version: 0.0.0
        bootable: true
        flags: pending
        hash: 56385894ea602db72850094e158c58afb4925607790ce3d79d64b1024a2a3d3d
    Split status: N/A (0)

    However, testing the image fails with the same error message
    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    I: Starting bootloader
    I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    I: Boot source: none
    I: Swap type: none
    I: Bootloader chainload address offset: 0xc000
    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    Starting Bluetooth Peripheral LBS example
    build time: Jan 19 2022 13:30:44
    I: 2 Sectors of 4096 bytes
    I: alloc wra: 0, fe8
    I: data wra: 0, 0
    I: SoftDevice Controller build revision: 
    I: 3f 47 70 8e 81 95 4e 86 |?Gp...N.
    I: 9d d3 a2 95 88 f6 30 0a |......0.
    I: 7f 53 49 fd             |.SI.    
    I: No ID address. App must call settings_load()
    Bluetooth initialized
    Advertising successfully started
    Connected
    I: Swap type: none
    Disconnected (reason 19)
    Connected
    I: Swap type: none
    I: Swap type: none
    E: Unable to allocate TX context
    E: Unable to allocate TX context
    E: Unable to allocate TX context
    E: Unable to allocate TX context
    Disconnected (reason 8)
    Connected
    I: Swap type: none
    I: Swap type: none
    Disconnected (reason 19)
    Connected
    I: Swap type: none
    I: Swap type: test
    I: Swap type: test
    Disconnected (reason 19)
    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    I: Starting bootloader
    I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
    I: Boot source: none
    I: Swap type: test
    E: Image in the secondary slot is not valid!
    I: Bootloader chainload address offset: 0xc000
    *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
    Starting Bluetooth Peripheral LBS example
    build time: Jan 19 2022 13:30:44
    I: 2 Sectors of 4096 bytes
    I: alloc wra: 0, fd0
    I: data wra: 0, 1c
    I: SoftDevice Controller build revision: 
    I: 3f 47 70 8e 81 95 4e 86 |?Gp...N.
    I: 9d d3 a2 95 88 f6 30 0a |......0.
    I: 7f 53 49 fd             |.SI.    
    I: No ID address. App must call settings_load()
    Bluetooth initialized
    Advertising successfully started

    Did I do something wrong?

    Best regards,

    Thomas

  • Thanks for your response.

    Is there a way to reduce the image transfer speed when using a Nordic device with BLE HCI firmware, so that BLE OTA works in our intended production environment? Our gateway is based on a custom board with a Raspberry Pi Compute Module and a pre-certified nRF52840 module.

    Best regards,
    Thomas

  • Hi Thomas,

     

    As mentioned, this is unfortunately the state of the image transfer at this time. I have filed this internally as an improvement request. We are continuously trying to improve our deliveries, and transfer speed is one of the areas we are working on. However, as of writing this answer (latest stable version NCS v1.8.0), the speed is approx. 3 kB/s.

     

    Kind regards,

    Håkon

  • Many thanks for the support.

    Does this mean that, in the current state, we cannot use a dongle as the HCI device for BLE OTA if the secondary bootloader slot of the peripheral device is located on external flash connected via SPI?

    I have tested the following combinations with the MCUmgr CLI and a Nordic dongle as the HCI device on a Linux system to update a BLE peripheral device:
    - Works: peripheral with secondary slot on internal flash
    - Works: peripheral with secondary slot on external flash (QSPI)
    - Fails: peripheral with secondary slot on external flash (SPI) -> our use case (example above)

    Our custom sensor node (peripheral device) is already in production, and we will need a redesign if the update process doesn't work. At the moment we are using the old nRF5 SDK with the included bootloader and are working on porting the application to NCS and Zephyr.

    Best regards,
    Thomas

  • Hi Thomas,

     

    I am sorry, I must have misunderstood the scenario here. Given the speed you got, I assumed that you were referring to the nRF5 SDK implementation, but yes, it seems you can get this speed with mcumgr when overwriting the same image.

    Tom_H said:
    - Fails: peripheral with secondary slot on external flash (SPI) -> our use case (example above)

    I set up hci_uart in this configuration and tried it locally. I can see that the issue is present on my side as well:

    *** Booting Zephyr OS build v2.6.99-ncs1  ***
    I: Starting bootloader
    I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
    I: Boot source: none
    I: Swap type: test
    E: Image in the secondary slot is not valid!
    I: Bootloader chainload address offset: 0xc000
    *** Booting Zephyr OS build v2.6.99-ncs1  ***
    

     

    The strange part is that if I manually program "app_update_test.hex" into the primary slot, the hash of the image is equal to the one I get when doing the same procedure over the air, which indicates that there's a problem with moving the image from SPI flash to internal flash. I am not sure where the issue is in your test application.
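
To narrow down where the data goes wrong, one option is to read back the secondary slot after the upload and compare it byte by byte with app_update.bin. The sketch below only does the comparison; how the external SPI flash gets dumped to a file is left open (that part depends on your setup), and the helper name mismatch_runs is purely illustrative.

```python
from typing import List, Tuple


def mismatch_runs(expected: bytes, actual: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) runs where `actual` differs from `expected`.

    Only the common prefix of both buffers is compared, so a slot dump
    that is longer than the update file (padding, trailer) is fine.
    """
    runs = []
    start = None
    n = min(len(expected), len(actual))
    for i in range(n):
        if expected[i] != actual[i]:
            if start is None:
                start = i          # a new run of differing bytes begins
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, n - start))
    return runs

# Example use (the dump file name is hypothetical):
#   expected = open("app_update.bin", "rb").read()
#   actual = open("secondary_slot_dump.bin", "rb").read()
#   for offset, length in mismatch_runs(expected, actual):
#       print(f"0x{offset:06x}: {length} byte(s) differ")
```

The offsets and lengths of the differing runs (for example, whether they line up with flash page or sector boundaries, or with the upload chunk size) may hint at whether writes are being dropped, repeated or reordered.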

     

    Could you try this sample and see if the same issue occurs on your end?

    smp_svr_nrf52840dk_spi_nor.zip

    Note: this one advertises as "Zephyr".

     

    Kind regards,

    Håkon

  • Hi Håkon,

    thanks for the sample code. I am glad you can reproduce the problem. I tested your sample with NCS version v1.8.0 and I see the same issue.

    Another strange behaviour is that the image verification works when using the Device Manager app or the internal HCI device of a Linux PC for the image upload, but fails when using an hci_uart device.

    Best regards,
    Thomas
