
MCUBoot: Image in the primary slot is not valid!

Hi Nordic Team,

I have a custom application running based on nRF NCS 1.5.0 with MCUboot enabled.

I'm having an issue where MCUboot simply doesn't jump to the application.
This comes and goes when I add or remove code from my main application, but with no noticeable pattern. I wonder if image size is a factor and things are overlapping in the flash image.

The app only uses 42% of its flash region:

1> Memory region    Used Size   Region Size   %age Used
1>        FLASH:     207828 B      495104 B      41.98%
1>         SRAM:      66368 B        256 KB      25.32%
1>     IDT_LIST:         88 B          2 KB       4.30%

Running partition manager report on build...\partitions.yml

I believe this is the standard setup, where MCUboot takes up to 48K and there are 2 app banks:

flash_primary (0x100000 - 1024kB):
+-------------------------------------------------+
| 0x0: mcuboot (0xc000 - 48kB) |
+---0xc000: mcuboot_primary (0x79000 - 484kB)-----+
| 0xc000: mcuboot_pad (0x200 - 512B) |
+---0xc200: mcuboot_primary_app (0x78e00 - 483kB)-+
| 0xc200: app (0x78e00 - 483kB) |
+-------------------------------------------------+
| 0x85000: mcuboot_secondary (0x79000 - 484kB) |
| 0xfe000: settings_storage (0x2000 - 8kB) |
+-------------------------------------------------+
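
A quick sanity check of this layout, as a minimal Python sketch. The table is transcribed by hand from the report above, and the helper name `find_layout_problems` is made up for illustration. It only checks that the top-level partitions are contiguous and recomputes the usage figure from the linker report.

```python
# Partition table transcribed from the partition manager report above.
# Each entry is (name, start address, size in bytes).
PARTITIONS = [
    ("mcuboot",           0x00000, 0x0C000),
    ("mcuboot_primary",   0x0C000, 0x79000),
    ("mcuboot_secondary", 0x85000, 0x79000),
    ("settings_storage",  0xFE000, 0x02000),
]
FLASH_SIZE = 0x100000   # flash_primary size from the report (1024 kB)
PAD_SIZE = 0x200        # mcuboot_pad at the start of mcuboot_primary
APP_USED = 207828       # "FLASH" bytes used, from the linker report


def find_layout_problems(partitions, flash_size):
    """Return a list of gaps/overlaps between consecutive partitions."""
    problems = []
    expected_start = 0
    for name, start, size in partitions:
        if start < expected_start:
            problems.append(f"{name} overlaps the previous partition")
        elif start > expected_start:
            problems.append(f"gap before {name}")
        expected_start = start + size
    if expected_start != flash_size:
        problems.append("partitions do not end at the top of flash")
    return problems


# mcuboot_primary_app is the primary slot minus the pad.
app_region = 0x79000 - PAD_SIZE
usage_percent = round(100 * APP_USED / app_region, 2)

print(find_layout_problems(PARTITIONS, FLASH_SIZE))
print(app_region, usage_percent)
```

With the numbers from this post the partitions tile the flash without gaps or overlaps, so the layout itself does not look like the culprit.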

sram_primary (0x40000 - 256kB):
+--------------------------------------------+
| 0x20000000: sram_primary (0x40000 - 256kB) |
+--------------------------------------------+

I can get more debug output, but here is what I got by enabling minimal logging over RTT in MCUboot.conf:

*** Booting Zephyr OS build v2.4.99-ncs1 ***
I: Starting bootloader
I: det pin = 12 rc = 1
I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: none
E: Image in the primary slot is not valid!
E: fih_rc=1, succ=0
E: Unable to find bootable image
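
To make the two "image:" lines easier to read, here is a small Python sketch that decodes the numeric fields. The value tables are an assumption: they are written from memory of MCUboot's bootutil.h (swap type 1 = none, flag 3 = unset) and should be checked against the MCUboot tree you actually build. The function name `decode_trailer_line` is hypothetical.

```python
import re

# ASSUMPTION: values as recalled from MCUboot's bootutil.h; verify locally.
SWAP_TYPES = {1: "none", 2: "test", 3: "perm", 4: "revert", 5: "fail", 0xFF: "panic"}
FLAGS = {1: "set", 2: "bad", 3: "unset"}

LINE_RE = re.compile(
    r"(?P<slot>Primary|Secondary) image: magic=(?P<magic>\w+), "
    r"swap_type=(?P<swap>0x[0-9a-fA-F]+), "
    r"copy_done=(?P<copy>0x[0-9a-fA-F]+), "
    r"image_ok=(?P<ok>0x[0-9a-fA-F]+)"
)


def decode_trailer_line(line):
    """Turn one 'Primary/Secondary image:' log line into readable names."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "slot": m["slot"].lower(),
        "magic": m["magic"],
        "swap_type": SWAP_TYPES.get(int(m["swap"], 16), "unknown"),
        "copy_done": FLAGS.get(int(m["copy"], 16), "unknown"),
        "image_ok": FLAGS.get(int(m["ok"], 16), "unknown"),
    }


line = "I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3"
decoded = decode_trailer_line(line)
print(decoded)
```

Read this way, both slots report an untouched trailer and no pending swap, which matches "Swap type: none". That would put the failure in the validation of the primary image itself rather than in the swap state.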

One clue: I started investigating how "merged.hex" is created, and was surprised to see that mergehex.py combines 4 different images with the argument "--overlap=replace":

C:\Python38\python.exe C:/depot/ncs/zephyr/scripts/mergehex.py -o C:/depot/application/build_nrf52840dk_nrf52840/zephyr/merged.hex --overlap=replace C:/depot/application/build_nrf52840dk_nrf52840/mcuboot/zephyr/zephyr.hex C:/depot/application/build_nrf52840dk_nrf52840/zephyr/mcuboot_primary.hex C:/depot/application/build_nrf52840dk_nrf52840/zephyr/zephyr.hex C:/depot/application/build_nrf52840dk_nrf52840/zephyr/app_signed.hex

When I looked at those 4 images there is a LOT of overlap. Shouldn't the final image just consist of mcuboot image and app_signed?

For example, "app_signed.hex" and "zephyr.hex" are almost identical: they start the same, but app_signed.hex has a signature at the end, which is normal. See below for the similarities at the start.

It seems strange to me that zephyr.hex needs to be included in the merge when app_signed.hex has everything the app needs. Or is there a piece I am missing?

This is just one clue, and I haven't determined whether it is what stops MCUboot from jumping to the app. When it does work, the same overlap/replace redundancy is still in place, so I'm not sure it is the cause.

Thanks in advance for your assistance; this is blocking our development.

  • I forgot to mention, I do a clean/ full rebuild, so I assume all hashes are recalculated properly, no?

    I will do more debugging as you suggested and report back.

    RE: overlap=replace, I didn't change the build to this option, it was already set to that. Is that standard or not?

    I guess I can look at an sample/example and see how merge is done for a standard project.

  • Farhang said:
    I forgot to mention, I do a clean/ full rebuild, so I assume all hashes are recalculated properly, no?

    I will do more debugging as you suggested and report back.

    Okay, sounds good. I'll be waiting for your report.

    Farhang said:
    RE: overlap=replace, I didn't change the build to this option, it was already set to that. Is that standard or not?

    I guess I can look at an sample/example and see how merge is done for a standard project.

    Yes, it's standard. It ensures that the app_signed.hex file is used instead of the zephyr.hex file where the two overlap. Check my answer here for a more detailed explanation:

    "merged.hex will contain the files <your sample>\<build folder>\mcuboot\zephyr\zephyr.hex as well as <your sample>\<build folder>\zephyr\app_signed.hex. If you're building a nonsecure application (nrf9160dk_nrf9160ns) the merged hex file will include <your sample>\<build folder>\spm\zephyr\zephyr.hex also. When you program the merged.hex file, it will program all the mentioned hex files at once."

    Best regards,

    Simon
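
The effect of "--overlap=replace" described above can be illustrated with a toy model in Python. This is only a sketch: each hex file is modelled as a dictionary from address to byte, the image contents and the tiny sizes are invented, and the helper names `image_at` and `merge_replace` are hypothetical. The one behaviour it relies on is that, with "replace", data from a later input overrides data from an earlier one at the same address.

```python
def image_at(base, data):
    """Model a hex file as a dict mapping absolute address -> byte value."""
    return {base + offset: byte for offset, byte in enumerate(data)}


def merge_replace(*images):
    """Merge images in order; on overlapping addresses the later image wins."""
    merged = {}
    for image in images:
        merged.update(image)
    return merged


# Invented stand-ins for the real files (contents and sizes are not real).
APP_CODE = b"\x01\x02\x03"
mcuboot = image_at(0x0000, b"\xB0\xB1")
zephyr_unsigned = image_at(0xC200, APP_CODE)     # unsigned app only
app_signed = {
    **image_at(0xC000, b"HD"),                   # header placed in the pad area
    **image_at(0xC200, APP_CODE),                # same app bytes as zephyr.hex
    **image_at(0xC203, b"SG"),                   # signature data after the app
}

merged = merge_replace(mcuboot, zephyr_unsigned, app_signed)
same_without_unsigned = merged == merge_replace(mcuboot, app_signed)
print(len(merged), same_without_unsigned)
```

In this model the unsigned image contributes nothing to the result, because every address it covers is overwritten by the signed image that comes after it in the argument list.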

  • OK, so I did some debugging. Originally the hashes were not matching, as you guessed. But adding more debug statements in MCUboot and/or reducing CONFIG_MAIN_STACK_SIZE from 10KB to 4KB led to the hashes matching, and a new problem arose: it goes all the way to do_boot() but doesn't hit the breakpoint in main() in the app.

    It keeps resetting over and over.

    What is the appropriate stack size for MCUboot Main?

    Is there another blog post/question I need to be looking at for MCUboot resetting over and over?

    Am I violating any rules by putting breakpoints in MCUboot? Adding a breakpoint doesn't add contents to protected areas in FLASH does it?

  • The current status of my MCUboot.conf file

    #
    # Copyright (c) 2020 Nordic Semiconductor ASA
    #
    # SPDX-License-Identifier: LicenseRef-Nordic-5-Clause
    #
    CONFIG_SIZE_OPTIMIZATIONS=y
    
    # Disable memory guard to avoid false faults in application after boot
    CONFIG_HW_STACK_PROTECTION=n
    
    CONFIG_SYSTEM_CLOCK_DISABLE=y
    CONFIG_SYSTEM_CLOCK_NO_WAIT=y
    CONFIG_PM=n
    
    CONFIG_MAIN_STACK_SIZE=4096
    CONFIG_MBEDTLS_CFG_FILE="mcuboot-mbedtls-cfg.h"
    CONFIG_DEBUG=y
    
    CONFIG_BOOT_MAX_IMG_SECTORS=256
    CONFIG_BOOT_BOOTSTRAP=n
    
    CONFIG_BOOT_ENCRYPT_RSA=n
    CONFIG_BOOT_SIGNATURE_TYPE_RSA=y
    CONFIG_BOOT_SIGNATURE_TYPE_ECDSA_P256=n
    CONFIG_BOOT_SIGNATURE_KEY_FILE="..\\..\\..\\application\\mcuboot_private.pem"
    #CONFIG_DISABLE_FLASH_PATCH=y //should be used in production for max security. read more about flash patching
    
    # Flash
    CONFIG_FLASH=y
    CONFIG_BOOT_ERASE_PROGRESSIVELY=y
    CONFIG_SOC_FLASH_NRF_EMULATE_ONE_BYTE_WRITE_ACCESS=y
    
    # Logger
    #3 lines below switched to y will enable RTT logging in MCUboot
    CONFIG_LOG_BACKEND_RTT=n
    CONFIG_RTT_CONSOLE=n
    CONFIG_LOG=n
    CONFIG_LOG_MINIMAL=n
    CONFIG_MCUBOOT_LOG_LEVEL_DBG=n
    CONFIG_LOG_BACKEND_UART=n
    CONFIG_UART_CONSOLE=n
    
    #CONFIG_LOG_DEFAULT_LEVEL=4
    #CONFIG_LOG_OVERRIDE_LEVEL=4
    #CONFIG_LOG_MAX_LEVEL=4 
    #CONFIG_LOG_IMMEDIATE=n
    #CONFIG_LOG_PROCESS_THREAD=y
    
    #CONFIG_LOG_DEFAULT_LEVEL=2
    #CONFIG_LOG_MAX_LEVEL=3
    #CONFIG_LOG_PRINTK=y
    #CONFIG_LOG_BACKEND_SHOW_COLOR=n
    #CONFIG_LOG_BACKEND_FORMAT_TIMESTAMP=n
    
    CONFIG_MCUBOOT_SERIAL=y
    CONFIG_BOOT_SERIAL_DETECT_PORT="GPIO_0"
    CONFIG_BOOT_SERIAL_DETECT_PIN=12
    CONFIG_BOOT_SERIAL_DETECT_PIN_VAL=0
    
    

  • Farhang said:
    Is there another blog post/question I need to be looking at for MCUboot resetting over and over?

    Check out the tutorial Device Firmware Update (DFU) with MCUBoot bootloader. I'm not sure if it will help you solve the problem though.

    Farhang said:
    Am I violating any rules by putting breakpoints in MCUboot? Adding a breakpoint doesn't add contents to protected areas in FLASH does it?
    Farhang said:
    It goes all the way to do_boot() but doesn't hit the breakpoint in the main() in app.

     Putting breakpoints in MCUboot should not break any rules. Could you try setting CONFIG_DEBUG_OPTIMIZATIONS=y? Then it should be able to hit the breakpoints.

    Farhang said:
    What is the appropriate stack size for MCUboot Main?

     I think the default main stack size of 10240 should be good. However you can analyze the stack usage using the Thread Analyzer.

    However, since I'm not entirely sure what's causing the misbehaviour, the quickest way to resolve this would be for you to provide concrete, detailed steps to reproduce it and, if possible, upload your application (I can make the case private).

    Best regards,

    Simon

  • Could you clarify, please, whether I need to set CONFIG_BOOT_SIGNATURE_KEY_FILE in both mcuboot.conf and the application's prj.conf?

    I think I'm close to a resolution.

  • Ah, okay. You're not using the default key. I overlooked that in the mcuboot conf file you provided. MCUboot and the application need to use the same private key, so that the public key MCUboot uses to validate the image is derived from the same private key that was used to sign the image.

    I think there have been some issues with this earlier, where you had to set it in both the application and MCUboot, but I believe it is fixed in NCS v1.5.0 and you only need to set it in one place, using any of the options in the answer below:

    https://devzone.nordicsemi.com/f/nordic-q-a/69344/how-to-create-a-mcuboot-image-and-an-application-image-without-modifying-the-sdk/307039#307039 

    I will get back to you tomorrow and give more clarity.

  • I think I have narrowed the hash mismatch down to breakpoints set by SES v5.34a.

    I read back the flash contents with nrfjprog --memrd immediately after programming the nRF52840dk, and again after a debug session in SES with breakpoints added.

    A 2-byte word, 0xBE00, appears in flash at every location where I add a breakpoint. This causes the hash calculation to not match.

    I can provide a zip package with the entire project privately to you, so you can replicate.

    What is this 0xBE00 word and why is it being added to flash?

    I notice that with 1-2 breakpoints the bootloader jumps to the app fine, but when I add more breakpoints (e.g. 4-5), this word appears in flash. I can also tell when it happens, because SES briefly opens a "programming" progress window when I add these extra breakpoints.

    I cleaned up my mcuboot.conf. There is a separate, confusing issue with adding the KEY_FILE option:

    CONFIG_BOOT_SIGNATURE_KEY_FILE="mcuboot_private_ed256.pem"
    
    
    # Flash
    CONFIG_FLASH=y
    CONFIG_BOOT_ERASE_PROGRESSIVELY=y
    CONFIG_SOC_FLASH_NRF_EMULATE_ONE_BYTE_WRITE_ACCESS=y
    CONFIG_MAIN_STACK_SIZE=10240
    
    # Logger
    #3 lines below switched to y will enable RTT logging in MCUboot
    CONFIG_LOG_BACKEND_RTT=y
    CONFIG_RTT_CONSOLE=n
    CONFIG_UART_CONSOLE=n
    CONFIG_LOG_BACKEND_UART=n
    CONFIG_LOG=y
    CONFIG_MCUBOOT_LOG_LEVEL_DBG=y
    
    
    CONFIG_MCUBOOT_SERIAL=y
    CONFIG_BOOT_SERIAL_DETECT_PORT="GPIO_0"
    CONFIG_BOOT_SERIAL_DETECT_PIN=12
    CONFIG_BOOT_SERIAL_DETECT_PIN_VAL=0
    
    CONFIG_EXTRA_EXCEPTION_INFO=y
    CONFIG_LOG_OVERRIDE_LEVEL=2


    I made the mistake of adding mcuboot.conf as an overlay file via CMake while also providing a relative path to the key file. Apparently these two don't mix, so I switched to using set(mcuboot_CONF_FILE....)

    I commented out the former method and added the latter, as you see below. I no longer get the "WARNING - using Default KEY" message when I open the project in SES.

    if (EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/mcuboot.conf")
      set(mcuboot_CONF_FILE "${CMAKE_CURRENT_LIST_DIR}/mcuboot.conf")
    endif()
    
    #if (EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/mcuboot.conf")
    #    list(APPEND mcuboot_OVERLAY_CONFIG
    #      "${CMAKE_CURRENT_SOURCE_DIR}/mcuboot.conf"
    #      )
    #endif()

    I'm up and running again, developing with 1-2 breakpoints at most, aware that adding more will write 0xBE00 into flash and leave me stuck in the bootloader panic.

    But I'm very curious what you will find with the breakpoint issue. Let me know how I can transfer files to you. The entire zip of the project is 1.4GB; we have box.com, so I can upload it there and send you a link in a private message.

    This is the comparison of flash before and after adding breakpoints. You can see 0xBE00 added at the places where I set breakpoints.

  • It seems like 0xBE00 is the Thumb instruction for a breakpoint: https://interrupt.memfault.com/blog/cortex-m-breakpoints. However, I still don't understand how validation of the primary image is affected by setting breakpoints in MCUboot. Was the before-and-after comparison done for MCUboot or for the primary image?

    It would be nice to get hold of the application and do some testing myself.

    I have sent you a private message with information on how to share the application.
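
The before/after comparison discussed here can be approximated with a short Python sketch that diffs two raw flash dumps halfword by halfword and flags every location that turned into 0xBE00 (the Thumb "BKPT #0" encoding). The sample bytes, the base address and the function name `changed_halfwords` are invented for illustration; real input would be two binary dumps of the same flash range, however they were obtained. The SHA-256 comparison at the end only illustrates that a single patched halfword changes a digest over the image; it is not MCUboot's actual validation code.

```python
import hashlib
import struct

BKPT_0 = 0xBE00  # Thumb "BKPT #0" opcode


def changed_halfwords(before, after, base=0):
    """Yield (address, old, new) for each little-endian halfword that differs."""
    if len(before) != len(after):
        raise ValueError("dumps must have the same length")
    count = len(before) // 2
    olds = struct.unpack(f"<{count}H", before[: 2 * count])
    news = struct.unpack(f"<{count}H", after[: 2 * count])
    for index, (old, new) in enumerate(zip(olds, news)):
        if old != new:
            yield base + 2 * index, old, new


# Invented 8-byte "flash" contents, then the same bytes with one
# instruction replaced the way a flash breakpoint would replace it.
before = bytes.fromhex("80b500af012080bd")
patched_image = bytearray(before)
patched_image[2:4] = struct.pack("<H", BKPT_0)
after = bytes(patched_image)

diffs = list(changed_halfwords(before, after, base=0xC200))
breakpoint_addresses = [addr for addr, _old, new in diffs if new == BKPT_0]
hash_changed = hashlib.sha256(before).digest() != hashlib.sha256(after).digest()

print([hex(addr) for addr in breakpoint_addresses], hash_changed)
```

Running the same comparison separately over the MCUboot range and the primary slot range would answer the question above about which image the 0xBE00 words actually landed in.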

  • I was not able to reproduce this with Ozone. I did the following:

    • Copied /application (the one you sent me over PM) into ncs/nrf/samples
    • Added the following Kconfigs to nrf/samples/application/mcuboot.conf (checked nrf/samples/application/build/mcuboot/zephyr/.config to make sure they were set)

    CONFIG_DEBUG_OPTIMIZATIONS=y
    CONFIG_PM_PARTITION_SIZE_MCUBOOT=0x10000

    • Built and flashed application + mcuboot: west build -b nrf52840dk_nrf52840 && west flash
    • Opened Ozone and chose the file ncs/nrf/samples/application/build/mcuboot/zephyr/zephyr.elf

    • Set several breakpoints in main.c

    • Set some breakpoints in loader.c, to see whether the primary image was successfully validated

    • Then I stepped through all the breakpoints, and hit the first breakpoint in loader.c but not the other two after, so it was successfully validated. I also saw that the primary image was booted successfully, since I was able to see the log:

    *** Booting Zephyr OS build v2.4.99-ncs2  ***
    [00:00:00.000,366] <inf> adc: nrfx_saadc_init
    [00:00:00.000,396] <inf> adc: nrfx_saadc_channels_config = 0
    [00:00:00.000,427] <inf> adc: nrfx_saadc_advanced_mode_set=0
    [00:00:00.000,427] <inf> adc: nrfx_saadc_buffer_set=0
    [00:00:00.000,427] <inf> adc: nrfx_saadc_buffer_set 2=0
    [00:00:00.000,457] <inf> adc: BUF_REQ= 0
    [00:00:00.000,457] <inf> adc: ppi ass 0
    [00:00:00.000,457] <inf> adc: ppi en 0
    [00:00:00.000,488] <inf> adc: nrfx_timer_init 0
    [00:00:00.000,488] <inf> adc: CC Ticks = 31250
    
    [00:00:00.008,422] <inf> fs_nvs: 2 Sectors of 4096 bytes
    [00:00:00.008,422] <inf> fs_nvs: alloc wra: 0, ff0
    [00:00:00.008,422] <inf> fs_nvs: data wra: 0, 0
    Starting Radar
    i=1
       [00:00:01.000,579] <inf> adc: ADC Task 0
    
    [00:00:01.108,764] <inf> main: Read speed unit = 0,MPH, data rate=1
    i=2
       [00:00:02.000,671] <inf> adc: ADC Task 1
    
    [00:00:02.109,039] <inf> main: Read speed unit = 0,MPH, data rate=1
    i=3
       [00:00:03.000,762] <inf> adc: ADC Task 2
    
    [00:00:03.109,313] <inf> main: Read speed unit = 0,MPH, data rate=1
    

    Can you confirm that I have understood your issue correctly?

    I only noticed now that you were using SES, so the issue might be SES-related. Are you able to reproduce your issue with Ozone? Is testing with Ozone sufficient, or do you want me to test it with SES as well?

    Best regards,

    Simon
