Guideline for nRF5340 network-core FOTA over MCUmgr (with app-core rollback)

Environment

  • nRF Connect SDK: v3.2.1
  • Zephyr: 4.2.99 (NCS fork)
  • CMake (bundled): 3.21.0
  • Board: nrf5340dk_nrf5340_cpuapp
  • OS: Linux

What already works: application-core FOTA (image 0)

This has worked reliably for a long time:

mcumgr ... image upload -e -n 0 app.signed.bin
mcumgr ... image list          # note new hash in slot 1
mcumgr ... image test <hash>
mcumgr ... reset
mcumgr ... image confirm <hash>

Swap + revert behave exactly as expected.

Goal

Add network-core (hci_ipc) FOTA through MCUmgr in the same way (image 1), without losing the application core's swap/revert.

Config we added

# sysbuild
SB_CONFIG_SECURE_BOOT_NETCORE=y
SB_CONFIG_NETCORE_APP_UPDATE=y
# application prj.conf
CONFIG_MCUMGR_GRP_IMG_UPDATABLE_IMAGE_NUMBER=2

Network-core FOTA steps we follow

mcumgr ... image upload -e -n 1 signed_by_mcuboot_and_b0_hci_ipc.bin
mcumgr ... image list          # image=1 slot=1 appears, secondary magic=good
mcumgr ... image confirm <hash>
mcumgr ... reset               # -> device bricked, only recoverable with: nrfjprog --qspieraseall

What we have tried

In every combination below, staging works fine (image upload -n 1 → image=1 slot=1, magic=good), but the result is always the same: after image test/confirm + reset, the device bricks (reboot loop, only recoverable with nrfjprog --qspieraseall).

  • A) SECURE_BOOT_NETCORE + NETCORE_APP_UPDATE only (app core keeps BOOT_SWAP_USING_MOVE). → Brick on reset. RTT shows MCUboot validating image 1 OK, then abort() right after boot_verify_slot_dependencies. The net image seems to go through the app-core swap path. We noticed the network-core swap-skip in boot_slots_compatible() (swap_move.c) is guarded by #ifdef PM_S1_ADDRESS, which is undefined in this config.

  • B) Added SECURE_BOOT_APPCORE too (to define PM_S1_ADDRESS). → Build assert PM_S0_SIZE == PM_S1_SIZE; fixed by removing our explicit CONFIG_PM_PARTITION_SIZE_MCUBOOT so PM sizes S0/S1 equally. The device still bricks on reset — now apparently later, possibly during the PCD copy to the network core.

  • We have not switched MCUboot to OVERWRITE_ONLY (used by the NCS ref_smp_svr_ext_flash sample for nRF53), because it removes the application core's revert capability, which we need to keep.

So we seem to be missing a config combination: every variant we tried either bricks immediately or bricks "later", always around the network-core update on reset.

Question

Is there a documented minimal configuration for nRF5340 network-core FOTA over MCUmgr (image 1) that coexists with application-core image swap + revert — i.e. without forcing MCUboot into OVERWRITE_ONLY? Does enabling network-core update with BOOT_SWAP_USING_MOVE on the app core require application-core secure boot (B0), or is there a supported combination we are missing?

A reference set of Kconfig options for this exact case would be ideal.

Parents
  • Hi Selrac, 
    Most likely it's not supported (simultaneously non-rollback and non-simultaneously with rollback). It's a little bit tricky here with the nRF5340 DFU. 

    I would need to check with other colleagues and get back to you. 

  • Ok thanks,
    I have somehow succeeded by reimplementing 'boot_perform_update_hook' in 'nrf53_hooks.c':

    1. Custom MCUboot hooks module (sysbuild/mcuboot_nrf53_hooks/)

    • Reimplements boot_perform_update_hook:

      • Image 0 (APP core): BOOT_HOOK_REGULAR → native swap/revert engine left untouched.

      • Image 1 (NET core): performs the overwrite-style transfer itself (secondary QSPI → RAM flash-sim → network_core_update() via PCD to the NET core) and returns 0, so boot_swap_image() never runs for image 1 (the root cause of the brick).

    • Injected only into the MCUboot image via mcuboot_EXTRA_ZEPHYR_MODULES; CONFIG_BOOT_IMAGE_ACCESS_HOOK_NRF5340=n drops the stock file.

    2. Per-core rollback behaviour

    • APP core (image 0): full rollback. image test → reset → image confirm flow; auto-revert if not confirmed.

    • NET core (image 1): no rollback, always permanent (hardware limitation: the APP core cannot read NET-core flash). Operational rules: always image confirm, never image test, and one core per reset cycle.
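
    As a rough illustration of the operational rules above, here is a small host-side check that a tool could run before issuing the reset. It is only a sketch: every model_* name is made up for this example and none of it is part of MCUmgr or MCUboot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for this sketch only. */
enum model_image { MODEL_IMAGE_APP = 0, MODEL_IMAGE_NET = 1 };
enum model_cmd { MODEL_CMD_TEST, MODEL_CMD_CONFIRM };

struct model_request {
	enum model_image image;
	enum model_cmd cmd;
};

/* Returns true when the requests queued before a single reset follow the
 * rules above: image 1 is only ever confirmed, and only one core is
 * updated per reset cycle.
 */
static bool model_cycle_is_allowed(const struct model_request *reqs,
				   size_t count)
{
	bool saw_app = false;
	bool saw_net = false;

	assert(reqs != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (reqs[i].image == MODEL_IMAGE_NET) {
			if (reqs[i].cmd == MODEL_CMD_TEST) {
				return false; /* net core has no revert path */
			}
			saw_net = true;
		} else {
			saw_app = true;
		}
	}

	return !(saw_app && saw_net); /* one core per reset cycle */
}
```

    A cycle with only "image test" on image 0, or only "image confirm" on image 1, passes this check; "image test" on image 1, or both cores in the same cycle, is rejected.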

    The fix: boot_perform_update_hook

    MCUboot provides an official extension point that the stock NCS hooks leave
    unused. In context_boot_go():

    rc = BOOT_HOOK_CALL(boot_perform_update_hook, BOOT_HOOK_REGULAR, ...);
    if (rc == BOOT_HOOK_REGULAR) {
        rc = boot_perform_update(state, &bs);   /* the path that crashes */
    }
    

    Its API contract (bootutil/boot_hooks.h): "retval 0: update was done, skip
    performing the update". Because BOOT_HOOK_REGULAR == 1, returning 0 makes
    the if (rc == BOOT_HOOK_REGULAR) gate above false, so
    boot_perform_update(), and therefore the crashing boot_swap_image(),
    never runs for the net core. nrf53_hooks_custom.c is a copy of the
    stock hooks plus a real implementation of this hook. Note that it returns 0
    on every net-core path, success or failure, and never a negative error,
    which would itself trip the assert(rc == 0):

    • Image 0 (application core) → returns BOOT_HOOK_REGULAR: the native
      swap-using-move flow runs untouched. image test + automatic revert on
      failed boot, image confirm to make permanent. Rollback fully
      preserved.
    • Image 1 (network core) → the swap engine is bypassed entirely:
      1. Copy [0, ih_hdr_size + ih_img_size) from the secondary slot (QSPI)
        into the RAM flash simulator (signature was already validated by
        MCUboot's normal flow before the hook runs).
      2. network_core_update(true) → PCD transfer; blocks until the network
        core (b0n) reports completion. This blocks inside the boot path, so a boot
        watchdog (if enabled) must cover the worst-case PCD transfer time — the
        same constraint as the stock overwrite-only post-copy hook.
      3. Invalidate the secondary slot — trailer sector first (clears the
        pending-swap magic), header sector second — so the update is not
        re-applied on the next boot.
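
    To make the gate easier to reason about, the per-image dispatch can be modelled on a host PC. This is a minimal sketch, not MCUboot code: the model_* functions and the IMAGE_* indices are stand-ins invented for the example, and only the value BOOT_HOOK_REGULAR == 1 is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as stated above for bootutil/boot_hooks.h. */
#define BOOT_HOOK_REGULAR 1

/* Hypothetical image indices, matching the numbering used in this post. */
enum { IMAGE_APP_CORE = 0, IMAGE_NET_CORE = 1 };

/* Set by the stand-in for boot_perform_update() so the caller can see
 * whether the swap engine was entered.
 */
static bool swap_engine_ran;

/* Stand-in for boot_perform_update(): only records that it was called. */
static int model_perform_update(void)
{
	swap_engine_ran = true;
	return 0;
}

/* Stand-in for the custom hook: image 1 is handled here (return 0),
 * every other image is handed back to the bootloader.
 */
static int model_update_hook(int img_index)
{
	if (img_index != IMAGE_NET_CORE) {
		return BOOT_HOOK_REGULAR;
	}
	/* The real hook does the QSPI -> RAM -> PCD transfer here. */
	return 0;
}

/* Model of the gate: the swap engine runs only when the hook answers
 * BOOT_HOOK_REGULAR. Returns the final rc.
 */
static int model_boot_step(int img_index)
{
	int rc;

	swap_engine_ran = false;
	rc = model_update_hook(img_index);
	if (rc == BOOT_HOOK_REGULAR) {
		rc = model_perform_update();
	}
	assert(rc == 0); /* the invariant mentioned above */
	return rc;
}
```

    Calling model_boot_step(IMAGE_APP_CORE) leaves swap_engine_ran set, while model_boot_step(IMAGE_NET_CORE) leaves it clear, which is the behaviour the custom hook relies on.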

    /** @brief Invalidate the image-1 secondary slot so the staged update is not
     *         re-applied on subsequent boots.
     *
     * @param fap Flash area of the image 1 secondary slot.
     */
    static void netcore_secondary_slot_invalidate(const struct flash_area *fap)
    {
    	int rc;
    	size_t fa_size = flash_area_get_size(fap);
    
    	/* Trailer first (clears the pending swap magic written by MCUmgr),
    	 * header second. If power is lost in between, the slot keeps a valid
    	 * header but no trailer magic: MCUboot then sees no pending update and
    	 * boots normally — the stale image is simply ignored.
    	 */
    	rc = flash_area_erase(fap, fa_size - SECONDARY_SLOT_ERASE_UNIT,
    			      SECONDARY_SLOT_ERASE_UNIT);
    	if (rc != 0) {
    		BOOT_LOG_ERR("Net core: trailer erase failed: %d", rc);
    	}
    
    	rc = flash_area_erase(fap, 0, SECONDARY_SLOT_ERASE_UNIT);
    	if (rc != 0) {
    		BOOT_LOG_ERR("Net core: header erase failed: %d", rc);
    	}
    }
    
    /**
     * @brief Apply a pending update: native swap engine for the application core,
     *        overwrite-style PCD transfer for the network core.
     *
     * Provenance — none of this is invented from scratch; it is a synthesis of
     * existing SDK code, reassembled so the redirection happens per-image:
     *   - The hook skeleton (return 0 for the net-core image to skip the regular
     *     update, BOOT_HOOK_REGULAR otherwise) is the official MCUboot extension
     *     pattern from bootloader/mcuboot/boot/zephyr/hooks_sample.c. That sample
     *     returns 0 *without doing any work* — it only demonstrates the mechanism.
     *   - network_core_update() and the fake header/check/swap-state hooks are
     *     verbatim from the stock nrf/modules/mcuboot/hooks/nrf53_hooks.c.
     *   - The net-core body below (read [0, hdr+img) from the secondary slot into
     *     the RAM primary, run PCD, then invalidate the secondary) is original,
     *     but modelled on what the stock overwrite-only path already does across
     *     boot_copy_image() (secondary -> primary copy) and
     *     boot_copy_region_post_hook() (the PCD trigger): it just reassembles that
     *     flow inside the hook so it runs only for image 1, leaving image 0 on the
     *     swap engine.
     *   - netcore_secondary_slot_invalidate() is the only fully custom piece
     *     (overwrite-only re-pends differently, via its progressive-erase logic).
     *
     * @param img_index Image pair index (0 = app core, 1 = network core).
     * @param img_head  Header of the secondary slot (real, read from QSPI).
     * @param area      Flash area of the secondary slot.
     *
     * @retval 0                 Update handled here; boot_perform_update() skipped.
     * @retval BOOT_HOOK_REGULAR App core: run the native swap + revert flow.
     */
    int boot_perform_update_hook(int img_index, struct image_header *img_head,
    		const struct flash_area *area)
    {
    	int rc;
    	uint8_t *mock_flash;
    	size_t mock_size;
    	uint32_t img_total;
    	static const struct device *mock_flash_dev;
    
    	if (img_index != NET_CORE_SECONDARY_IMAGE) {
    		/* Image 0 (application core): native swap + revert flow. */
    		return BOOT_HOOK_REGULAR;
    	}
    
    	mock_flash_dev = DEVICE_DT_GET(DT_NODELABEL(PM_MCUBOOT_PRIMARY_1_DEV));
    	if (!device_is_ready(mock_flash_dev)) {
    		BOOT_LOG_ERR("Net core update: RAM flash device not ready");
    		/* Returning 0 (not an error) lets the device boot; the staged
    		 * image stays pending and the update is retried on next boot.
    		 */
    		return 0;
    	}
    
    	mock_flash = flash_simulator_get_memory(NULL, &mock_size);
    
    	/* PCD only consumes [0, ih_hdr_size + ih_img_size): header (for the
    	 * vtable lookup) plus image body. The TLVs are not needed in RAM —
    	 * signature validation already ran on the secondary slot.
    	 */
    	img_total = img_head->ih_hdr_size + img_head->ih_img_size;
    	if ((img_total > mock_size) || (img_total > flash_area_get_size(area))) {
    		BOOT_LOG_ERR("Net core update: image too large (%u bytes)", img_total);
    		/* Signature-valid but oversized image can never be applied:
    		 * drop it to avoid retrying forever.
    		 */
    		netcore_secondary_slot_invalidate(area);
    		return 0;
    	}
    
    	rc = flash_area_read(area, 0, mock_flash, img_total);
    	if (rc != 0) {
    		BOOT_LOG_ERR("Net core update: secondary slot read failed: %d", rc);
    		/* Likely transient (QSPI). Keep the staged image; retry on
    		 * next boot.
    		 */
    		return 0;
    	}
    
    	BOOT_LOG_INF("Net core update: starting PCD transfer (%u bytes)", img_total);
    	rc = network_core_update(true);
    	if (rc != 0) {
    		BOOT_LOG_ERR("Net core update: PCD transfer failed: %d", rc);
    		/* Keep the staged image pending so the update is retried on
    		 * the next boot (a half-written network core is recovered by
    		 * re-running the same PCD copy).
    		 */
    		return 0;
    	}
    
    	BOOT_LOG_INF("Net core update: done");
    	netcore_secondary_slot_invalidate(area);
    
    	return 0;
    }
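
    The trailer-first erase order in netcore_secondary_slot_invalidate() can also be checked with a toy model of the slot. This is only a sketch under simplifying assumptions: the sector geometry and both marker values are placeholders (not the real QSPI layout or MCUboot magic), and "pending" is reduced to "the trailer marker is present".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy geometry, chosen for the model only. */
#define SECTOR_SIZE  64u
#define SECTOR_COUNT 4u
#define SLOT_SIZE    (SECTOR_SIZE * SECTOR_COUNT)
#define MAGIC_LEN    4u

/* Placeholder markers, not the real MCUboot magic values. */
static const uint8_t HDR_MAGIC[MAGIC_LEN] = { 'H', 'D', 'R', '!' };
static const uint8_t TRL_MAGIC[MAGIC_LEN] = { 'P', 'E', 'N', 'D' };

struct model_slot {
	uint8_t bytes[SLOT_SIZE];
};

/* Erased flash reads back as 0xFF. */
static void model_erase_sector(struct model_slot *s, unsigned int sector)
{
	assert(sector < SECTOR_COUNT);
	memset(&s->bytes[sector * SECTOR_SIZE], 0xFF, SECTOR_SIZE);
}

/* Stage an image: header marker at offset 0, trailer marker at the end. */
static void model_stage_image(struct model_slot *s)
{
	memset(s->bytes, 0xFF, SLOT_SIZE);
	memcpy(&s->bytes[0], HDR_MAGIC, MAGIC_LEN);
	memcpy(&s->bytes[SLOT_SIZE - MAGIC_LEN], TRL_MAGIC, MAGIC_LEN);
}

static bool model_has_header(const struct model_slot *s)
{
	return memcmp(&s->bytes[0], HDR_MAGIC, MAGIC_LEN) == 0;
}

static bool model_is_pending(const struct model_slot *s)
{
	return memcmp(&s->bytes[SLOT_SIZE - MAGIC_LEN], TRL_MAGIC,
		      MAGIC_LEN) == 0;
}

/* Trailer-first invalidation, interrupted after steps_done erases
 * (0 = power lost before any erase, 2 = both erases completed).
 */
static void model_invalidate(struct model_slot *s, unsigned int steps_done)
{
	if (steps_done >= 1u) {
		model_erase_sector(s, SECTOR_COUNT - 1u); /* trailer */
	}
	if (steps_done >= 2u) {
		model_erase_sector(s, 0u); /* header */
	}
}
```

    With one erase done, the model slot still has its header marker but is no longer pending, which is the intermediate state the comment in netcore_secondary_slot_invalidate() describes. With zero erases done, the slot stays pending and the update would be attempted again.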

     
