OTA DFUs no longer working after recreating build configuration

I've previously performed OTA DFUs of our application (basically a heavily customized peripheral_uart) successfully, running on hardware substantially similar to the nRF5340 DK. I used the nRF Connect mobile app on both Android and iOS, with the "Test and Confirm" option, to load different firmware images. Those old builds all still work.

We're building against SDK 2.5.3 with toolchain 2.5.3. Thanks to improvements in the 2.5.x SDKs, all I remember having to do to get OTA DFU working was add CONFIG_BOOTLOADER_MCUBOOT, CONFIG_NCS_SAMPLE_MCUMGR_BT_OTA_DFU, and CONFIG_REBOOT to the configuration and call smp_bt_register() in main().
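For reference, the three options named above would look roughly like this in the application's prj.conf. This is only a sketch of those lines, not a complete DFU configuration; the smp_bt_register() call in main() is a separate code change and is not shown.

```cfg
# prj.conf (application core): the OTA DFU options mentioned above
# Build MCUboot as the bootloader in front of the application
CONFIG_BOOTLOADER_MCUBOOT=y
# NCS sample option that pulls in MCUmgr SMP-over-Bluetooth DFU support
CONFIG_NCS_SAMPLE_MCUMGR_BT_OTA_DFU=y
# Allow the firmware to reboot (e.g. into a newly uploaded image)
CONFIG_REBOOT=y
```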

Unfortunately, after deleting and recreating the build configuration (target nrf5340dk_nrf5340_cpuapp_ns), OTA DFUs no longer load reliably. Most of the time nothing seems to be loaded at all: functionally, I can see the application code hasn't been replaced.

The only difference I've observed is that merged_domains.hex looks different when loaded in the nRF Connect for Desktop Programmer tool. I know this isn't the OTA file, but I assume that if the .hex is laid out differently in memory, the .bin in the .zip is too, and that this is probably what's causing my problems. Notably, the "top" of the address space contains only one orange "MBR or Application" region in hex files from OTA builds that don't work, instead of one orange "MBR or Application" region and two green "Application" regions as in hex files from OTA builds that do work. In the image below, 0.5 is from before I added DFU support, 0.6–0.9 have DFU support and work, and 0.10 and 0.11, both built with the new build configuration, do not work.



Does anyone know what I might be doing wrong in the build configuration that broke OTA DFUs?

  • Hi Jake, and welcome to DevZone!

    Here you can get answers from our community; we have a lot of skilled people who like to help. In addition, Nordic has a team of support engineers, like me, who are assigned tickets and will assist you as best we can.

    Alright, I'd like to start by asking some questions to understand the issue better:

    When you changed the application this time, did you reflash the nRF5340, or did you only apply a DFU on top of the old app?

    Could you check what is inside the .zip file now?

    I think the top of the memory map is the network core, and the small bar there should be the network core bootloader (known as b0n). It looks like b0n may be missing from your new build. That is a bit strange, though, since it should be added automatically. Can you check your build/hci_rpmsg/ folder and see whether it contains a folder named b0n?

    Regards,
    Sigurd Hellesvik

  • Sigurd,

    I'm trying to make DFUs that can be applied over apps that have originally been cable programmed.

    The zip files for old and new builds both contain only app_update.bin and manifest.json. I don't see any noteworthy differences.

    I have found something interesting, however. After cable programming 0.10 or 0.11, we can DFU successfully to 0.11 or 0.10, but not to 0.6–0.9. The same holds for the earlier version range: if cable programmed with 0.6–0.9, DFU to another 0.6–0.9 version works, but not to 0.10 or 0.11.

    I verified the presence of build\hci_rpmsg\, including a zephyr subdirectory and an app.hex file.

    The top section shows "Core Name Network" in a tooltip when hovered over. Green bands show "Application" and orange ones "MBR or Application", so rather than the bootloader being missing, it looks like the application code may be inadvertently built into the bootloader. The network core certainly boots with every version.

    Thanks,
    Jake


  • I have new insight into this issue that resolves one of my original questions.

    Recall that I had OTA images built in two incompatible ways (versions 0.6–0.9 vs. 0.10–0.11). I'm now able to replicate the 0.6–0.9 style build. What apparently happened is that I initially built with CONFIG_NRF53_UPGRADE_NETWORK_CORE=y in prj.conf, then removed that option, and through version 0.9 performed only incremental builds rather than pristine ones. The developer who built 0.10 had never enabled the network core option, and I recreated my build configuration before building 0.11.
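    Since the mismatch came from an option that was added and later removed without a pristine rebuild, one way to guard against it is to state the network core choice explicitly in prj.conf and rebuild from scratch whenever it changes. This is a minimal sketch, assuming the option lives in the application's prj.conf as described above. The value n matches the 0.10/0.11 builds (no network core update); whether to set it to y instead depends on whether releases should carry net_core_app_update.bin. The west command in the comment is one example of a pristine build for the board named earlier in this thread.

```cfg
# prj.conf (application core)
# Sketch: make the network core DFU choice explicit so every build of a
# release line uses the same setting. Use y only if releases are meant to
# include a network core update image.
CONFIG_NRF53_UPGRADE_NETWORK_CORE=n

# After changing this option, do a pristine rebuild, for example:
#   west build -p always -b nrf5340dk_nrf5340_cpuapp_ns
```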

    As I understand it, my broken build configuration created a DFU that expected a network core image, yet dfu_application.zip didn't actually contain net_core_app_update.bin (only app_update.bin and manifest.json). Those images worked with each other, but weren't compatible with later builds that neither expected nor contained a network core update. This also explains why the hex files look the way they do in the Programmer, and why the network core wasn't being built with the new build configuration.

    Your most recent response helps a great deal. I think we want to do a simultaneous DFU if possible, but I'm still struggling to get that working without external flash. In the thread between Chen and Andreas, Chen eventually gave up and settled on non-simultaneous DFUs.

    How do I modify the flash partition config to resolve the "Missing partitions?" static assert? Is there some example I can start from?

  • Sigurd,

    Do you have any advice for getting simultaneous DFU working without external flash? I still haven't figured out the partition config.

    Thanks,
    Jake

    I found these two PRs from some time ago that implement a workaround:

    https://github.com/nrfconnect/sdk-nrf/pull/10060
    https://github.com/nrfconnect/sdk-mcuboot/pull/235

    As you can see, the system has some limitations when it comes to simultaneous multi-image DFU from internal flash.

    Maybe you can try the workaround on your device and see if that works?

    Thanks for the response. Those changesets look a little intimidating. Maybe we don't need all of the changes, but a lot of them look related to the functionality we want, and I don't think we want to maintain a branch of the SDK.

    Are there any plans to support simultaneous DFU without external flash on nRF5340 in a future SDK release?

    For now it looks like our best option is non-simultaneous.

  • JakeM said:

    Are there any plans to support simultaneous DFU without external flash on nRF5340 in a future SDK release?

    The developers pointed me to https://github.com/nrfconnect/sdk-nrf/pull/19551, so this could work.
    However, we do not have any tests for it, which means we cannot claim support for the feature.

    So I will just quote our default answer when it comes to timeline questions:

    I am not able to answer questions about our timeline. You can try to ask your local sales representative from Nordic Semiconductor for information about our timeline.

    JakeM said:

    For now it looks like our best option is non-simultaneous.

    Yes, it seems like it. As always, remember to test FOTA thoroughly on test devices before rolling it out to the field. This is always important, of course, but it becomes even more important when you do non-simultaneous instead of simultaneous DFU.

    We talked to our local sales representative. Since we didn't make much progress with non-simultaneous FOTA either, we plan to get help making the network core changes we intend to make before going into production, and to try to avoid having to update the network core in the future.

    Since we found an explanation for the original discrepancy between builds, and have a plan for addressing future compatibility concerns, I think this case can be closed.
