OTA DFUs no longer working after recreating build configuration

I've previously successfully performed OTA DFUs of our application (basically a heavily customized peripheral_uart), running on hardware substantially similar to the nRF5340 DK. I used the nRF Connect mobile app on both Android and iOS, with the "Test and Confirm" option, to load different firmware images. Those old builds all still work.

We're building against SDK 2.5.3 with toolchain 2.5.3. Thanks to improvements in the 2.5.x SDKs, all I remember having to do to get OTA DFU working was to add CONFIG_BOOTLOADER_MCUBOOT, CONFIG_NCS_SAMPLE_MCUMGR_BT_OTA_DFU, and CONFIG_REBOOT to the config and call smp_bt_register() in main().

Unfortunately, after deleting and recreating the build configuration (target nrf5340dk_nrf5340_cpuapp_ns), I can no longer get OTA DFUs to load reliably. Most of the time nothing seems to load: functionally, I can see that the application code hasn't been replaced.

The only difference I've observed is that merged_domains.hex looks different when loaded in the nRF Connect for Desktop Programmer. I know this isn't the OTA file, but I assume that if the .hex is laid out differently in memory, the .bin in the .zip is too, and that this is probably what's causing my problems. Notably, for builds whose OTA does not work, the top of the address space contains only one orange "MBR or Application" region, whereas for builds whose OTA works it contains one orange "MBR or Application" region and two green "Application" regions. In the image below, 0.5 is from before I added DFU support, 0.6–0.9 have DFU support and work, and 0.10 and 0.11, both built with the new build configuration, do not work.



Does anyone know what I might be doing wrong in the build configuration that broke OTA DFUs?

  • Hi Jake, and welcome to DevZone!

    Here you can get answers from our community; we have a lot of skilled people who like to help. In addition, Nordic has a team of support engineers, like me, who are assigned tickets and will assist you as best we can.

    Alright, I'd like to start by asking some questions to understand the issue better:

    When you changed the application this time, did you reflash the nRF5340, or did you only perform a DFU on top of the old app?

    Could you check what is inside the .zip file now?

    I think the top memory region is the network core, and the small bar there should be the network core bootloader (known as b0n). It looks like b0n may be missing from your new build. That would be a bit strange, as it should be added automatically. Can you check your build/hci_rpmsg/ folder and see whether it contains a folder named b0n?

    Regards,
    Sigurd Hellesvik

  • Sigurd,

    I'm trying to make DFUs that can be applied over apps that have originally been cable programmed.

    The zip files for old and new builds both contain only app_update.bin and manifest.json. I don't see any noteworthy differences.

    I have found something interesting, however. After cable programming 0.10 or 0.11, we can DFU successfully to the other one (0.11 or 0.10), but not to 0.6–0.9. The same is true for the earlier version range: if the device is cable programmed with 0.6–0.9, DFU to another 0.6–0.9 version works, but not to 0.10 or 0.11.

    I verified the presence of build\hci_rpmsg\, including a zephyr subdirectory and app.hex file.

    The top section shows "Core Name Network" in a tooltip when hovered over. Green bands show "Application", and orange "MBR or Application", so it looks like rather than the bootloader being missing, perhaps the application code is being inadvertently built into the bootloader. Certainly the network core is booting with every version.

    Thanks,
    Jake


  • 0.6 was the version where I initially added OTA DFU support, by adding CONFIG_BOOTLOADER_MCUBOOT and CONFIG_NCS_SAMPLE_MCUMGR_BT_OTA_DFU and a call to smp_bt_register() in main(). We had previously been cable programming exclusively. The hope was for all subsequent versions to be OTA-able over a cable-programmed 0.6 or later.

    At some point after making 0.9, I deleted and recreated my build configuration, and I haven't been able to build OTAs compatible with 0.6–0.9 since; however, the builds I've made with the new build configuration are all compatible with each other.

    build/hci_rpmsg does not contain a b0n directory when I make a 0.10–0.11 style build. Unfortunately, I don't have the entire build directory saved for 0.6–0.9, nor do I have ability to recreate them, so I don't know whether they had b0n.

    We're trying to understand what broke compatibility to ensure it doesn't happen again.

  • When you have CONFIG_BOOTLOADER_MCUBOOT enabled on the nRF5340, the network core bootloader should generally be enabled automatically.

    Can you check our course over at https://academy.nordicsemi.com/courses/nrf-connect-sdk-intermediate/lessons/lesson-8-bootloaders-and-dfu-fota/?
    Did you follow the steps there to update the nRF5340?

  • I initially implemented FOTA over BLE (as the guide calls it) several months ago, basically following Exercise 3 through step 5.

    Step 6 gets into multi-image DFU, which I hadn't previously attempted. I assume we want that, in case we need to modify the network core in the future. However, the example doesn't work for us, because our hardware doesn't have external flash.

    Is there an example using SDK >=2.5 that doesn't require external flash? Will getting multi-image to work help resolve the original confusion over incompatible OTAs?

    My ultimate goal is ensuring compatibility with whatever changes we might need to make in the future via OTA. To that end, I think I need to lock down a pm_static.yml, presumably multi-image, but accounting for the lack of external flash.

  • JakeM said:
    To that end, I think I need to lock down a pm_static.yml

    Yes this is very important!

    If partitioning changed between 0.9 and 0.11, that would explain the error you get. However, judging from the flash readouts, that does not seem to be the case, so I did not comment on it before.

    JakeM said:
    Is there an example using SDK >=2.5 that doesn't require external flash? Will getting multi-image to work help resolve the original confusion over incompatible OTAs?

    See the answer in "NRF5340 MCUBOOT DFU failed: Slot image has no hash TLV", and in particular the restrictions on space that come with that solution.

  • Am I understanding the correspondence between Chen and Andreas correctly, that we need to evaluate space usage to determine if multi-image or separate OTAs is the best strategy? I'm also not sure any of this is helping with the original issue of OTA compatibility breaking for unknown reasons. Is there somewhere private I can post our complete code to get specific recommendations?

    We're mostly looking for guidance on ensuring compatibility for future updates. I appreciate that we need a pm_static.yml, but exactly how that looks is going to depend on our OTA strategy.

  • JakeM said:
    Am I understanding the correspondence between Chen and Andreas correctly, that we need to evaluate space usage to determine if multi-image or separate OTAs is the best strategy?

    Yes.

    The reason all our samples use external flash for simultaneous multi-image DFU is that storing all the slots requires a lot of memory.

    Without external flash, you will need to split your internal flash into four parts:
    • mcuboot
    • application
    • application backup (for DFU)
    • netcore backup (for DFU)

    This naturally leaves less space for each slot, and therefore for the application.

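    If you do non-simultaneous multi-core DFU, you need only three slots:
    • mcuboot
    • application
    • application backup (for DFU, used for both application and network core updates)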

    However, with this you cannot update the interface between the application core and network core, so it limits the updatability of the nRF5340 network core.

    JakeM said:
    I'm also not sure any of this is helping with the original issue of OTA compatibility breaking for unknown reasons. Is there somewhere private I can post our complete code to get specific recommendations?

    Create ticket -> Create private ticket is where you go for that.

    JakeM said:
    We're mostly looking for guidance on ensuring compatibility for future updates. I appreciate that we need a pm_static.yml, but exactly how that looks is going to depend on our OTA strategy.

    Compatibility for future updates:

    pm_static.yml is the most important part here. To create it, you can copy build/partitions.yml into your application directory as pm_static.yml once you are happy with your project configuration. This freezes the partitioning from that point onwards.

    Beyond that, the only other thing that is important for future updates is updating the network core, which is why we have been discussing the network core.
    There are several ways to handle network core updates, and they all come with pros and cons. Let me try to summarize them:

    • No Network core DFU
      • Pro: No extra complexity
      • Pro: No extra space requirements
      • Con: Can never update the network core. This can restrict later application updates (for example, you cannot adopt a new BLE feature).
    • Non-simultaneous Network core DFU
      • Pro: No extra space requirements
      • Con: Cannot update the interface between the network core and application core, so you must be careful when doing DFU of the network core. Allows some, but not all, updates of the network core.
    • Simultaneous Network core DFU
      • Pro: Lets you update both cores no matter what. Full future-proofing
      • Con: Uses extra space. Will put limitations on current application size. Will also limit how much you can increase application size in future DFUs

    Oh, and one more thing to keep in mind: the application size cannot exceed approximately 95% of the slot size. See known issue NCSDK-20567: Partitioning limitation with MCUboot swap move.
