NRF5340 simultaneous multi-image FOTA DFU: works with initial firmware update, but can't upload any revised new version

I'm trying to use simultaneous FOTA update of both cores on the nRF5340 DK, and I see strange behavior with any update that contains a modification: the upload from the mobile app just doesn't start. My setup:

1) NRF5340DK

2) mobile phone with Android NRF connect app for updating via Bluetooth

3) sample project https://github.com/hellesvik-nordic/samples_for_nrf_connect_sdk/tree/main/bootloader_samples/nrf5340/mcuboot_smp_ble_simultaneous  compiled with NRF Connect SDK 2.1.2

After the initial flashing with "west flash --force --erase" the firmware works as expected. I'm able to perform DFU via Bluetooth with dfu_application.zip from the build/zephyr folder, and I can do it multiple times without problems.

But if I change something in the firmware (for example a log message), compile a new build and try to perform DFU with the new dfu_application.zip, the upload doesn't start. The nRF Connect Android app disconnects immediately after "Starting DFU".

What am I doing wrong? How can I perform DFU with a modified firmware, and why does it work only with the initial firmware?

  • I believe I found why this issue happens.

    After many trials I took another nRF5340 DK (fortunately I have two) and flashed it with the same firmware, and updating works as it should. At first I thought my first DK was broken, but it isn't...

    This sample uses external qspi flash for storing update images... I did full qspi erase with

    nrfjprog --qspieraseall

    And now updating of my first DK also works as expected.

    I believe at one moment the external qspi flash was written with wrong data (I tested interruption of the updating process with power off). After that the upgrading functionality became broken

    This leads me to a question: is it possible to handle such a situation in any way? Maybe we should do a full erase of the external flash before each DFU procedure? Or maybe it's possible to erase the QSPI flash with some MCUmgr command?
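On the MCUmgr question: the SMP image management group does define an erase command (group 1, command 5). The sketch below only builds the raw request frame in Python, to show what such a request looks like on the wire. The function name is made up for illustration, and whether this command clears the external QSPI secondary slots in this particular sample is an assumption that nothing in this thread confirms.

```python
import struct

# Values from the SMP (Simple Management Protocol) image management group.
SMP_OP_WRITE = 2          # write request
SMP_GROUP_IMAGE = 1       # image management group
SMP_IMG_CMD_ERASE = 5     # image erase command
CBOR_EMPTY_MAP = b"\xa0"  # CBOR encoding of {} (no optional fields set)


def build_image_erase_request(seq: int) -> bytes:
    """Hypothetical helper: 8-byte big-endian SMP header plus CBOR payload."""
    header = struct.pack(
        ">BBHHBB",
        SMP_OP_WRITE,         # operation
        0,                    # flags
        len(CBOR_EMPTY_MAP),  # payload length
        SMP_GROUP_IMAGE,      # group ID
        seq & 0xFF,           # sequence number (one byte)
        SMP_IMG_CMD_ERASE,    # command ID
    )
    return header + CBOR_EMPTY_MAP


if __name__ == "__main__":
    print(build_image_erase_request(seq=1).hex())
```

Sending the frame still requires an SMP transport (for example the SMP characteristic over BLE), which is outside this sketch.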

  • Hi, 

    Apologies for the very late response. I have been out of office due to circumstances I did not plan for when I picked up your case, and I will be until Friday this week.

    A quick (and temporary) answer to your question: I believe there should be a config or something similar that will allow you to erase what is on the external flash before doing the DFU, and my best guess (with the resources I have at hand at this given point in time) is that you will have to erase what is on the flash before doing the DFU. Before I can verify this 100% I need to consult with some of my colleagues, and I will do my best to get a reply that answers and verifies your findings regarding your issue on Friday, no later than Monday. 

    I hope that this prolonged response time has not caused too many issues for you,

    Kind regards,
    Andreas

  • Robert de Brum said:
    Thanks for the input. If it could be device specific, have there been any known fixes for this?

    I am working on getting a larger sample size than us 3 to see if it is device specific. If it indicates that it is so, then I'll investigate it closer and examine if there are any possible fixes.

    Kind regards,
    Andreas

  • Hi,

    I've been out of office due to the Christmas holidays here in Norway, but in the meantime I've had some colleagues of mine perform some tests to see if they could reproduce the errors on their end. Unfortunately it all succeeded on their end.

    Could you attempt to do the following:

    1. Build and flash a clean version of https://github.com/hellesvik-nordic/samples_for_nrf_connect_sdk/tree/main/bootloader_samples/nrf5340/mcuboot_smp_ble_simultaneous
      1. If you have modified the SDK at any point, please try with a clean version of NCS.
    2. Perform DFU with the pre-built zips found in the folder "dfu_zips" in https://github.com/aHaugl/NCS_Simult_DFU_test in sequence or try to follow the steps in the repository yourselves

    If you still observe the same issues, please state so here and I'll follow up on them.

    Kind regards,
    Andreas

  • Hello,

    Thanks so much for your time and efforts on this. I was able to successfully DFU both cores per the sample via Google Pixel 4.

    I think that the first issue that I discovered was that I did not successfully build the sample with ncs version 2.1.2 - I was still using 2.0.0.

    The other issue I am discovering is that while the pixel will successfully DFU every time, the iPad fails to do so - I was only able to DFU once with the iPad, from dfu_zips/3/dfu_application.zip to dfu_zips/1/dfu_application.zip.

  • Robert de Brum said:
    I think that the first issue that I discovered was that I did not successfully build the sample with ncs version 2.1.2 - I was still using 2.0.0.

    A version conflict could explain the issue. The sample should be usable from NCS v2.0.0 up to the latest v2.1.x; with v2.2.0 it yields some errors due to header changes (I have not checked which yet). Also note that this is currently not an official sample that we support in the SDK, so in its current state it will not be maintained. It is rather a sample meant as a suggestion/illustration for developers to use in their own applications.

    Robert de Brum said:
    The other issue I am discovering is that while the pixel will successfully DFU every time, the iPad fails to do so - I was only able to DFU once with the iPad, from dfu_zips/3/dfu_application.zip to dfu_zips/1/dfu_application.zip.

    This sounds strange. What do you observe when testing the pre-built dfu_zips on your iPad/where does it fail?

    What we observed when performing the test with an iOS phone was that the finalizing of the transfer did not show up in the app, but after the app timed out and we restarted the device, we saw the new firmware in the terminal output. Could it be the same case with the app version on the iPad?

    Kind regards,
    Andreas

  • Hi,

    I've just tested with my phone (the same phone I used when I experienced the issue). Everything seems to work as expected with your pre-built zip files.

    I will continue my tests later, and if I experience the issue again, I will report here. I can't imagine why it happened...

  • Glad to hear that you got it to work with the samples I built, and I must say it is still strange to me that your builds did not work. I think we can conclude that it is most likely either a version conflict or that something in your SDK is not identical to the pristine SDK I was using. Let me know if and when you have the possibility to test with a fresh install of NCS!

    One pitfall that could have caused the errors is the order in which the new firmware was built. You can follow the "manual" steps in the test repository I created to check that you have included everything you need, for instance to add logging to the application.

    Please let me know if you have any additional questions to this topic and/or if you feel that we can close the case for now! 

    Kind regards,
    Andreas

  • Hello,

    I am still working on this for now, but iOS does not work as robustly as the Pixel does yet.

    When I try to DFU from my iPad 9th generation, I can successfully DFU once, but then any subsequent DFU fails. Unfortunately, the plans for the product I am working on require iOS as the main platform for the mobile app that will interface with my firmware on the nrf5340. There will not be an android app for some time, and so getting this to work on iOS is a big priority for me and my customer.

    Note some of these images that occur when I use nrfConnect to DFU.

    This sequence happens on a successful DFU with the 9th Gen iPad. Notice that the final image says the DFU failed, but on reset, the device is running the new image.

    After this "successful" DFU, I experience this on any subsequent DFU. The file does not get uploaded, so resetting the device does not show that the DFU worked.

    On the Pixel, I can DFU any of the .zip files from your samples, in any order, at any time.

    Along these lines, I just opened another ticket about importing this sample and its multi-image capability into my firmware project. I am posting the ticket link in case there is any parallel thought or discussion that should be had. devzone.nordicsemi.com/.../multi-image-dfu-project-structure

  • Hi,

    I might be assigned to your new case later today, but in the meantime I'll answer you here. We had a similar situation when testing the test repository on an iOS phone: the confirmation request timed out after transferring the update zip, but after rebooting the nRF5340 device that received the image we could see that the new firmware was running and that it had not reverted to the original image.

    So a couple of questions to start with so we can weed out the "simplest" errors:

    1. Are you seeing the initial build, or the updated build after rebooting the nRF device after the image has been sent and the request has timed out?
    2. If you're seeing the initial build, can you verify that you're not using the dfu_application zip that corresponds to the same build that is on the initial device?
      1. If you're using dfu_application generated when building the initial build, or a build without any changes to the code, then the bootloader will not do anything as these applications are identical
      2. If you're using dfu_application generated with a second build, perform dfu and verify that you see the changes you've done to the code in the second build, then attempt to do dfu once more with the same dfu_application generated with the second build (i.e. use the same application image twice in a row), then you will see the same behavior as in item 2a.
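One way to rule out the "same build" case from the questions above is to compare the binaries inside two dfu_application.zip files. The following is a minimal Python sketch, assuming the images are stored as `.bin` members of the archive; the member names used in the demo (`app_update.bin`, `net_core_app_update.bin`) and all helper names are illustrative assumptions. Identical digests mean the files carry byte-identical images. Differing digests do not by themselves prove the code changed, since signing or timestamps may also alter the binaries.

```python
import hashlib
import io
import zipfile


def binary_digests(zip_source) -> dict:
    """Map each .bin member of a DFU zip to the SHA-256 of its contents."""
    digests = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".bin"):
                digests[name] = hashlib.sha256(archive.read(name)).hexdigest()
    return digests


def same_binaries(zip_a, zip_b) -> bool:
    """True when both archives carry byte-identical .bin members."""
    return binary_digests(zip_a) == binary_digests(zip_b)


def make_zip(members: dict) -> io.BytesIO:
    """Demo helper: build an in-memory zip from {member name: bytes}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Assumed member names and dummy contents, for illustration only.
    first = make_zip({"app_update.bin": b"v1", "net_core_app_update.bin": b"n1"})
    second = make_zip({"app_update.bin": b"v2", "net_core_app_update.bin": b"n1"})
    print(same_binaries(first, second))
```

`binary_digests` and `same_binaries` also accept file paths, so two build folders can be compared directly.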

    Let me know about these questions, and we'll see if we should discuss the seemingly iOS-related issue in the other case, as we're starting to overstep the (soft) "one-topic-per-case" boundary if it's not related to the questions above. It is possible that there is an issue with the iOS version of the app.

    Kind regards,
    Andreas

  • Are you seeing the initial build, or the updated build after rebooting the nRF device after the image has been sent and the request has timed out?

    I am seeing the original build after the time out.

    If you're seeing the initial build, can you verify that you're not using the dfu_application zip that corresponds to the same build that is on the initial device?

    I can confirm that I am using a different dfu_application.zip file when the application times out.

    I am working with an app developer that uses your DFU library in his iOS apps, and he mentioned 2 things:

    There has been a case with the nrfConnect app recently with several users mentioning a known bug. He says that the nrfConnect iOS app does not use the latest DFU library code yet that just came out?

    Secondly, I am using a 9th Gen iPad which (I believe) is specified for BLE 4.2. While I don't understand why it would work only sometimes, would BLE 4.2 cause an issue if the app is using BLE 5?

    Thanks for your help!

  • Thank you for verifying the questions regarding the DFU process

    Robert de Brum said:
    There has been a case with the nrfConnect app recently with several users mentioning a known bug. He says that the nrfConnect iOS app does not use the latest DFU library code yet that just came out?

    I've asked the Mobile app team some questions about the issue and your two questions:

    Robert de Brum said:

    Note some of these images that occur when I use nrfConnect to DFU.

    This sequence happens on a successful DFU with the 9th Gen iPad. Notice that the final image says the DFU failed, but on reset, the device is running the new image.

    Regarding the timeout message, the developers suspect that a message that should have been sent is missing. The device should reply in both cases: both the "confirm" request and the "validate" request (which is an "image list" request) should be answered with a notification. They recommend checking whether the notification is not sent or is lost somewhere, so if possible you could capture a BLE sniffer trace and see if you can find the messages while performing the DFU.
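When looking through a sniffer trace for those replies, it can help to decode the 8-byte SMP header at the start of each message. The following Python sketch assumes the standard SMP header layout (operation, flags, length, group, sequence number, command ID, big-endian); the class and function names are invented for illustration. In the image management group (group 1), command 0 is the image state command, which is used for both "image list" (read) and "confirm" (write).

```python
import struct
from typing import NamedTuple

OP_NAMES = {0: "read", 1: "read response", 2: "write", 3: "write response"}
GROUP_IMAGE = 1      # image management group
CMD_IMAGE_STATE = 0  # image state: list (read) and test/confirm (write)


class SmpHeader(NamedTuple):
    op: int
    flags: int
    length: int
    group: int
    seq: int
    command: int


def parse_smp_header(frame: bytes) -> SmpHeader:
    """Decode the first 8 bytes of an SMP message."""
    if len(frame) < 8:
        raise ValueError("SMP header needs 8 bytes")
    op, flags, length, group, seq, command = struct.unpack(">BBHHBB", frame[:8])
    # Keep only the 3 operation bits; upper bits may carry a version field.
    return SmpHeader(op & 0x07, flags, length, group, seq, command)


def is_image_state_response(frame: bytes) -> bool:
    """True for a response to an image list or confirm request."""
    header = parse_smp_header(frame)
    return (
        header.group == GROUP_IMAGE
        and header.command == CMD_IMAGE_STATE
        and header.op in (1, 3)
    )


if __name__ == "__main__":
    sample = bytes.fromhex("0100000600010700")  # made-up header for the demo
    header = parse_smp_header(sample)
    print(OP_NAMES[header.op], header, is_image_state_response(sample))
```

Note that a long SMP message can be split across several BLE notifications, so only the first fragment of each message starts with this header.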

    It might be that the sample I suggested to you is missing something, so I also recommend having a look at the Zigbee light switch sample, which the simultaneous DFU sample is based upon, and attempting DFU with that sample to see if you observe the same behavior.

    Robert de Brum said:

    There has been a case with the nrfConnect app recently with several users mentioning a known bug. He says that the nrfConnect iOS app does not use the latest DFU library code yet that just came out?

    A new version of nRF Connect for iOS will be released in a couple of days; the currently available version may not use the latest library. However, we have the "nRF Connect Device Manager" app on the App Store, which is a "sample app for the library" and by design uses the latest versions - https://apps.apple.com/us/app/nrf-connect-device-manager/id1519423539. There will be plenty of improvements in the upcoming nRF Connect app for iOS.

    Robert de Brum said:
    Secondly, I am using a 9th Gen iPad which (I believe) is specified for BLE 4.2. While I don't understand why it would work only sometimes, would BLE 4.2 cause an issue if the app is using BLE 5?

    As long as you're using the same BLE features, the version should not matter: BLE versions accumulate features over the generations and build on the previous versions while keeping backwards compatibility. The only exception, as I mentioned, is if you intend to use a feature that only one of the two devices supports; then you will not be able to use that feature on the link. For example, you cannot use coded PHY or 2M PHY between a 4.2 and a 5.0 device, as those are only supported from BLE 5.x.

    Kind regards,
    Andreas
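The point about BLE versions in the reply above can be illustrated as a set intersection: a link can only use features that both sides support. This Python sketch uses a simplified, hand-written feature table as an assumption. Real devices advertise their supported features individually, and many features are optional within a given specification version.

```python
# Simplified, illustrative feature table (not an exhaustive list).
FEATURES_BY_VERSION = {
    "4.2": {"LE 1M PHY", "Data Length Extension", "LE Secure Connections"},
    "5.0": {
        "LE 1M PHY",
        "Data Length Extension",
        "LE Secure Connections",
        "LE 2M PHY",
        "LE Coded PHY",
        "Extended Advertising",
    },
}


def usable_features(version_a: str, version_b: str) -> set:
    """Features available on a link: those supported by both devices."""
    return FEATURES_BY_VERSION[version_a] & FEATURES_BY_VERSION[version_b]


if __name__ == "__main__":
    for feature in sorted(usable_features("4.2", "5.0")):
        print(feature)
```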
