nRF Connect SDK FOTA download client bug

Hi,

I have found a bug in the fota_download.c file contained in the SDK. My application code cancels the FOTA update on its own, and most of the time the cancel goes through. However, I am hitting an edge case in which the thread calling fota_download_cancel gets stuck in a while loop for the duration of the FOTA update. I have traced the root cause to the resume flag being set right before my application calls the cancel. Since the download_with_offset function is scheduled 1 s in the future, a cancel attempted inside that 1 s window leaves the device stuck in this loop. This happens because fota_download_cancel assumes that the cancel goes through and waits forever for the downloading flag to be cleared. However, if the resume flag is set, the download handler doesn't call the stopped() function, which is where the downloading flag is cleared.
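To make the sequence easier to follow, here is a minimal model of the behaviour described above. It is only a sketch and is not the SDK source: the struct and the model_* function names are hypothetical, and the wait loop is bounded by max_polls (an assumption added purely so the model terminates), whereas the real loop is described as waiting forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two flags discussed above (not SDK code). */
struct fota_model {
    bool downloading;    /* cleared only when stopped() runs */
    bool resume_pending; /* set while download_with_offset is scheduled */
};

/* Models the download handler reacting to a cancel request. */
void model_handler_on_cancel(struct fota_model *m)
{
    if (m->resume_pending) {
        /* Resume flag set: stopped() is skipped, so the
         * downloading flag is never cleared on this path. */
        return;
    }
    m->downloading = false; /* stands in for stopped() */
}

/* Models the cancel call. Returns true if the downloading flag was
 * cleared, false if the caller would still be spinning after
 * max_polls checks. */
bool model_cancel(struct fota_model *m, int max_polls)
{
    assert(max_polls > 0);
    model_handler_on_cancel(m);
    for (int i = 0; i < max_polls; i++) {
        if (!m->downloading) {
            return true;
        }
    }
    return false; /* the real loop has no such exit */
}
```

With resume_pending cleared, model_cancel returns true on the first check. With resume_pending set, it never sees the downloading flag cleared, which corresponds to the stuck thread.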

As the application has no way to check these flags, I don't see a clear workaround for this issue. Please provide any feedback on how this can be dealt with.

Thanks,

Aman

  • Hi,

     

    Could you share a bit more information about your scenario?

    Which NCS version are you using?

    Are you using nRF700x or nRF9160? If using nrf9160, which mfw version is being used?

     

    Kind regards,

    Håkon

  • Hi Håkon,

    I am using NCS version 2.5.0 on a Laird MG100 device, which runs on an nRF52840. This device uses NB-IoT, and since FOTA can be quite slow over this network, we want our device to remain functional during FOTA. To do this, we have to cancel and resume FOTA any time we want to use the network. As a result, we run into this edge case quite often.

    Please let me know if any other details are needed.

    Thanks,

    Aman

  • Hi Aman,

     

    FOTA download (download_client) isn't tested with external modems, as it is aimed at running on nRF devices, so there will be timing differences and potential problems that arise.

    I strongly suspect that the corner case for cancel/resume occurs due to external factors. My main thought is the transport layer (interrupt-based UART?) between the nRF and the third-party modem.

    Could you provide detailed logs or readouts of the flags in question? As mentioned, this is running on a device that we do not test for or with, so any additional debug information on what is happening would help.

     

    Kind regards,

    Håkon
