FOTA triggered with very long delay

Hi,

We are using a modified version of the multi-service sample for nRF9151. Generally, FOTA updates from the nRF Cloud work well.

Once we deploy the FOTA update in the nRF Cloud, if the device is online we see the device getting the FOTA notification and starting the update within about 1 to 5 minutes.

However, we have experienced an instance where a device took several days to pick up the FOTA, despite running and being connected to the nRF Cloud. We tried re-creating and redeploying the update in the cloud, and we power-cycled the device several times over the course of a few days. In the device log we could see that the device was not getting the FOTA notification and therefore did not start the download. The FOTA job just stayed at "QUEUED" and never changed to "DOWNLOADING".

After several days, we power-cycled the device again for a different reason, and this time it suddenly picked up the OTA. We did not change anything on the device, so we think the issue lies with the nRF Cloud, e.g. the FOTA deployment suddenly came alive and triggered the device.

The problem is very difficult to reproduce and we don't have logs. My question is: Is it a known issue that there can sometimes be a delay of many hours or even days between deploying a FOTA in the cloud and the FOTA actually being triggered on the device? Is there a recommendation to avoid this situation?

Thanks

  • Hi Dejan,

    We noticed the issue about 3 weeks ago -- the update to that device was listed as "QUEUED" for about 2 weeks. Then, a week ago, the device suddenly updated. As far as we know, nothing on our side changed, except that the device was power-cycled three or four times during the 2 weeks for different reasons. The last power cycle seemed to have triggered the OTA.

    The device was working as expected all the time, sending data to the cloud.

    At the core, our application is built on the multi-service example. The OTA job is created and deployed in the nRF Cloud UI. After deploying, the OTA is usually executed by a device within a minute. We have done about 100 OTA updates this way, and they worked fine. It's just this one device so far that took much longer than expected.

    I've attached two screenshots:

    1. The OTA job deployed on Jun 13

    2. The OTA job on Jul 7, when it finally succeeded (note: we ran another OTA the following day, Jul 8, as a test, and it succeeded quickly, as expected).

    Best,

    -- Terrence  

  • Hi,

    Do you use CoAP or MQTT?
    Can you test the unmodified multi_service sample on your affected board to see if you can replicate the problem?
    Which result do you get when using AT%XMONITOR?

    Best regards,
    Dejan
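
For anyone following along, here is a minimal sketch in C of how the AT%XMONITOR check mentioned above could be evaluated on the device side. It assumes the response text has already been captured (how it is obtained is outside the sketch), and it assumes that the first field after the `%XMONITOR:` prefix is the registration status, with 1 meaning registered on the home network and 5 meaning registered while roaming. The function names `xmonitor_reg_status` and `xmonitor_is_registered` are hypothetical and not part of any SDK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract <reg_status>, assumed to be the first
 * field of a "%XMONITOR: ..." response line.
 * Returns the status value, or -1 if the response cannot be parsed. */
int xmonitor_reg_status(const char *response)
{
    static const char prefix[] = "%XMONITOR:";
    const char *p;
    char *end;
    long value;

    if (response == NULL) {
        return -1;
    }

    p = strstr(response, prefix);
    if (p == NULL) {
        return -1;
    }
    p += strlen(prefix);

    /* strtol skips leading whitespace; end == p means no digits found. */
    value = strtol(p, &end, 10);
    if (end == p || value < 0 || value > 255) {
        return -1;
    }
    return (int)value;
}

/* Hypothetical helper: true when the modem reports that it is registered
 * (assumed values: 1 = home network, 5 = roaming). */
bool xmonitor_is_registered(const char *response)
{
    int status = xmonitor_reg_status(response);

    return status == 1 || status == 5;
}
```

Logging the result of `xmonitor_is_registered()` next to the FOTA job state would show whether the device was registered on the network during the period in which the job stayed at "QUEUED".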

  • Hi Dejan,

    We use MQTT. We have not used AT%XMONITOR yet.

    I can try the unmodified multi_service example, but it may be a while before I have any results because I have limited cycles and the problem can't be readily reproduced.

    Going back to my initial question, I was trying to understand if there may be a known issue with delayed FOTA. But it seems like you are not aware of any pending issues and the occurrence seems rare, so right now it's not a very high priority for us. 

    Thanks!

  • Hi,

    teba99 said:
    I can try the unmodified multi_service example, but it may be a while before I have any results because I have limited cycles and the problem can't be readily reproduced.

    You can try the multi_service sample to check whether the issue also occurs with the unmodified sample, not only with your modified version. Please let me know if you are able to reproduce the issue with the unmodified multi_service sample.

    Best regards,
    Dejan
