nRF Cloud FOTA jobs with queued and failed devices

Hi,

I have a question about FOTA jobs with queued and failed devices.

Let's say I am managing a group of 10 cellular devices (nRF9151) through the nRF Cloud UI. I create a FOTA job for this group and deploy it:

- Six of these devices succeed with the update ("SUCCEEDED")

- Two fail due to poor connectivity, which is expected ("FAILED")

- Two aren't online at the time and may not be online for days, which is also expected ("QUEUED")

My question is as follows:

How can I easily retry FAILED devices while the job is still waiting on other devices that are QUEUED?

My understanding is that when a FOTA job finishes and some devices have FAILED, a retry job for those FAILED devices is created automatically, which is great.

However, since devices may be offline for arbitrary amounts of time, I don't know when the FOTA job will finish, if ever. So how do I retry devices that have already FAILED after some timeframe?

I tried cancelling the FOTA job, but now I have to manually create a new group of devices with QUEUED and FAILED devices. At scale, this is unworkable.

It seems to me that the FOTA job concept is missing a timeout. That is, a FOTA job would finish at the latest when the timeout expires, regardless of whether devices are still QUEUED. At that point a retry job would be created automatically containing all remaining QUEUED and FAILED devices, and I could retry the FAILED devices simply by running that job.
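To make the timeout idea concrete, here is a rough Python sketch of the behaviour I have in mind. It does not call any nRF Cloud API. The `Execution` type, the `devices_to_retry` function, and the device IDs are all made up for illustration; only the status strings come from what the UI shows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical model of one device's state within a FOTA job.
# Only the status strings mirror the nRF Cloud UI; the rest is illustrative.
@dataclass
class Execution:
    device_id: str
    status: str  # "SUCCEEDED", "FAILED", or "QUEUED"


def devices_to_retry(executions, job_started, timeout, now):
    """Return device IDs for the follow-up job, or [] if the job should keep waiting."""
    unfinished = [e for e in executions if e.status != "SUCCEEDED"]
    still_queued = any(e.status == "QUEUED" for e in unfinished)
    timed_out = now - job_started >= timeout
    # The job counts as finished once nothing is QUEUED, or once the timeout expires.
    if still_queued and not timed_out:
        return []
    return [e.device_id for e in unfinished]


# The scenario from this post: 6 SUCCEEDED, 2 FAILED, 2 QUEUED.
start = datetime(2024, 1, 1)
executions = (
    [Execution(f"dev-{i}", "SUCCEEDED") for i in range(1, 7)]
    + [Execution("dev-7", "FAILED"), Execution("dev-8", "FAILED")]
    + [Execution("dev-9", "QUEUED"), Execution("dev-10", "QUEUED")]
)
retry = devices_to_retry(executions, start, timedelta(days=3), start + timedelta(days=3))
print(retry)
```

With a three-day timeout, the job would keep waiting on day one, and on day three it would hand all four unfinished devices (two FAILED, two QUEUED) to the retry job.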

Is that a correct assessment? What is the best practice here?

Thanks,

-- Terrence
