This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

FOTA Hangs

I’m using the latest version of the modem 1.2.0 and sdk v1.2.0 

For the most part AWS FOTA seems to be functioning.

However it will start the FOTA job and hang and fail to proceed until the device is rebooted, after which the originally initiated FOTA job will start and finish as expected. 

The following log shows what is happening and where the reboot is.

I'm unsure if this issue is related to the TLS cache  problem referred to in many other posts whereby a second attempt at communicating over a socket will fail.

Just eneed some guidance and help if possible please.

[00:05:02.673,980] [1B][0m<dbg> aws_fota.on_publish_evt: Received topic: $aws/things/ogw-352656100322588/jobs/notify-next[1B][0m

[00:05:02.684,936] [1B][0m<inf> aws_fota: Checking for an available job[1B][0m

[00:05:02.691,955] [1B][0m<dbg> aws_fota.get_job_execution: Job ID: 795[1B][0m

[00:05:02.698,516] [1B][0m<dbg> aws_fota.get_job_execution: hostname: ozark-iot-fota.s3-ap-southeast-2.amazonaws.com[1B][0m

[00:05:02.709,045] [1B][0m<dbg> aws_fota.get_job_execution: file_path app_update.bin[1B][0m

[00:05:02.716,735] [1B][0m<dbg> aws_fota.get_job_execution: execution_version_number: 1 [1B][0m

[00:05:02.724,853] [1B][0m<inf> aws_jobs: Subscribe: $aws/things/ogw-352656100322588/jobs/795/update/#[1B][0m

[00:05:03.989,715] [1B][0m<dbg> aws_fota.on_publish_evt: Received topic: $aws/things/ogw-352656100322588/jobs/notify-next[1B][0m

[00:05:04.000,671] [1B][0m<inf> aws_fota: Checking for an available job[1B][0m

[00:05:04.007,720] [1B][0m<dbg> aws_fota.get_job_execution: Job ID: 795[1B][0m

[00:05:04.014,282] [1B][0m<dbg> aws_fota.get_job_execution: hostname: ozark-iot-fota.s3-ap-southeast-2.amazonaws.com[1B][0m

[00:05:04.024,780] [1B][0m<dbg> aws_fota.get_job_execution: file_path app_update.bin[1B][0m

[00:05:04.032,501] [1B][0m<dbg> aws_fota.get_job_execution: execution_version_number: 1 [1B][0m

[00:05:04.040,618] [1B][0m<inf> aws_jobs: Subscribe: $aws/things/ogw-352656100322588/jobs/795/update/#[1B][0m

[00:05:04.483,184] [1B][0m<dbg> aws_fota.on_publish_evt: Received topic: $aws/things/ogw-352656100322588/jobs/notify-next[1B][0m

[00:05:04.494,140] [1B][0m<inf> aws_fota: Checking for an available job[1B][0m

[00:05:04.501,190] [1B][0m<dbg> aws_fota.get_job_execution: Job ID: 795[1B][0m

[00:05:04.507,751] [1B][0m<dbg> aws_fota.get_job_execution: hostname: ozark-iot-fota.s3-ap-southeast-2.amazonaws.com[1B][0m

[00:05:04.518,249] [1B][0m<dbg> aws_fota.get_job_execution: file_path app_update.bin[1B][0m

[00:05:04.525,939] [1B][0m<dbg> aws_fota.get_job_execution: execution_version_number: 1 [1B][0m

[00:05:04.534,088] [1B][0m<inf> aws_jobs: Subscribe: $aws/things/ogw-352656100322588/jobs/795/update/#[1B][0m

[00:05:04.957,000] [1B][0m<dbg> aws_fota.update_job_execution: update_job_execution, state: 1, status: 1, version_number: 1[1B][0m

[00:05:04.968,322] [1B][0m<inf> aws_jobs: Publish topic: $aws/things/ogw-352656100322588/jobs/795/update[1B][0m

[00:05:04.977,783] [1B][0m<inf> aws_jobs: Publish payload {"status":"IN_PROGRESS","statusDetails": {"nextState":"download_firmware","progress":0},"expectedVersion": "1","clientToken": ""}[1B][0m

 

 

Device powered off and rebooted here……,

 

[00]*** Booting Zephyr OS build v2.1.99-ncs1  ***

[00:00:00.004,211] [1B][0m<inf> mcuboot: Starting bootloader[1B][0m

[00:00:00.012,329] [1B][0m<inf> mcuboot: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3[1B][0m

[00:00:00.025,390] [1B][0m<inf> mcuboot: Boot source: none[1B][0m

[00:00:00.032,562] [1B][0m<inf> mcuboot: Swap type: revert[1B][0m

[00:00:00.432,189] [1B][0m<inf> mcuboot: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3[1B][0m

[00:00:22.045,471] [1B][0m<inf> mcuboot: Bootloader chainload address offset: 0xc000[1B][0m

[00:00:22.053,192] [1B][0m<inf> mcuboot: Jumping to the first image slot[1B][0m

*** Booting Zephyr OS build v2.1.99-ncs1  ***

Flash region                        Domain                 Permissions

00 0x00000 0x08000      Secure                  rwxl

01 0x08000 0x10000      Secure                  rwxl

02 0x10000 0x18000      Secure                  rwxl

03 0x18000 0x20000      Non-Secure         rwxl

04 0x20000 0x28000      Non-Secure         rwxl

05 0x28000 0x30000      Non-Secure         rwxl

06 0x30000 0x38000      Non-Secure         rwxl

07 0x38000 0x40000      Non-Secure         rwxl

08 0x40000 0x48000      Non-Secure         rwxl

09 0x48000 0x50000      Non-Secure         rwxl

10 0x50000 0x58000      Non-Secure         rwxl

11 0x58000 0x60000      Non-Secure         rwxl

12 0x60000 0x68000      Non-Secure         rwxl

13 0x68000 0x70000      Non-Secure         rwxl

14 0x70000 0x78000      Non-Secure         rwxl

15 0x78000 0x80000      Non-Secure         rwxl

16 0x80000 0x88000      Non-Secure         rwxl

17 0x88000 0x90000      Non-Secure         rwxl

18 0x90000 0x98000      Non-Secure         rwxl

19 0x98000 0xa0000      Non-Secure         rwxl

20 0xa0000 0xa8000      Non-Secure         rwxl

21 0xa8000 0xb0000      Non-Secure         rwxl

22 0xb0000 0xb8000      Non-Secure         rwxl

23 0xb8000 0xc0000       Non-Secure         rwxl

24 0xc0000 0xc8000       Non-Secure         rwxl

25 0xc8000 0xd0000       Non-Secure         rwxl

26 0xd0000 0xd8000      Non-Secure         rwxl

27 0xd8000 0xe0000      Non-Secure         rwxl

28 0xe0000 0xe8000      Non-Secure         rwxl

29 0xe8000 0xf0000        Non-Secure         rwxl

30 0xf0000 0xf8000         Non-Secure         rwxl

31 0xf8000 0x100000     Non-Secure         rwxl

Non-secure callable region 0 placed in flash region 2 with size 32.

 

 

SRAM region                      Domain                 Permissions

00 0x00000 0x02000       Secure                  rwxl

01 0x02000 0x04000       Secure                  rwxl

02 0x04000 0x06000       Secure                  rwxl

03 0x06000 0x08000       Secure                  rwxl

04 0x08000 0x0a000       Secure                  rwxl

05 0x0a000 0x0c000       Secure                  rwxl

06 0x0c000 0x0e000        Secure                  rwxl

07 0x0e000 0x10000       Secure                  rwxl

08 0x10000 0x12000       Non-Secure         rwxl

09 0x12000 0x14000       Non-Secure         rwxl

10 0x14000 0x16000       Non-Secure         rwxl

11 0x16000 0x18000       Non-Secure         rwxl

12 0x18000 0x1a000       Non-Secure         rwxl

13 0x1a000 0x1c000       Non-Secure         rwxl

14 0x1c000 0x1e000        Non-Secure         rwxl

15 0x1e000 0x20000       Non-Secure         rwxl

16 0x20000 0x22000       Non-Secure         rwxl

17 0x22000 0x24000       Non-Secure         rwxl

18 0x24000 0x26000       Non-Secure         rwxl

19 0x26000 0x28000       Non-Secure         rwxl

20 0x28000 0x2a000       Non-Secure         rwxl

21 0x2a000 0x2c000       Non-Secure         rwxl

22 0x2c000 0x2e000        Non-Secure         rwxl

23 0x2e000 0x30000       Non-Secure         rwxl

24 0x30000 0x32000       Non-Secure         rwxl

25 0x32000 0x34000       Non-Secure         rwxl

26 0x34000 0x36000       Non-Secure         rwxl

27 0x36000 0x38000       Non-Secure         rwxl

28 0x38000 0x3a000       Non-Secure         rwxl

29 0x3a000 0x3c000       Non-Secure         rwxl

30 0x3c000 0x3e000        Non-Secure         rwxl

31 0x3e000 0x40000       Non-Secure         rwxl

 

Peripheral                             Domain                 Status

00 NRF_P0               Non-Secure                 OK

01 NRF_CLOCK            Non-Secure          OK

02 NRF_RTC0             Non-Secure             OK

03 NRF_RTC1             Non-Secure             OK

04 NRF_NVMC             Non-Secure           OK

05 NRF_UARTE1           Non-Secure         OK

06 NRF_UARTE2           Non-Secure         OK

07 NRF_TWIM2            Non-Secure           OK

08 NRF_SPIM3            Non-Secure            OK

09 NRF_TIMER0           Non-Secure          OK

10 NRF_TIMER1           Non-Secure          OK

11 NRF_TIMER2           Non-Secure          OK

12 NRF_SAADC            Non-Secure          OK

13 NRF_PWM0             Non-Secure           OK

14 NRF_PWM1             Non-Secure           OK

15 NRF_PWM2             Non-Secure           OK

16 NRF_PWM3             Non-Secure           OK

17 NRF_WDT              Non-Secure             OK

18 NRF_IPC              Non-Secure                OK

19 NRF_VMC              Non-Secure             OK

20 NRF_FPU              Non-Secure              OK

21 NRF_EGU1             Non-Secure            OK

22 NRF_EGU2             Non-Secure            OK

23 NRF_DPPIC            Non-Secure            OK

24 NRF_GPIOTE1          Non-Secure         OK

25 NRF_REGULATORS       Non-Secure                 OK

 

SPM: NS image at 0x18200

SPM: NS MSP at 0x2002e520

SPM: NS reset vector at 0x1e1cd

SPM: prepare to jump to Non-Secure image.

*** Booting Zephyr OS build v2.1.99-ncs1  ***

Application started:9004

GPS:

client_id = ogw-352656100322588

nRF Cloud library initialized. 0

Establishing LTE link (this may take some time) ...

[00:00:01.764,984] [1B][0m<inf> lte_lc: PDP Context: AT+CGDCONT=0,"IP","m2m"[1B][0m

[00:00:01.772,033] [1B][0m<dbg> lte_lc.w_lte_lc_connect: Network mode: AT%XSYSTEMMODE=1,0,1,0[1B][0m

+CEREG: 2,"46C9","0006906F",7,0,0,"11100000","11100000"

[00:00:02.799,194] [1B][0m<dbg> lte_lc.parse_nw_reg_status: Network registration status: 2[1B][0m

+CEREG: 1,"46C9","0006906F",7,,,"00000001","10000100"

[00:00:04.981,964] [1B][0m<dbg> lte_lc.parse_nw_reg_status: Network registration status: 1[1B][0m

Connected to CAT M1 now:, err code: 0

Initialising GPS...

GPS initialized

AWS connection (connected: 0, in progress: 0)

Not connected to AWS, trying to connect

[00:00:09.224,517] [1B][0m<inf> aws_jobs: Subscribe: $aws/things/ogw-352656100322588/jobs/notify-next[1B][0m

[00:00:09.235,046] [1B][0m<inf> aws_jobs: Subscribe: $aws/things/ogw-352656100322588/jobs/$next/get/#[1B][0m

OZARK_CLOUD_EVT_TRANSPORT_CONNECTED

PSM enabled

[00:00:09.414,459] [1B][0m<inf> aws_fota: subscribed to notify-next topic[1B][0m

[00:00:09.421,264] [1B][0m<inf> aws_jobs: Publish topic: $aws/things/ogw-352656100322588/jobs/$next/get[1B][0m

[00:00:09.430,664] [1B][0m<inf> aws_jobs: Publish payload {"clientToken": ""}[1B][0m

[00:00:09.651,367] [1B][0m<dbg> aws_fota.on_publish_evt: Received topic: $aws/things/ogw-352656100322588/jobs/$next/get/accepted[1B][0m

[00:00:09.662,933] [1B][0m<inf> aws_fota: Checking for an available job[1B][0m

[00:00:09.670,043] [1B][0m<dbg> aws_fota.get_job_execution: Job ID: 795[1B][0m

[00:00:09.676,605] [1B][0m<dbg> aws_fota.get_job_execution: hostname: ozark-iot-fota.s3-ap-southeast-2.amazonaws.com[1B][0m

[00:00:09.687,133] [1B][0m<dbg> aws_fota.get_job_execution: file_path app_update.bin[1B][0m

[00:00:09.694,824] [1B][0m<dbg> aws_fota.get_job_execution: execution_version_number: 2 [1B][0m

[00:00:09.703,033] [1B][0m<inf> aws_jobs: Subscribe: $aws/things/ogw-352656100322588/jobs/795/update/#[1B][0m

[00:00:10.195,739] [1B][0m<dbg> aws_fota.update_job_execution: update_job_execution, state: 1, status: 1, version_number: 2[1B][0m

[00:00:10.207,061] [1B][0m<inf> aws_jobs: Publish topic: $aws/things/ogw-352656100322588/jobs/795/update[1B][0m

  • Yes, that makes perfect sense and matches what I'm also observing here. The FOTA download only works simultaneously with GPS searching mode if the download started before the device begins to search for satellites.

    It's just weird that, occasionally, when the device stops searching for satellites and receives the update request from the socket. that it hangs before starting to download the file and while "Enabling PSM". Can't tell if its a problem with the asset_tracker example, an SDK library or modem firmware / carrier issue.

  • Would you mind checking something for me please?

    Change CONFIG_MQTT_KEEPALIVE=60

    Perhaps through prj.conf

    GPS won't work probably, but I'm very interested to see if you notice that FOTA works.

  • I have changed CONFIG_MQTT_KEEPALIVE=60 and now the FOTA download request is received earlier when the system is searching for satellites. The FOTA still hangs here, ------STRIKETHROUGH----- but it does not show "Enabling PSM" as before. Can't tell what's going on here, this hang seems to be different than before.------STRIKETHROUGH-----   After receiving the FOTA download request, there was a long pause that seemed like an hang, but then the console printed (and then it really hang):

    [00:06:17.860,290] <dbg> nrf9160_gps.stop: Stopping GPS
    [00:06:17.866,363] <inf> gps_control: GPS operation was stopped
    [00:06:17.872,833] <inf> gps_control: The device will try to get fix again in 30 seconds
    [00:06:47.881,530] <inf> gps_control: Enabling PSM

    Before changing the keepalive, I have experimented with enabling the Watchdog in prj.conf as it was not enabled by default. After "Enabling PSM", the WD rebooted the system so it is definitely hanging when trying to enable PSM after GPS runs for a while and a FOTA update request is received as soon as the GPS searching stops.

    I also wonder if the position of switch 2 of the DK has any influence over this as it messes with EDRX.

    EDIT: Never mind my previous sentence, as I have CONFIG_POWER_OPTIMIZATION_ENABLE=n which disables switch 2.

  • I can now confirm that if I program a DK with the latest Asset Tracker (Built from nRF Connect SDK master branch commit bc7ade8b) and link it to NRF Cloud and do a FOTA update.

    The same thing occurs whereby the device will "notice" that there is an update and begin to perform the update and then hang.

    If I preset reset on the DK it will reboot and perform the update correctly.

  • Hi Simon, no stress, but very keen for a hand with this issue.

Related