This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

NRF9160 AWS FOTA fails to load binary

We have worked through getting AWS MQTT working and have the FOTA job file loading properly. When the HTTP request is made the file begins downloading but fails to download after a seemingly random period of time. Anywhere between 3-30% complete. The failure happens when the receiving bytes step takes a considerable amount of time and then the next http request fails. The FOTA library then resets and tries to download the file again with the same result.

  • With some more searching I have found that recv() is taking a long time or the download thread is getting swapped out while recv is running. After getting swapped back in or when the recv finishes (it finishes successfully) we are no longer connected so when we try to send another request we fail due to it being closed. I have found that catching this at the app level, setting the client socket connection to -1 and then attempting to reconnect will get me a bit further, however then the peer connection is resetting. Just some more information. I think it would be a bit more intuitive if when attempting to send on a socket that errs 128 (no socket connection) that it would try to reconnect itself in the OS instead of putting that in application space.

  • Update here. This does seem to work with a different cellular provider. However it would be nice if this could still handle dropped sockets for robustness/reliability reasons.

  • I'm experiencing the exact same issue myself, plus another one - both of these are blocking us from deploying our system as it currently stands, as our devices will be installed in inaccessible places for reflashing.

    My log file from the 128 error from that socket_send is below, I'm only getting 4096 bytes into the sample's firmware download stored on AWS. I haven't yet had a full end-end download succeed. I eliminated thread switching as the issue by moving the contents of download_thread directly into the application thread's download_client_start. Furthermore I removed fragment_evt_send() so there wasn't any waiting for the flashing to occur. This means the downloading of fragments was running at full speed, again it all ceased after receiving 4096 bytes. Something is killing the connection at this threshold.

    Upload root s3.amazonaws.com/nordic-firmware-files/c1838e44-9d6f-4f41-990e-70b4e60770ce
    File part nordic-firmware-files/c1838e44-9d6f-4f41-990e-70b4e60770ce
    [00:01:51.243,560] [0m<inf> download_client: Attempting to connect over IPv4[0m
    [00:01:51.707,397] [0m<inf> download_client: Connected[0m
    [00:01:51.727,600] [0m<inf> download_client: Downloading: nordic-firmware-files/c1838e44-9d6f-4f41-990e-70b4e60770ce [0][0m
    
    [00:01:51.818,450] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:01:51.849,365] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:52.633,605] [0m<dbg> download_client.download_thread: Read 421 bytes from socket[0m
    [00:01:52.664,215] [0m<dbg> download_client.header_parse: GET header size: 421[0m
    [00:01:52.694,732] [0m<dbg> download_client.header_parse: File size = 148160[0m
    [00:01:52.725,341] [0m<inf> download_client: Downloaded bytes: 0/148160[0m
    [00:01:52.750,579] [0m<dbg> download_client.download_thread: Awaiting full fragment (0)[0m
    [00:01:52.780,975] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:52.811,279] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:52.841,552] [0m<inf> download_client: Downloaded bytes: 1024/148160[0m
    [00:01:52.872,009] [0m<dbg> download_client.download_thread: Awaiting full fragment (1024)[0m
    [00:01:52.907,318] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:52.937,774] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:52.968,048] [0m<inf> download_client: Downloaded bytes: 2048/148160[0m
    [00:01:52.998,352] [0m<inf> fota_flash_block: Erasing sector at offset 0x0[0m
    
    [00:01:53.186,004] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:01:53.216,766] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:53.884,582] [0m<dbg> download_client.download_thread: Read 424 bytes from socket[0m
    [00:01:53.915,344] [0m<dbg> download_client.header_parse: GET header size: 424[0m
    [00:01:53.945,953] [0m<inf> download_client: Downloaded bytes: 2048/148160[0m
    [00:01:53.976,226] [0m<dbg> download_client.download_thread: Awaiting full fragment (0)[0m
    [00:01:54.006,500] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:54.340,332] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:54.370,605] [0m<inf> download_client: Downloaded bytes: 3072/148160[0m
    [00:01:54.400,878] [0m<dbg> download_client.download_thread: Awaiting full fragment (1024)[0m
    [00:01:54.436,309] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:02:17.091,583] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:02:17.122,100] [0m<inf> download_client: Downloaded bytes: 4096/148160[0m
    
    [00:02:17.229,522] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:02:17.259,796] [1;31m<err> download_client: Failed to send HTTP request, errno 128[0m
    [00:02:17.290,100] [1;31m<err> fota_download: Download client error[0m
    Received error from fota_download

    I've also noticed another bug occasionally getting a *** buffer overflow detected *** error from the memcpy when (client->offset != hdr) in header_parse(). This terminates the entire process.

    [00:01:44.288,482] [0m<dbg> download_client.download_thread: Awaiting full fragment (1660)[0m
    [00:01:44.323,760] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:44.354,064] [0m<dbg> download_client.download_thread: Read 388 bytes from socket[0m
    [00:01:44.384,307] [0m<inf> download_client: Downloaded bytes: 4096/148160[0m
    
    [00:01:44.491,790] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:01:44.522,521] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:45.347,503] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:45.378,143] [0m<dbg> download_client.header_parse: GET header size: 424[0m
    Partial BUFSIZE:2048, OFFSET:1024, HDR:424
    [00:01:52.051,513] [1;33m<wrn> download_client: Copying 600 payload bytes[0m
    *** buffer overflow detected ***: terminated
    exit

  • We replaced the HTTP OTA functionality with MQTT OTA functionality that is completely stable and resilient to network failures mid-update. It seems the HTTP update should be more like this but we'll use the MQTT functionality for now. Would still like to see resolution on this.

  • Can you run AT+CESQ and report back the results?

Related