This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

NRF9160 AWS FOTA fails to load binary

We have worked through getting AWS MQTT working and have the FOTA job file loading properly. When the HTTP request is made the file begins downloading but fails to download after a seemingly random period of time. Anywhere between 3-30% complete. The failure happens when the receiving bytes step takes a considerable amount of time and then the next http request fails. The FOTA library then resets and tries to download the file again with the same result.

Parents
  • We are using the latest tag, as well as trying master in a few other locations.

  • Using the latest tag (v1.0.0) even the samples don't seem to build. Java script errors about files not existing as well as python dependencies referencing projects that no longer exist. Hacking around them seems to only cause more issues.

    On master things actually build but the aws FOTA sample on its own still fails in the same way as the log above.

    From a surface level it looks like the download thread stops trying to receive after an indeterminant amount of time or perhaps suddenly we are hitting some sort of timeout on a socket. Looking through some documentation it looks like the download_client_start should be able to start a connection back up from where it was but this doesn't seem to be used in the examples, completely restarting the download on this error.  Also it is unclear what the error 128 is as it isn't enumerated in the errno.

Reply
  • Using the latest tag (v1.0.0) even the samples don't seem to build. Java script errors about files not existing as well as python dependencies referencing projects that no longer exist. Hacking around them seems to only cause more issues.

    On master things actually build but the aws FOTA sample on its own still fails in the same way as the log above.

    From a surface level it looks like the download thread stops trying to receive after an indeterminant amount of time or perhaps suddenly we are hitting some sort of timeout on a socket. Looking through some documentation it looks like the download_client_start should be able to start a connection back up from where it was but this doesn't seem to be used in the examples, completely restarting the download on this error.  Also it is unclear what the error 128 is as it isn't enumerated in the errno.

Children
  • With some more searching I have found that recv() is taking a long time or the download thread is getting swapped out while recv is running. After getting swapped back in or when the recv finishes (it finishes successfully) we are no longer connected so when we try to send another request we fail due to it being closed. I have found that catching this at the app level, setting the client socket connection to -1 and then attempting to reconnect will get me a bit further, however then the peer connection is resetting. Just some more information. I think it would be a bit more intuitive if when attempting to send on a socket that errs 128 (no socket connection) that it would try to reconnect itself in the OS instead of putting that in application space.

  • I'm experiencing the exact same issue myself, plus another one - both of these are blocking us from deploying our system as it currently stands, as our devices will be installed in inaccessible places for reflashing.

    My log file from the 128 error from that socket_send is below, I'm only getting 4096 bytes into the sample's firmware download stored on AWS. I haven't yet had a full end-end download succeed. I eliminated thread switching as the issue by moving the contents of download_thread directly into the application thread's download_client_start. Furthermore I removed fragment_evt_send() so there wasn't any waiting for the flashing to occur. This means the downloading of fragments was running at full speed, again it all ceased after receiving 4096 bytes. Something is killing the connection at this threshold.

    Upload root s3.amazonaws.com/nordic-firmware-files/c1838e44-9d6f-4f41-990e-70b4e60770ce
    File part nordic-firmware-files/c1838e44-9d6f-4f41-990e-70b4e60770ce
    [00:01:51.243,560] [0m<inf> download_client: Attempting to connect over IPv4[0m
    [00:01:51.707,397] [0m<inf> download_client: Connected[0m
    [00:01:51.727,600] [0m<inf> download_client: Downloading: nordic-firmware-files/c1838e44-9d6f-4f41-990e-70b4e60770ce [0][0m
    
    [00:01:51.818,450] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:01:51.849,365] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:52.633,605] [0m<dbg> download_client.download_thread: Read 421 bytes from socket[0m
    [00:01:52.664,215] [0m<dbg> download_client.header_parse: GET header size: 421[0m
    [00:01:52.694,732] [0m<dbg> download_client.header_parse: File size = 148160[0m
    [00:01:52.725,341] [0m<inf> download_client: Downloaded bytes: 0/148160[0m
    [00:01:52.750,579] [0m<dbg> download_client.download_thread: Awaiting full fragment (0)[0m
    [00:01:52.780,975] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:52.811,279] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:52.841,552] [0m<inf> download_client: Downloaded bytes: 1024/148160[0m
    [00:01:52.872,009] [0m<dbg> download_client.download_thread: Awaiting full fragment (1024)[0m
    [00:01:52.907,318] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:52.937,774] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:52.968,048] [0m<inf> download_client: Downloaded bytes: 2048/148160[0m
    [00:01:52.998,352] [0m<inf> fota_flash_block: Erasing sector at offset 0x0[0m
    
    [00:01:53.186,004] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:01:53.216,766] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:53.884,582] [0m<dbg> download_client.download_thread: Read 424 bytes from socket[0m
    [00:01:53.915,344] [0m<dbg> download_client.header_parse: GET header size: 424[0m
    [00:01:53.945,953] [0m<inf> download_client: Downloaded bytes: 2048/148160[0m
    [00:01:53.976,226] [0m<dbg> download_client.download_thread: Awaiting full fragment (0)[0m
    [00:01:54.006,500] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:54.340,332] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:54.370,605] [0m<inf> download_client: Downloaded bytes: 3072/148160[0m
    [00:01:54.400,878] [0m<dbg> download_client.download_thread: Awaiting full fragment (1024)[0m
    [00:01:54.436,309] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:02:17.091,583] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:02:17.122,100] [0m<inf> download_client: Downloaded bytes: 4096/148160[0m
    
    [00:02:17.229,522] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:02:17.259,796] [1;31m<err> download_client: Failed to send HTTP request, errno 128[0m
    [00:02:17.290,100] [1;31m<err> fota_download: Download client error[0m
    Received error from fota_download

    I've also noticed another bug occasionally getting a *** buffer overflow detected *** error from the memcpy when (client->offset != hdr) in header_parse(). This terminates the entire process.

    [00:01:44.288,482] [0m<dbg> download_client.download_thread: Awaiting full fragment (1660)[0m
    [00:01:44.323,760] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:44.354,064] [0m<dbg> download_client.download_thread: Read 388 bytes from socket[0m
    [00:01:44.384,307] [0m<inf> download_client: Downloaded bytes: 4096/148160[0m
    
    [00:01:44.491,790] [0m<dbg> download_client.get_request_send: Sending HTTP request[0m
    [00:01:44.522,521] [0m<dbg> download_client.download_thread: Receiving bytes..[0m
    [00:01:45.347,503] [0m<dbg> download_client.download_thread: Read 1024 bytes from socket[0m
    [00:01:45.378,143] [0m<dbg> download_client.header_parse: GET header size: 424[0m
    Partial BUFSIZE:2048, OFFSET:1024, HDR:424
    [00:01:52.051,513] [1;33m<wrn> download_client: Copying 600 payload bytes[0m
    *** buffer overflow detected ***: terminated
    exit

Related