FOTA update: at_cmd calls recv before download_client has a chance to call recv?

I am working on firmware for a custom board with an nRF9160. Specifically, I'm trying to get FOTA updates working, using the provided library functions for it. I had no issues running the sample FOTA update application on an nRF9160DK. I'm using essentially the same procedure as the sample on my custom board's program: submitting a work item when a signal is received, and calling fota_download_start when the work callback is received.

It seems that, after download_client_start is called, the at_cmd thread runs before the download_client thread, as it has higher priority. The at_cmd thread calls recv, getting the first fragment. Then when the download_client thread is given a chance to call recv, there's no more data to receive and the function doesn't return until timing out. I am not sure if this is the intended sequence of events, or something is going wrong. Does it seem like this would be a configuration issue?

The headers and first few bytes of the file get printed through the notification handler. Using the debugger to inspect the AT cmd buffer, I can see that the HTTP headers and the fragment are present (printing stops at the first zero byte, of course). This is what I was seeing logged in the terminal earlier:

[00:08:28.451,416] <inf> download_client: Configuring socket timeout (120 s)
[00:08:28.455,535] <inf> download_client: Connecting to <our website>
[00:11:26.418,762] <inf> download_client: Downloading: firmware?version=0.0.1 [0]
[00:11:29.493,225] <dbg> at_monitor.at_monitor_task: AT notif: +CSCON: 0

[00:11:29.497,222] <dbg> at_monitor.at_monitor_task: Dispatching to 0x2edb5
[00:11:29.501,403] <dbg> at_monitor.at_monitor_task: AT notif: +CSCON: 1

[00:11:29.505,371] <dbg> at_monitor.at_monitor_task: Dispatching to 0x2edb5
[00:11:29.509,582] <dbg> at_monitor.at_monitor_task: AT notif: +CSCON: 0

[00:11:29.513,549] <dbg> at_monitor.at_monitor_task: Dispatching to 0x2edb5
HTTP/1.1 206 Partial Content
Content-Length: 1024
Content-Type: application/octet-stream4
Content-Range: bytes 0-1023/228839
Accept-Ranges: bytes
Server: Microsoft-IIS/10.0
Content-Disposition: attachment; filename=app_update.bin; filename*=UTF-8''app_update.bin
X-Powered-By: ASP.NET
Date: Thu, 16 Jun 2022 17:09:51 GMT

=¸ó[00:11:57.741,607] <err> at_cmd: AT message empty
[00:12:02.742,248] <err> at_cmd: AT socket recv failed with err 104
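
For reference, the Content-Range header in the log above already says how much of the file arrived and how much is left. Below is a minimal host-side sketch of reading it; the names parse_content_range, next_offset and fragment_count are made up for illustration and are not part of the download_client library.

```c
#include <stdio.h>
#include <stddef.h>

/* Parsed form of "Content-Range: bytes <first>-<last>/<total>". */
struct content_range {
    unsigned long first;  /* offset of the first byte in this fragment */
    unsigned long last;   /* offset of the last byte in this fragment */
    unsigned long total;  /* size of the complete file */
};

/* Hypothetical helper: returns 0 on success, -1 if the line does not
 * have the expected form or the numbers are inconsistent. */
int parse_content_range(const char *line, struct content_range *out)
{
    if (line == NULL || out == NULL) {
        return -1;
    }
    if (sscanf(line, "Content-Range: bytes %lu-%lu/%lu",
               &out->first, &out->last, &out->total) != 3) {
        return -1;
    }
    if (out->first > out->last || out->last >= out->total) {
        return -1;
    }
    return 0;
}

/* Offset at which the next range request would have to start. */
unsigned long next_offset(const struct content_range *r)
{
    return r->last + 1;
}

/* Number of range requests needed for the whole file (rounded up). */
unsigned long fragment_count(unsigned long total, unsigned long frag_size)
{
    return (total + frag_size - 1) / frag_size;
}
```

With the header from the log (bytes 0-1023/228839), the next request would start at offset 1024, and at 1024 bytes per fragment the whole image takes 224 requests, so a stall on the very first recv means only a tiny part of the file has been delivered.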

Now, I'm getting this error instead of the AT cmd ones:

[00:03:43.071,319] <err> download_client: Error in recv(), errno 116

  • Sorry about the delay.

    Error 104 is Connection reset by peer, and Error 116 is Connection timed out. I'm not immediately seeing an issue with your configurations here.
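
As a small illustration of the two codes, here is a lookup sketch. It assumes the errno numbering from the newlib/Zephyr errno.h, which is where 104 and 116 get these meanings; the table and the function name lookup_errno are only for illustration.

```c
#include <stddef.h>

/* One row per errno value seen in this thread. The numbers are assumed
 * to follow the newlib/Zephyr errno.h used by the firmware build. */
struct errno_entry {
    int code;
    const char *name;
    const char *text;
};

static const struct errno_entry errno_table[] = {
    { 104, "ECONNRESET", "Connection reset by peer" },
    { 116, "ETIMEDOUT",  "Connection timed out" },
};

/* Hypothetical helper: returns the matching row, or NULL if the code is
 * not in the table. */
const struct errno_entry *lookup_errno(int code)
{
    size_t n = sizeof(errno_table) / sizeof(errno_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (errno_table[i].code == code) {
            return &errno_table[i];
        }
    }
    return NULL;
}
```

The table is spelled out rather than taken from the host's errno.h because the numbers are not the same everywhere: on Linux, for example, ETIMEDOUT is 110, so decoding the device log with a PC's strerror() would give the wrong answer for 116.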

    I'm using essentially the same procedure as the sample on my custom board's program: submitting a work item when a signal is received, and calling fota_download_start when the work callback is received.

    So you are trying to essentially do the same as the sample, and the sample works. Then I guess the issue is with the code, and to find it we will have to follow the sample more closely. Did you use the sample as a template to expand on? Are there many differences between the sample and your aws code?

    Regards,

    Elfving

  • Hi Elfving,

    No worries about the delay. In the meantime, I've been able to get OTA updates working using a function I made, which borrows a lot of code from the FOTA and download client libraries in NCS 1.8.0. It runs recv immediately after send is called, circumventing the issue I was seeing before. I'm using the MCUboot DFU API with no issues.

    More details are below, but I got the FOTA library working in my project today. While my own method worked, I wanted to see how well the library code could be integrated into my project.

    Did you use the sample as a template to expand on? 

    No, the main project used the HTTPS client sample as a base to expand on. It may be worth mentioning that I'm getting the file over plain HTTP for the time being.

    Are there many differences between the sample and your aws code?

    There are some differences between the sample and my code:

    • The update will be triggered by receiving a certain message from the server, instead of a button press. For testing the sample, I edited my device tree file so an input pin is aliased to sw0, like button 1 on the nrf9160dk.
    • Due to how I intend to trigger the update, the button_init call from update_sample_init is not used. Similarly, the board has no LEDs, so the led_init call is also unused in my app.
    • nrf_modem_lib_init(NORMAL_MODE) is called at the start of main, like in the sample. However, unlike the sample, boot_write_img_confirmed, fota_download_init, and k_work_init are called when the update sequence is triggered.
      • Moving the init calls to happen on startup, before nrf_modem_lib_init, made no difference.*
      • Right after these init calls, k_work_submit is called.
    • I'm not using the modem_configure call from update_sample_init, as I want to handle LTE connection outside of the update sequence. The code's written to ensure the device is connected to LTE before it starts trying to establish connection to the server and getting any file fragments.

    The fota_work_cb function is still used to call update_start, which just calls fota_download_start and reports if there was an error.

    * In coming back and comparing my code to the sample again, I noticed that the init calls should happen after nrf_modem_lib_init instead of before, in order to be consistent with the sample. After changing that, the download_client is now able to get the file and the FOTA library works properly now!
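
The ordering that ended up working can be sketched as below. This is a host-side model only: the three functions are hypothetical stand-ins that record state, and the comments say which SDK calls from this thread they represent. The -EPERM guard is an addition for illustration; the real library calls did not report the wrong order, they just left the download stalled.

```c
#include <stdbool.h>
#include <errno.h>

/* State recorded by the stand-ins below. */
static bool modem_lib_ready;
static bool fota_ready;

/* Stands in for nrf_modem_lib_init(NORMAL_MODE) at the start of main. */
int app_modem_lib_init(void)
{
    modem_lib_ready = true;
    return 0;
}

/* Stands in for boot_write_img_confirmed(), fota_download_init() and
 * k_work_init(). Refuses to run before the modem library is up. */
int app_fota_init(void)
{
    if (!modem_lib_ready) {
        return -EPERM;
    }
    fota_ready = true;
    return 0;
}

/* Stands in for update_start(), which calls fota_download_start(). */
int app_update_start(void)
{
    return fota_ready ? 0 : -EPERM;
}
```

Calling app_fota_init() first returns -EPERM in this model, which is the kind of early, loud failure that would have pointed at the ordering straight away. Calling app_modem_lib_init(), then app_fota_init(), then app_update_start() returns 0 at each step, matching the sequence in the sample.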

    It's running a lot slower than both the sample and my version, probably due to other things going on in . I will start looking into options for this.

    Thank you for your time and assistance,

    Brad

  • Great! Glad to hear it.

    brad_57 said:

    Thank you for your time and assistance,

    Hehe no problem, though I don't think I've helped you that much.

    When it comes to this speed difference, you could try to compare the logs of the two and see where things are delayed. 

    And I would also check if you are seeing the same thing on the DK to make sure that the HW doesn't have anything to do with it, though as long as you compare using the same board this is probably ok.

    Regards,

    Elfving
