FOTA update: at_cmd calls recv before download_client has a chance to call recv?

I am working on firmware for a custom board with an nRF9160. Specifically, I'm trying to get FOTA updates working, using the provided library functions for it. I had no issues running the sample FOTA update application on an nRF9160DK. I'm using essentially the same procedure as the sample on my custom board's program: submitting a work item when a signal is received, and calling fota_download_start when the work callback is received.

It seems that, after download_client_start is called, the at_cmd thread runs before the download_client thread, as it has higher priority. The at_cmd thread calls recv, getting the first fragment. Then when the download_client thread is given a chance to call recv, there's no more data to receive and the function doesn't return until timing out. I am not sure if this is the intended sequence of events, or something is going wrong. Does it seem like this would be a configuration issue?
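The symptom described above (one reader consumes the data, a later recv blocks until its timeout) can be reproduced on a desktop with a minimal sketch. It assumes a POSIX host and uses a local socketpair rather than the modem; the function name `second_recv_times_out` and the 100 ms timeout are made up for the illustration. It says nothing about how at_cmd and download_client actually share data in the modem library. On a POSIX host the timeout surfaces as EAGAIN/EWOULDBLOCK rather than the errno 116 seen in the device log.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Returns 1 if the first recv() drains the data and the second one times out,
 * 0 if the behaviour differs, -1 if the sockets could not be set up.
 */
int second_recv_times_out(void)
{
    int sv[2];
    char buf[64];
    const char fragment[] = "fragment";
    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000 }; /* 100 ms */
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        send(sv[1], fragment, sizeof(fragment), 0) == (ssize_t)sizeof(fragment)) {
        /* First reader takes everything that has arrived so far. */
        ssize_t first = recv(sv[0], buf, sizeof(buf), 0);
        /* Second reader finds nothing and blocks until SO_RCVTIMEO expires. */
        ssize_t second = recv(sv[0], buf, sizeof(buf), 0);
        int second_errno = errno;

        result = (first == (ssize_t)sizeof(fragment) && second == -1 &&
                  (second_errno == EAGAIN || second_errno == EWOULDBLOCK));
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

With the 120 s socket timeout from the configuration further down the thread, the second call would block for two minutes before failing.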

The headers and first few bytes of the file get printed through the notification handler. Using the debugger to inspect the AT cmd buffer, I can see that the HTTP headers and fragment are present (printing stops on the first zero byte, of course). This is what I was seeing logged in the terminal earlier:

[00:08:28.451,416] <inf> download_client: Configuring socket timeout (120 s)
[00:08:28.455,535] <inf> download_client: Connecting to <our website>
[00:11:26.418,762] <inf> download_client: Downloading: firmware?version=0.0.1 [0]
[00:11:29.493,225] <dbg> at_monitor.at_monitor_task: AT notif: +CSCON: 0

[00:11:29.497,222] <dbg> at_monitor.at_monitor_task: Dispatching to 0x2edb5
[00:11:29.501,403] <dbg> at_monitor.at_monitor_task: AT notif: +CSCON: 1

[00:11:29.505,371] <dbg> at_monitor.at_monitor_task: Dispatching to 0x2edb5
[00:11:29.509,582] <dbg> at_monitor.at_monitor_task: AT notif: +CSCON: 0

[00:11:29.513,549] <dbg> at_monitor.at_monitor_task: Dispatching to 0x2edb5
HTTP/1.1 206 Partial Content
Content-Length: 1024
Content-Type: application/octet-stream4
Content-Range: bytes 0-1023/228839
Accept-Ranges: bytes
Server: Microsoft-IIS/10.0
Content-Disposition: attachment; filename=app_update.bin; filename*=UTF-8''app_update.bin
X-Powered-By: ASP.NET
Date: Thu, 16 Jun 2022 17:09:51 GMT

=¸ó[00:11:57.741,607] <err> at_cmd: AT message empty
[00:12:02.742,248] <err> at_cmd: AT socket recv failed with err 104

Now, I'm getting this error instead of the AT cmd ones:

[00:03:43.071,319] <err> download_client: Error in recv(), errno 116

  • Hello,

    I will have to get back to you on this.

    Regards,

    Elfving

  • Hello again,

    I had no issues running the sample FOTA update application on an nRF9160DK. I'm using essentially the same procedure as the sample on my custom board's program

    First of all, did you have any issues running the default sample on your custom board?

    Regards,

    Elfving

  • Hi Elfving,

    I made a project to compile the default sample for the custom board. I flashed it, and was able to download a firmware image that I'd previously generated from an AWS S3 bucket, and it updated the firmware successfully.

    After that, I changed CONFIG_DOWNLOAD_HOST and CONFIG_DOWNLOAD_FILE in prj.conf to download the same file from our server instead of the AWS one. This also succeeded, and the firmware was updated on reset.

    It seems the issue lies somewhere in my project. In case it's useful, here's what I have in prj.conf:

    CONFIG_HEAP_MEM_POOL_SIZE=16384
    CONFIG_MAIN_STACK_SIZE=8192

    CONFIG_NRF_MODEM_LIB=y

    CONFIG_AT_HOST_LIBRARY=y

    CONFIG_UART_INTERRUPT_DRIVEN=y
    CONFIG_UART_ASYNC_API=y

    CONFIG_NETWORKING=y
    CONFIG_NET_NATIVE=n
    CONFIG_NET_SOCKETS=y
    CONFIG_NET_SOCKETS_OFFLOAD=y
    CONFIG_NET_SOCKETS_POSIX_NAMES=y

    CONFIG_MODEM_INFO=y
    CONFIG_MODEM_KEY_MGMT=y

    CONFIG_LTE_LINK_CONTROL=y
    CONFIG_LTE_NETWORK_MODE_LTE_M=y
    CONFIG_LTE_AUTO_INIT_AND_CONNECT=n

    CONFIG_BOOTLOADER_MCUBOOT=y

    CONFIG_NEWLIB_LIBC=y

    CONFIG_REBOOT=y

    CONFIG_NRFX_TIMER=y
    CONFIG_NRFX_TIMER0=y
    CONFIG_NRFX_TIMER1=y
    CONFIG_NRFX_TIMER2=y

    CONFIG_ADC=y

    CONFIG_I2C=y

    CONFIG_SPI=y

    CONFIG_IMG_MANAGER=y
    CONFIG_FLASH=y
    CONFIG_IMG_ERASE_PROGRESSIVELY=y

    CONFIG_FOTA_DOWNLOAD=y

    CONFIG_DOWNLOAD_CLIENT=y
    CONFIG_DOWNLOAD_CLIENT_BUF_SIZE=2048
    CONFIG_DOWNLOAD_CLIENT_STACK_SIZE=4096
    CONFIG_DOWNLOAD_CLIENT_MAX_FILENAME_SIZE=210
    CONFIG_DOWNLOAD_CLIENT_HTTP_FRAG_SIZE_1024=y
    CONFIG_DOWNLOAD_CLIENT_TCP_SOCK_TIMEO_MS=120000
    CONFIG_DOWNLOAD_CLIENT_RANGE_REQUESTS=y

    CONFIG_DFU_TARGET=y

    CONFIG_BOOTLOADER_MCUBOOT=y

    CONFIG_SERIAL=y

    CONFIG_PM=y
    CONFIG_PM_DEVICE=y

    CONFIG_LTE_NETWORK_TIMEOUT=300

    CONFIG_DEBUG_OPTIMIZATIONS=y
    CONFIG_DEBUG_THREAD_INFO=y

    CONFIG_LOG=y
    CONFIG_LOG_DEFAULT_LEVEL=4
    CONFIG_LOG_MODE_IMMEDIATE=y
    CONFIG_DEBUG_COREDUMP=y
    CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y

    Please let me know if there's anything else that would be helpful for me to provide.

    Thank you for helping,

    Brad

  • Sorry about the delay.

    Error 104 is ECONNRESET (connection reset by peer), and error 116 is ETIMEDOUT (connection timed out). I'm not immediately seeing an issue with your configuration here.
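For decoding these numbers off-target, a small lookup table can help. The values below follow newlib's errno.h numbering, which is what applies with CONFIG_NEWLIB_LIBC=y; Linux numbers some of these differently (for instance, 116 is ESTALE on Linux, and ETIMEDOUT is 110), so calling strerror() on a desktop would mislead. The table and the helper names `newlib_errno_name` and `newlib_errno_value` are illustrative, and only the two codes from this thread are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* errno values as numbered by newlib, not by Linux. */
struct errno_entry {
    int value;
    const char *name;
    const char *text;
};

static const struct errno_entry newlib_errno_table[] = {
    { 104, "ECONNRESET", "Connection reset by peer" },
    { 116, "ETIMEDOUT",  "Connection timed out" },
};

#define TABLE_LEN (sizeof(newlib_errno_table) / sizeof(newlib_errno_table[0]))

/* Map a numeric errno from the device log to its symbolic name. */
const char *newlib_errno_name(int err)
{
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (newlib_errno_table[i].value == err) {
            return newlib_errno_table[i].name;
        }
    }
    return "UNKNOWN";
}

/* Reverse lookup: symbolic name to numeric value, or -1 if not listed. */
int newlib_errno_value(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (strcmp(newlib_errno_table[i].name, name) == 0) {
            return newlib_errno_table[i].value;
        }
    }
    return -1;
}
```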

    I'm using essentially the same procedure as the sample on my custom board's program: submitting a work item when a signal is received, and calling fota_download_start when the work callback is received.

    So you are trying to essentially do the same as the sample, and the sample works. Then I guess the issue is with the code, and to find it we will have to follow the sample more closely. Did you use the sample as a template to expand on? Are there many differences between the sample and your aws code?

    Regards,

    Elfving

  • Hi Elfving,

    No worries about the delay. In the meantime, I've been able to get OTA updates working using a function I made, which borrows a lot of code from the FOTA and download client libraries in NCS 1.8.0. It runs recv immediately after send is called, circumventing the issue I was seeing before. I'm using the MCUboot DFU API with no issues.

    More details are below, but I got the FOTA library working in my project today. While my own method worked, I wanted to see how well the library code could be integrated into my project.

    Did you use the sample as a template to expand on? 

    No, the main project used the HTTPS client sample as a base to expand on. It may be worth mentioning that I'm getting the file over plain HTTP for the time being.

    Are there many differences between the sample and your aws code?

    There are some differences between the sample and my code:

    • The update will be triggered by receiving a certain message from the server, instead of a button press. For testing the sample, I edited my device tree file so an input pin is aliased to sw0, like button 1 on the nRF9160DK.
    • Due to how I intend to trigger the update, the button_init call from update_sample_init is not used. Similarly, the board has no LEDs, so the led_init call is also unused in my app.
    • nrf_modem_lib_init(NORMAL_MODE) is called at the start of main, like in the sample. However, unlike the sample, boot_write_img_confirmed, fota_download_init, and k_work_init are called when the update sequence is triggered.
      • Moving the init calls to happen on startup, before nrf_modem_lib_init, made no difference.*
      • Right after these init calls, k_work_submit is called.
    • I'm not using the modem_configure call from update_sample_init, as I want to handle the LTE connection outside of the update sequence. The code is written to ensure the device is connected to LTE before it tries to connect to the server and fetch any file fragments.

    The fota_work_cb function is still used to call update_start, which just calls fota_download_start and reports if there was an error.

    * In coming back and comparing my code to the sample again, I noticed that the init calls should happen after nrf_modem_lib_init instead of before, to be consistent with the sample. After changing that, the download_client is able to get the file and the FOTA library works properly!
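Since the thread does not mention any error being returned when the calls ran in the wrong order, one way to make this kind of mistake visible is a thin ordering guard around the init sequence. The sketch below is host-runnable and purely illustrative: the `guarded_*` functions are hypothetical stand-ins, not NCS APIs, and the comments mark where the real nrf_modem_lib_init, fota_download_init and fota_download_start calls would go.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; none of this is part of the NCS libraries. */
static bool modem_lib_ready;
static bool fota_ready;

/* Stand-in for the place where nrf_modem_lib_init() would be called. */
int guarded_modem_lib_init(void)
{
    modem_lib_ready = true;
    return 0;
}

/* Stand-in for boot_write_img_confirmed(), fota_download_init(), k_work_init(). */
int guarded_fota_init(void)
{
    if (!modem_lib_ready) {
        return -EPERM; /* refuse to run before the modem library is up */
    }
    fota_ready = true;
    return 0;
}

/* Stand-in for the work callback that ends in fota_download_start(). */
int guarded_fota_start(void)
{
    if (!fota_ready) {
        return -EPERM; /* FOTA was never initialised in the right order */
    }
    return 0;
}

/* Reset the bookkeeping, e.g. between test runs. */
void guarded_reset(void)
{
    modem_lib_ready = false;
    fota_ready = false;
}
```

In this sketch, calling guarded_fota_init before guarded_modem_lib_init returns -EPERM immediately instead of leading to a stalled download later on.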

    It's running a lot slower than both the sample and my version, probably due to other things going on. I will start looking into options for this.

    Thank you for your time and assistance,

    Brad
