HTTP request fails if server takes at least 3 seconds to start responding

I'm attempting an HTTP GET from a Google Cloud Function that does a lot of processing, so it takes a long time to start responding. On the latest version of my custom board this has started to fail consistently for several hours at a time, then work for several hours before failing again. I'm able to verify in the server logs that the cloud function thinks it is successfully returning the data. I believe there is a proxy, so maybe the function is successfully returning to the proxy and the proxy then notices a broken TCP connection. I do not have any proxy logs to back this up.

Calling the cloud function from both a different chip and from a browser is always successful. I've created a demo server where I can control the response delay, and it seems that as long as the response time is 2 seconds or below the HTTP GET succeeds.

The failure symptom on the device is that the first call to zsock_recv() inside Zephyr's HTTP code never unblocks, even if I let it run overnight. This makes me think that the TCP connection is dropping and the modem does not notice. Zephyr's HTTP timeout is implemented using zsock_shutdown(), which I think is not implemented for the nRF9160 (but verification would be appreciated). I'll try using SO_RCVTIMEO to hopefully turn this infinite hang into a cleaner error, but I still need to get this HTTP request working.
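For reference, here is a minimal sketch of the SO_RCVTIMEO idea using plain POSIX sockets on a desktop, not the device code. The helper names (set_recv_timeout, recv_timed_out, demo_silent_peer) and the 200 ms value are made up for illustration. On the board, the analogous call would presumably be zsock_setsockopt() with the same option on the socket handed to the HTTP client; whether the nRF9160 socket offload honors it there is an assumption I still need to verify.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make blocking recv() calls on fd give up after
 * the given time instead of waiting forever. Returns 0 on success. */
static int set_recv_timeout(int fd, long seconds, long microseconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = microseconds };

    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: returns 1 if recv() timed out, 0 if it returned
 * data or EOF, and -1 on any other error. */
static int recv_timed_out(int fd)
{
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return 1;
    }
    return (n >= 0) ? 0 : -1;
}

/* Demo: a connected peer that never sends anything, standing in for a
 * server (or dead connection) that never starts responding. */
static int demo_silent_peer(void)
{
    int fds[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -1;
    }
    if (set_recv_timeout(fds[0], 0, 200000) != 0) { /* 200 ms */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    result = recv_timed_out(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

With the timeout set, the silent-peer case returns from recv() with EAGAIN/EWOULDBLOCK rather than blocking, which is the "cleaner error" I'm hoping for. It would only surface the hang, not explain why the response never arrives.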

I'm using a custom board and have verified that this issue exists with both of the following setups:

  • nRF Connect SDK version 2.0.0 and modem firmware version 1.3.1.
  • nRF Connect SDK version 2.3.0 and modem firmware version 1.3.3.

I've attached a minimal example that uses SDK version 2.3.0. The HTTP_DELAY value near the top of main.cpp controls the server's response delay in seconds. In my testing, while the board is in its failing state, a value of 3 or higher always fails, a value of 0 or 1 always succeeds, and 2 works most of the time. I'm keeping the server deployed while this issue is open.

Creating this ticket with attachments is failing.  I'll try updating using comments.
