HTTP request fails if server takes at least 3 seconds to start responding

I'm attempting an HTTP GET against a Google Cloud Function that does a lot of processing, so it takes a long time to start responding. On the latest version of my custom board, this request has started failing consistently for several hours at a time, then working for several hours, then failing again. The server logs show that the cloud function believes it returned the data successfully. I believe there is a proxy in between, so perhaps the function only returns successfully to the proxy, and the proxy then finds the TCP connection to the device broken. I don't have any proxy logs to back this up.

Calling the cloud function from a different chip or from a browser always succeeds. I've created a demo server where I can control the response delay, and it seems that the HTTP GET succeeds as long as the server responds within 2 seconds.

On the device, the failure looks like this: the first call to zsock_recv() inside Zephyr's HTTP client code never unblocks, even if I let it run overnight. This makes me think the TCP connection is being dropped and the modem does not notice. Zephyr's HTTP timeout is implemented using zsock_shutdown(), which I think is not implemented for the nRF9160 (confirmation would be appreciated). I'll try using SO_RCVTIMEO to turn this infinite hang into a cleaner error, but I still need to get this HTTP request working.
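A minimal sketch of the SO_RCVTIMEO idea, written against the POSIX socket names. This assumes Zephyr's POSIX name mapping (CONFIG_NET_SOCKETS_POSIX_NAMES) is enabled; otherwise substitute zsock_setsockopt() and zsock_recv(), and note that the timeval type may be named struct zsock_timeval depending on the Zephyr version. Whether the nRF9160 socket offload honors SO_RCVTIMEO, and which errno it reports when the timeout expires, is an assumption to verify on the board. The helpers set_recv_timeout() and recv_or_timeout() are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: bound every blocking recv() on `sock` to
 * `timeout_ms` milliseconds using the standard SO_RCVTIMEO option.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int set_recv_timeout(int sock, int timeout_ms)
{
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: returns the number of bytes received, 0 if the
 * peer closed the connection, -ETIMEDOUT if nothing arrived before the
 * receive timeout expired, or -errno for any other error. */
static int recv_or_timeout(int sock, void *buf, size_t len)
{
    ssize_t n = recv(sock, buf, len, 0);
    if (n >= 0) {
        return (int)n;
    }
    /* With SO_RCVTIMEO set, an expired timeout is reported as
     * EAGAIN/EWOULDBLOCK instead of blocking forever. */
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return -ETIMEDOUT;
    }
    return -errno;
}
```

In the HTTP case, the option would be set on the socket before it is passed to http_client_req(), so the first blocking receive inside the HTTP client returns an error instead of hanging. This only turns the hang into a detectable failure; it does not explain or fix why the connection goes silent.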

I'm using a custom board and have verified that this issue occurs with both of the following setups:

  • nRF Connect SDK version 2.0.0 with modem firmware version 1.3.1.
  • nRF Connect SDK version 2.3.0 with modem firmware version 1.3.3.

I've attached a minimal example that uses SDK version 2.3.0. HTTP_DELAY near the top of main.cpp controls the server's response delay in seconds. In my testing, while the board is in one of its failing periods, a value of 3 or higher always fails, a value of 0 or 1 always succeeds, and 2 works most of the time. I'm keeping the server deployed while this issue is open.

Creating this ticket with attachments keeps failing, so I'll try adding them in comments.
