I'm attempting an HTTP GET from a Google Cloud Function that does a lot of processing, so it takes a long time to start responding. On the latest version of my custom board this has started to fail consistently for several hours at a time, then work for several hours before failing again. I can verify in the server logs that the cloud function thinks it is successfully returning the data. I believe there is a proxy in front of it, so it may be that the function returns successfully to the proxy and the proxy then notices a broken TCP connection. I do not have any proxy logs to back this up.
Calling the cloud function from a different chip and from a browser is always successful. I've created a demo server where I can control the response delay, and it seems that as long as the response time is 2 seconds or below, the HTTP GET succeeds.
The failure symptom on the device is that the first call to zsock_recv() inside Zephyr's HTTP code never unblocks, even if I let it run overnight. This makes me think that the TCP connection is dropping and the modem does not notice. Zephyr's HTTP timeout is implemented using zsock_shutdown(), which I think is not implemented for the nRF9160 (but verification would be appreciated). I'll try using SO_RCVTIMEO to hopefully turn this infinite hang into a cleaner error, but I still need to get this HTTP request working.
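For reference, here is a minimal sketch of the SO_RCVTIMEO idea, written against plain POSIX sockets so it can be tried on a desktop first. The helper names set_recv_timeout() and recv_or_timeout() are hypothetical and are not part of Zephyr or the nRF Connect SDK. On the device I'd expect the equivalent calls to be zsock_setsockopt() and zsock_recv(), but whether the nRF9160 socket offload honors the option in this situation is an assumption I have not verified.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: bound every blocking recv() on fd to timeout_s seconds.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
int set_recv_timeout(int fd, int timeout_s)
{
    struct timeval tv = { .tv_sec = timeout_s, .tv_usec = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: returns the number of bytes read, 0 if the peer
 * closed the connection, -ETIMEDOUT if nothing arrived within the timeout,
 * or another negative errno value on failure. */
ssize_t recv_or_timeout(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n < 0) {
        /* With SO_RCVTIMEO set, an expired timeout is reported as EAGAIN
         * (or EWOULDBLOCK) rather than blocking forever. */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return -ETIMEDOUT;
        }
        return -errno;
    }
    return n;
}
```

The intent is to call set_recv_timeout() once after connecting and before issuing the request, so a stalled response surfaces as -ETIMEDOUT and the socket can be closed and the request retried. This only turns the hang into an error; it does not explain why the response never arrives.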
I'm using a custom board and have verified that this issue exists with both of the following setups:
- nRF Connect SDK version 2.0.0 and modem firmware version 1.3.1.
- nRF Connect SDK version 2.3.0 and modem firmware version 1.3.3.
I've attached a minimal example that uses SDK version 2.3.0. HTTP_DELAY near the top of main.cpp controls the response delay. In my testing, a value of 3 or higher always fails when the board is being sketchy. A value of 0 or 1 always succeeds, and 2 works most of the time. I'm keeping the server deployed while this issue is open.
Creating this ticket with attachments is failing. I'll try updating using comments.