Unexpected behaviour during TCP communication

Hey team,

We're having trouble with our device. It's been working fine, sending messages to our backend without a hitch. But after a while, it starts having problems connecting to the socket. We're seeing error 116.

<err> rest_client: Failed to connect socket, error: 116
<err> rest_client: rest_client_do_api_call() failed, err -111

#define ETIMEDOUT 116		/* Connection timed out */
#define ECONNREFUSED 111	/* Connection refused */

In our firmware, we've tried implementing a retry cycle, but it hasn't been effective. The modem keeps encountering this error repeatedly. The only solution seems to be rebooting the device.

We don't think the problem lies with our backend because we've encountered the same error when trying to connect to api.nfrcloud.com.

This issue has occurred on two different networks in two different countries and the main point it always disappear after rebooting. 

Could you please review the trace collected from the modem? Thank you.

# Networking
CONFIG_NETWORKING=y
CONFIG_NET_SOCKETS=y
CONFIG_NET_SOCKETS_OFFLOAD=y
CONFIG_NET_SOCKETS_POSIX_NAMES=y
CONFIG_NET_NATIVE=n

#Rest client config
CONFIG_REST_CLIENT=y

CONFIG_HTTP_CLIENT=y


The modem firmware is 1.3.4
SDK 2.0.2

  • Hi  

    I will try to collect the full modem traces but the issue is that I am sure when I enable the 

    RF_MODEM_LIB_TRACE_LEVEL_FULL - it missed huge amount of the traces not sure why.

    Here is my debug configuration:

    CONFIG_LOG=y
    CONFIG_LOG_MODE_IMMEDIATE=y
    CONFIG_LOG_BACKEND_RTT=y
    
    CONFIG_CONSOLE=y
    CONFIG_RTT_CONSOLE=y
    CONFIG_UART_CONSOLE=n
    
    CONFIG_USE_SEGGER_RTT=y
    
    CONFIG_NRF_MODEM_LIB_TRACE_ENABLED=y
    CONFIG_NRF_MODEM_LIB_TRACE_THREAD_PROCESSING=y
    CONFIG_NRF_MODEM_LIB_TRACE_LEVEL_LTE_AND_IP=y
    CONFIG_NRF_MODEM_LIB_TRACE_MEDIUM_RTT=y
    CONFIG_NRF_MODEM_LIB_TRACE_MEDIUM_RTT_BUF_SIZE=2048
    CONFIG_NRF_MODEM_LIB_SENDMSG_BUF_SIZE=1024
    We collecting traces over RTT. 
    1. Are you opening and closing the TLS connection for each send?

    I am using the rest_client library from sdk as I understood it should be done on the library side. 

    Here is part of the code responsible for the requests:

    static struct cloud_rest_context rest_ctx = {
        .connect_socket = REST_CLIENT_SCKT_CONNECT,
        .keep_alive = false,
        .rx_buf = rx_buf,
        .rx_buf_len = sizeof(rx_buf),
        .fragment_size = 0};
    
    static void init_rest_client_request(
        struct cloud_rest_context const *const rest_ctx,
        struct rest_client_req_context *const req, const enum http_method meth)
    {
        memset(req, 0, sizeof(*req));
    
        req->connect_socket = rest_ctx->connect_socket;
        req->keep_alive = rest_ctx->keep_alive;
    
        req->resp_buff = rest_ctx->rx_buf;
        req->resp_buff_len = rest_ctx->rx_buf_len;
    
        req->sec_tag = 1;
        req->port = sPort;
        req->host = sHost;
        req->tls_peer_verify = REST_CLIENT_TLS_DEFAULT_PEER_VERIFY;
        req->timeout_ms = sDataTimeout;
    
        req->http_method = meth;
    
        (void)cloud_codec_init();
    }
    
    static void sync_rest_client_data(struct cloud_rest_context *const rest_ctx,
                                      struct rest_client_req_context const *const req,
                                      struct rest_client_resp_context const *const resp)
    {
        rest_ctx->status = resp->http_status_code;
        rest_ctx->response = resp->response;
        rest_ctx->response_len = resp->response_len;
        rest_ctx->total_response_len = resp->total_response_len;
    
        rest_ctx->connect_socket = req->connect_socket;
    }
    
    static int do_rest_client_request(struct cloud_rest_context *const rest_ctx,
                                      struct rest_client_req_context *const req,
                                      struct rest_client_resp_context *const resp,
                                      bool check_status_good, bool expect_body)
    {
        int64_t timing;
    
        timing = k_uptime_get();
        int ret = rest_client_request(req, resp);
        timing = k_uptime_get() - timing;
        DLOG_I("Request time:%d sec", timing / 1000);
    
        sync_rest_client_data(rest_ctx, req, resp);
    
        if (ret) {
            DLOG_E("REST client request failed with error code %d", ret);
            return ret;
        } else if (rest_ctx->status == MONTR_CLOUD_HTTP_STATUS_NONE) {
            DLOG_E("REST request endpoint closed connection without reply.");
            return -ESHUTDOWN;
        } else if (check_status_good &&
                   (rest_ctx->status != MONTR_CLOUD_HTTP_STATUS_OK) &&
                   (rest_ctx->status != MONTR_CLOUD_HTTP_STATUS_ACCEPTED) &&
                   (rest_ctx->status != MONTR_CLOUD_HTTP_STATUS_UPDATE_AVALIABLE)) {
            DLOG_E("REST request was rejected. Response status: %d", rest_ctx->status);
            return -EBADMSG;
        } else if (expect_body && (!rest_ctx->response || !rest_ctx->response_len)) {
            DLOG_E("Empty response body: %d", rest_ctx->status);
            return -ENODATA;
        }
    
        return 0;
    }
    
    static void close_connection(struct cloud_rest_context *const rest_ctx)
    {
        int err;
        if (!rest_ctx->keep_alive) {
            (void)cloud_rest_disconnect(rest_ctx);
        }
    }
    
    int cloud_rest_disconnect(struct cloud_rest_context *const rest_ctx)
    {
        if (!rest_ctx) {
            return -EINVAL;
        } else if (rest_ctx->connect_socket < 0) {
            return -ENOTCONN;
        }
    
        int err = close(rest_ctx->connect_socket);
    
        if (err) {
            DLOG_E("Failed to close socket, error: %d", errno);
            err = -EIO;
        } else {
            rest_ctx->connect_socket = -1;
        }
    
        return err;
    }
    
    int rest_send_message(struct cloud_rest_context *const rest_ctx,
                                const char *const json_msg, const char *const url)
    {
        assert(rest_ctx != NULL);
        assert(json_msg != NULL);
    
        int ret;
        struct rest_client_req_context req;
        struct rest_client_resp_context resp;
    
        memset(&resp, 0, sizeof(resp));
        init_rest_client_request(rest_ctx, &req, HTTP_POST);
    
        /* Set payload */
        req.body = json_msg;
        /*Set url*/
        req.url = url;
    
        char *const headers[] = {CONTENT_TYPE_APP_JSON, NULL};
    
        req.header_fields = (const char **)headers;
    
        /* Make REST call */
        ret = do_rest_client_request(rest_ctx, &req, &resp, true, true);
    
        close_connection(rest_ctx);
    
        return ret;
    }


    2. Does the error happen when opening the TLS connection or when sending data over TLS?

    As I understood this log means that it happened during connection 

    <err> rest_client: Failed to connect socket, error: 116

    3. Does the server keep the connection open for a required time?

    I will figure this out but what should be the correct value? 
    Also the one point - some time it also happens when we send request to the nrfCloud for cell tower location. 

    err = nrf_cloud_rest_cell_pos_get(&rest_ctx, &cell_pos_req, &cell_pos_result);

  • Hi  ,

    Do you have any feedback on my trace configuration is it correct, also what about the rest of my answers? 

  • Hi Roman,

    Please see response/comments as below:

    Roman Veselskyi said:
    I will figure this out but what should be the correct value? 

    There is no limit in TCP connection itself that how long it can stay open. In practice there is a TCP connection timeout for idle connections. TCP idle timeout can be in server side, nat, firewall or in any middle box. Typical timeout values are 15 minutes or 30 minutes but it depends on service provider. When the timeout for TCP connections hits the connection is considered as broken. Can’t give any advice for configuring server side.

    Is customer using CONFIG_NRF_MODEM_LIB_TRACE_THREAD_PROCESSING ? Ensure it is disabled, it did not work well.
    NRF_MODEM_LIB_TRACE_LEVEL_FULL is AT%XMODEMTRACE=1,2 , perhaps a different value would be sufficient whilst still outputting less traces?

    AT%XMODEMTRACE=1,2 should be ok here. It has the AT, IP (+ Socket api). Maybe the UART connectivity is not ok and lots of traces missing.

    Regards,

    Naeem

  • Roman Veselskyi said:
    I will try to collect the full modem traces
    Roman Veselskyi said:
    I am collect trace using RTT 

    So you will recollect the modem traces and send it?

  • Related