HTTPS connect() timeout, errno 116

Hello, 

Hoping that someone can help with some possible leads to further debug this issue. I'm new to the cellular world in general so any help in the right direction would be great. A little background:

Modem FW: v1.3.4

SDK Version: v2.4.2

Modem: nRF9160 SICA

Location: USA

SIM: Soracom

Custom Board

Overview of the App

We are code complete on the firmware and are in the process of testing it. The application is based on the REST FOTA example, on top of which we have built some additional functionality to periodically send HTTPS requests to our MongoDB instance. Our HTTPS code is also loosely based on the HTTPS sample for the nRF9160. The app tries to keep threading to a minimum: we use the main thread with a 100 ms sleep for most of the processing, but we use work queue threads for sending our HTTPS requests to MongoDB, as these can block and take multiple seconds to complete.

We do not currently have eDRX or PSM enabled and are using LTE-M with NB-IoT Fallback. 

The issue we are seeing relates to the use of the connect() POSIX function. We use the same function for all of our various API calls to MongoDB. Here is a snippet:

/* Setup TLS socket options */
    err = tls_setup();
    if (err) 
    {
        LOG_ERR("https_send_messag() Failed to set up TLS!");
        cleanUp();
        return (-4);
    }
    
    LOG_DBG("https_send_messag() Connecting to %s", HTTPS_HOSTNAME);
    int64_t ref_time = k_uptime_get(); // Reference time so we can calculate how long connect() takes
    err = connect(https_c.fd, https_c.res->ai_addr, https_c.res->ai_addrlen);
    if (err) {
        int connect_errno = errno; // Save errno before any other call can overwrite it
        LOG_ERR("connect() failed, err: %d after: %lld mS", connect_errno, k_uptime_delta(&ref_time));

        if (connect_errno == ETIMEDOUT)
        {
            https_c.connect_timeouts++;
            LOG_WRN("Total connect timeouts: %d", https_c.connect_timeouts);
        }
        cleanUp();
        return (-5);
    }

    LOG_INF("https_send_messag() connected to %s in: %lld mS", HTTPS_HOSTNAME, k_uptime_delta(&ref_time));
 

Problem Overview 

Most of the time this works fine and we typically see a connection time of <= 3 seconds. However, depending on the network, time of day, and who knows what else, we get a string of timeout errors (errno 116, ETIMEDOUT) when calling connect(). Once they start happening they seem to continue. Even after a power cycle and/or reset the timeouts continue, which makes me think that this is at least partly LTE network related.

I have attempted to increase NET_SOCKETS_CONNECT_TIMEOUT to 30 s (although it doesn't seem to allow more than 10 s). This helps a little, but we still frequently see strings of timeouts over the 10 s mark. I'm not sure if this is relevant, but we also typically see a string of RRC events from the lte_event_handler cycling between CONNECT and IDLE around the same time we see the connect() timeout.
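
Since the configured connect timeout appears to be capped, one option worth trying is to control the timeout in the application with a non-blocking connect() followed by poll(). The following is a minimal sketch in plain POSIX C, not code from our firmware: connect_with_timeout() is a hypothetical helper name, and whether the nRF9160's offloaded TLS sockets behave the same way for a non-blocking connect() is an assumption that has not been verified here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: connect with a caller-chosen timeout in milliseconds.
 * Returns 0 on success, -1 on failure with errno set (ETIMEDOUT on timeout). */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_ms)
{
    assert(timeout_ms >= 0); /* a negative value would make poll() wait forever */

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return -1;
    }

    int ret = connect(fd, addr, addrlen);
    if (ret < 0 && errno == EINPROGRESS) {
        /* Connection attempt is running; wait until the socket is writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int n = poll(&pfd, 1, timeout_ms);

        if (n == 0) {
            errno = ETIMEDOUT;   /* our own timeout expired */
        } else if (n > 0) {
            /* poll() returned: SO_ERROR tells whether connect succeeded. */
            int so_err = 0;
            socklen_t len = sizeof(so_err);

            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_err, &len) == 0) {
                if (so_err == 0) {
                    ret = 0;
                } else {
                    errno = so_err;
                }
            }
        }
        /* n < 0: poll() itself failed, errno is already set. */
    }

    /* Restore the original blocking mode without clobbering errno. */
    int saved_errno = errno;
    (void)fcntl(fd, F_SETFL, flags);
    errno = saved_errno;

    return ret;
}
```

In the snippet above this would replace the connect() call, for example connect_with_timeout(https_c.fd, https_c.res->ai_addr, https_c.res->ai_addrlen, 30000). After a timeout the connection attempt is still pending on the socket, so the socket should be closed rather than reused.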

 

If the application is allowed to just run, we eventually see this go away and then have multiple hours of operation without a single timeout. Additionally, we have another test location in a different state where we have also seen the same behavior.
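
To check over a long run whether the RRC cycling really lines up with the connect() timeouts, the RRC transitions could be timestamped and counted at the moment a timeout occurs. A minimal sketch, assuming a small ring buffer is enough; the struct and function names (rrc_log, rrc_log_record, rrc_log_count_recent) and the buffer size are hypothetical and not part of the SDK:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RRC_LOG_SIZE 16 /* hypothetical: number of RRC transitions remembered */

/* Ring buffer of uptime stamps (ms), one per RRC mode change. */
struct rrc_log {
    int64_t stamp_ms[RRC_LOG_SIZE];
    size_t next;   /* index of the slot that will be written next */
    size_t count;  /* number of valid entries, at most RRC_LOG_SIZE */
};

/* Record one RRC transition. On the device this would be called from
 * lte_event_handler with k_uptime_get() as now_ms. */
void rrc_log_record(struct rrc_log *log, int64_t now_ms)
{
    assert(log != NULL);

    log->stamp_ms[log->next] = now_ms;
    log->next = (log->next + 1) % RRC_LOG_SIZE;
    if (log->count < RRC_LOG_SIZE) {
        log->count++;
    }
}

/* Count the transitions seen in the last window_ms milliseconds.
 * Intended to be called from the ETIMEDOUT branch after connect(). */
size_t rrc_log_count_recent(const struct rrc_log *log, int64_t now_ms,
                            int64_t window_ms)
{
    assert(log != NULL);

    size_t hits = 0;

    for (size_t i = 0; i < log->count; i++) {
        int64_t age = now_ms - log->stamp_ms[i];

        if (age >= 0 && age <= window_ms) {
            hits++;
        }
    }
    return hits;
}
```

Logging this count next to the "Total connect timeouts" warning would show whether timeouts consistently coincide with a burst of RRC transitions. Because the LTE event handler and the work queue run in different contexts, a real implementation would need a lock or similar protection around the buffer.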

We have tried to collect a cellular trace without success. We only have UART0 and the RTT interface, both of which we also use for logging, but we may take another shot at this.

Any advice on potential debugging steps would be great.  

Thanks!
