cloud_connect() stuck on 22nd connection attempt (using AWS_IOT backend)

Hello,

I am on nrf-sdk v1.3.0 and mfw v1.2.0.

We are updating our product to the above-mentioned nrf-sdk version and have noticed that the MQTT connection gets stuck after 21 successful connections. I have produced a minimal example of the issue, which I am sharing here.

In our actual firmware, we connect only every once in a while (from once per day to once per week), so this has only become apparent recently.

The test loop is quite simple:

while (1)
{
    // attach to the LTE network
    lte_lc_connect();
    LOG_INF("connected to lte");

    // connect to AWS IoT; on the 22nd iteration this call never returns
    cloud_connect(cloud_backend);
    k_sem_give(&cloud_conn_sem); // this activates the polling loop
    LOG_INF("connected to cloud");
    k_sleep(K_SECONDS(5));

    // tear down the MQTT session and close the backend's socket
    cloud_disconnect(cloud_backend);
    close(cloud_backend->config->socket);
    LOG_INF("disconnected from cloud");

    // go offline
    lte_lc_offline();
    LOG_INF("went offline");

    LOG_INF("test loop %d", loop_c++);
}

The code gets stuck in cloud_connect(cloud_backend).
The last thing printed is "IPv4 Address found 18.185.186.31", which is logged in aws_iot.c, line 557, in nrf-sdk v1.3.0.
Looking through the code, the next function called after that is at line 711.

We had already encountered something similar in March, when we were testing nrf-sdk v1.2.0; see this bifravst issue:
https://github.com/bifravst/firmware/issues/30
We were stuck in the same place then, but back then it happened on every attempt.

We need this to work or, if that is not possible, a way to detect that this has happened and recover from it.
I have tried calling cloud_connect() in a separate thread and, if it is stuck for more than 30 seconds, aborting that thread and re-initializing the cloud backend. However, this seems to break the AT client, so the only way to recover is a reboot.
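One way to get "detect and recover" without killing a thread that may be holding modem or AT-client state is to wait on the blocking call with a deadline and, on timeout, leave the worker alone and escalate to a controlled reset. The sketch below models that pattern with POSIX threads so it can run on a host. `call_with_timeout`, `fake_connect_ok` and `fake_connect_hang` are hypothetical names, and the two stand-ins only simulate cloud_connect(). On the device, the same shape would presumably map to a worker thread that gives a semaphore when cloud_connect() returns, a k_sem_take() with a timeout in the caller, and sys_reboot() on timeout. That mapping is an assumption and has not been tried against nrf-sdk v1.3.0.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature for the blocking step (e.g. a cloud_connect() wrapper). */
typedef int (*blocking_fn)(void *arg);

/* Heap-allocated and reference-counted, so a worker that outlives the
 * caller's timeout never writes to freed memory. */
struct call_ctx {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    blocking_fn fn;
    void *arg;
    int result;
    int done; /* set once fn() has returned */
    int refs; /* caller + worker; the last one out frees the context */
};

static void ctx_free(struct call_ctx *ctx)
{
    pthread_cond_destroy(&ctx->done_cv);
    pthread_mutex_destroy(&ctx->lock);
    free(ctx);
}

/* Drop one reference. Called with ctx->lock held; releases it. */
static void ctx_put_locked(struct call_ctx *ctx)
{
    int last = (--ctx->refs == 0);
    pthread_mutex_unlock(&ctx->lock);
    if (last)
        ctx_free(ctx);
}

static void *worker(void *p)
{
    struct call_ctx *ctx = p;
    int rc = ctx->fn(ctx->arg); /* may block for a long time, or forever */
    pthread_mutex_lock(&ctx->lock);
    ctx->result = rc;
    ctx->done = 1;
    pthread_cond_signal(&ctx->done_cv);
    ctx_put_locked(ctx);
    return NULL;
}

/* Run fn(arg) on a detached thread and wait at most timeout_ms.
 * Returns fn's result, or -ETIMEDOUT. A timed-out worker is NOT killed;
 * the caller is expected to escalate (e.g. a controlled reset) instead. */
int call_with_timeout(blocking_fn fn, void *arg, long timeout_ms)
{
    struct call_ctx *ctx = calloc(1, sizeof(*ctx));
    struct timespec deadline;
    pthread_t tid;
    int wait_rc = 0, rc;

    if (ctx == NULL)
        return -ENOMEM;
    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->done_cv, NULL);
    ctx->fn = fn;
    ctx->arg = arg;
    ctx->refs = 2;
    if (pthread_create(&tid, NULL, worker, ctx) != 0) {
        ctx_free(ctx);
        return -EAGAIN;
    }
    pthread_detach(tid);

    /* pthread_cond_timedwait() expects an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ctx->lock);
    /* Loop to absorb spurious wakeups; stop on timeout or any error. */
    while (!ctx->done && wait_rc == 0)
        wait_rc = pthread_cond_timedwait(&ctx->done_cv, &ctx->lock, &deadline);
    rc = ctx->done ? ctx->result : -ETIMEDOUT;
    ctx_put_locked(ctx);
    return rc;
}

/* Host-side stand-ins for cloud_connect(): one returns, one "hangs" for 2 s. */
static int fake_connect_ok(void *arg) { (void)arg; return 0; }

static int fake_connect_hang(void *arg)
{
    struct timespec t = { .tv_sec = 2, .tv_nsec = 0 };
    (void)arg;
    nanosleep(&t, NULL);
    return 0;
}
```

In the test loop, cloud_connect() would be the wrapped step. On -ETIMEDOUT the firmware would log the event and reset rather than abort the thread and re-initialize the backend, which avoids the AT-client breakage described above at the cost of a reboot.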

I am attaching a modem trace, an Otti file with power consumption and UART logs, and the example source code (reduced from our full project code).
Code execution gets stuck at around 9:33 in the Otti file, and probably a second or two later in the trace.

 cloud_connect_bug_trace_code.zip

Any help would be appreciated.
Regards,
- Tjaž
