nRF9160 can't connect to working server

Hello,

I've been developing code on the nRF9160 for a while. In the beginning everything was straightforward and worked fine, though the code was also pretty simple. Now, about 70% of the time when I try to connect to a server (which I know is working, and the network is fine too), the modem hangs and can't connect.

Over the past months I noticed that the modem very rarely gets into some weird state where it can no longer connect to the network. I found that other DevZone members have noticed the same issue, but that was kind of okay since it happened so rarely. In the past week of extensive testing I have noticed that `connect()` from socket.h simply does not connect to the server anymore. I pinned it down to `retval = nrf_connect(sd, (struct nrf_sockaddr *)&ipv4, sizeof(struct nrf_sockaddr_in));` in nrf91_sockets.c, lines 476 and 477, which is where the code hangs. I can't step any deeper into this function since Visual Studio isn't able to find the `nrf_connect()` function.

I don't see any logs on the server side, so I believe the issue is somewhere between my code and the modem. Either way it is getting really annoying and I'm not sure how to resolve it. The error returned by the `connect()` call after a 2 minute timeout is: "Failed to connect socket, error: 114". Error 114 corresponds to "no network available", but I know for sure that the network is available. I strongly believe that the issue is not with the network but with the modem itself, because "LTE cell changed: Cell ID: 2808860, Tracking area: 41120" notifications come in pretty often, and the RSRP, RSRQ and SNR values just before calling `connect()` are -98 ≤ RSRP < -97 dBm, -14 ≤ RSRQ < -13.5 dB and 13 dB ≤ SNR < 14 dB. Are these values good enough to establish a UDP connection?
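
For anyone comparing their own numbers: the ranges quoted above have the shape of the index-based signal reports the modem gives (for example via `AT%CESQ`), where each integer index stands for a 1 dBm (RSRP), 0.5 dB (RSRQ) or 1 dB (SNR) wide bucket. Below is a minimal sketch of that mapping in plain C, assuming the index tables from the AT command reference (RSRP index 1 is -140 ≤ RSRP < -139 dBm, RSRQ index 1 is -19.5 ≤ RSRQ < -19 dB, SNR index 1 is -24 ≤ SNR < -23 dB). The function names are made up for illustration and are not part of any Nordic API.

```c
#include <stdio.h>

/* Hypothetical helpers: each returns the LOWER bound of the bucket that a
 * reported index stands for. Index 0 means "below the lowest bucket", and
 * the "not known" marker values (e.g. 255) are not handled here. */

/* RSRP, valid for index 1..97: index 1 is -140 <= RSRP < -139 dBm. */
int rsrp_idx_to_dbm(int idx)
{
    return idx - 141;
}

/* RSRQ in tenths of a dB (integer math avoids floats on the target),
 * valid for index 1..34: index 1 is -19.5 <= RSRQ < -19 dB. */
int rsrq_idx_to_tenth_db(int idx)
{
    return idx * 5 - 200;
}

/* SNR, valid for index 1..49: index 1 is -24 <= SNR < -23 dB. */
int snr_idx_to_db(int idx)
{
    return idx - 25;
}

/* Print the lower bounds in a log-friendly form. */
void print_signal(int rsrp_idx, int rsrq_idx, int snr_idx)
{
    int rsrq = rsrq_idx_to_tenth_db(rsrq_idx);

    printf("RSRP >= %d dBm, RSRQ >= %d.%d dB, SNR >= %d dB\n",
           rsrp_idx_to_dbm(rsrp_idx),
           rsrq / 10, (rsrq < 0 ? -rsrq : rsrq) % 10,
           snr_idx_to_db(snr_idx));
}
```

With this mapping, the values in the post correspond to RSRP index 43, RSRQ index 12 and SNR index 38, so `print_signal(43, 12, 38)` reports the same lower bounds as quoted above.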

A bit more context about the project: at startup the nRF9160 creates a CoAP + DTLS socket and connects to our server. Afterwards it just waits for input on UART1 (from a different MCU), and when data is received the LTE module encodes it and sends it to the server. There are some additional peripherals running, such as timers and a watchdog, and that's basically it.

SDK and toolchain version 2.6.0, modem firmware version mfw_nrf9160_1.3.6

Edit: The longer I think about this issue, the more I start to believe that it could be a network problem. I started working from a new place 2 weeks ago, and one week after moving I resumed stability tests on our nRF9160-controlled device. That is also when I started noticing the connection issues. I did not want to believe that it is network related, since I am still in a city center with good 4G coverage and the LTE-M coverage map shows that the device should be in range, but more and more it seems like it really is a signal issue.

  • I only know that it runs on AWS IoT core.

    But I managed to find the issue. For some reason the RRC is not connected when the device tries to create a socket. I noticed this just this morning, when messages also did not get delivered to the server. It looks like a weird bug where `connect()` and `send()` do not change the RRC mode (RRC stays idle), although the socket opening and CoAP packet creation logic has not changed (only some logging and double checks were added). As a template I took this project and just added the stuff I needed for logging. I reverted the code and now everything runs smoothly.

    Anyway, thanks for your input, much appreciated! Now I need to pinpoint what exactly causes this so I can fix it in code. If by any chance you have seen something similar and have an idea which peripheral or setting might cause this, let me know!
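
One way to watch the RRC state described above while debugging is to subscribe to the `+CSCON` notification (`AT+CSCON=1`) and log every change around the `connect()` and `send()` calls. The following is only a sketch of parsing the unsolicited `+CSCON: <mode>` line in plain C. The enum and function names are hypothetical, and it does not handle the read-command response, which carries an extra leading field.

```c
#include <stdio.h>

/* Hypothetical names for illustration only. */
enum rrc_mode {
    RRC_UNKNOWN = -1,
    RRC_IDLE = 0,
    RRC_CONNECTED = 1,
};

/* Parse an unsolicited "+CSCON: <mode>[,...]" notification line.
 * Returns RRC_UNKNOWN for NULL input, for any other notification,
 * and for mode values other than 0 or 1. */
enum rrc_mode parse_cscon(const char *line)
{
    int mode;

    if (line == NULL) {
        return RRC_UNKNOWN;
    }
    if (sscanf(line, "+CSCON: %d", &mode) != 1) {
        return RRC_UNKNOWN;
    }
    if (mode == 0) {
        return RRC_IDLE;
    }
    if (mode == 1) {
        return RRC_CONNECTED;
    }
    return RRC_UNKNOWN;
}
```

In an nRF Connect SDK application the same information should also be available without manual parsing, through the `LTE_LC_EVT_RRC_UPDATE` event of the LTE link control library.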

  • Hi, did you resolve the issue in the meantime or find a workaround? I can confirm it in our firmware. I think an nrf_modem lib update could be the reason, since for me it seems to have been introduced after updating to 2.6.0 or 2.6.1.

  • Hello,
    Yes, we did manage to fix the issue! It took a while of testing, but eventually we found out that the IP address of the IoT device was changing roughly every 5 minutes. Luckily we had SIMs from two different providers, and we noticed that everything was fine with one SIM while nothing worked with the other: one SIM's IP address was static, the other's was dynamic. To address this, we needed to pass the DTLS connection ID (CID) to the server so that it stores it with every new session. Whenever a new packet is received, the server checks whether the CID matches instead of checking the IP address.

    This piece of code fixed the issue, and we haven't had any of these problems since:

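        /* Enable the DTLS Connection ID (RFC 9146) on the socket.
         * Set this before connect(), since the CID is negotiated in the handshake. */
        int dtls_cid = NRF_SO_SEC_DTLS_CID_ENABLED;
        /* Pass the size of the option value, not the size of a pointer to it. */
        err = nrf_setsockopt(client_fd, NRF_SOL_SECURE, NRF_SO_SEC_DTLS_CID, &dtls_cid, sizeof(dtls_cid));
        if (err)
        {
            LOG_ERR("Failed to enable DTLS CID, errno %d", errno);
            return -errno;
        }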
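
The server side of this fix is not shown in the thread. As a rough illustration of the idea (look the session up by CID, not by source address), here is a minimal sketch of a session table in plain C. Every name in it (`struct session`, `session_add`, `session_lookup`, the table size and the CID length bound) is hypothetical and not taken from any DTLS library. A real server would also only update the stored address after the record has been authenticated, as RFC 9146 requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SESSIONS 8   /* hypothetical table size */
#define MAX_CID_LEN  16  /* hypothetical upper bound on CID length */

struct peer_addr {
    uint32_t ip;
    uint16_t port;
};

struct session {
    int in_use;
    uint8_t cid[MAX_CID_LEN];
    size_t cid_len;
    struct peer_addr last_addr; /* informational only, never used as the key */
};

struct session sessions[MAX_SESSIONS];

/* Store a new session under the CID agreed during the handshake. */
struct session *session_add(const uint8_t *cid, size_t cid_len,
                            struct peer_addr addr)
{
    if (cid == NULL || cid_len == 0 || cid_len > MAX_CID_LEN) {
        return NULL;
    }
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        if (!sessions[i].in_use) {
            sessions[i].in_use = 1;
            memcpy(sessions[i].cid, cid, cid_len);
            sessions[i].cid_len = cid_len;
            sessions[i].last_addr = addr;
            return &sessions[i];
        }
    }
    return NULL; /* table full */
}

/* Find the session by CID only. The source address may have changed
 * (e.g. a dynamic IP on the SIM), so it is recorded but never compared. */
struct session *session_lookup(const uint8_t *cid, size_t cid_len,
                               struct peer_addr from)
{
    if (cid == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        struct session *s = &sessions[i];

        if (s->in_use && s->cid_len == cid_len &&
            memcmp(s->cid, cid, cid_len) == 0) {
            s->last_addr = from;
            return s;
        }
    }
    return NULL;
}
```

With a table like this, a packet that arrives from a new IP address but carries a known CID still resolves to the existing session, which matches the behaviour described in the reply above.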
    

  • Congratulations! That is exactly why we developed and specified RFC 9146.

    Just one remark from a couple of years of working with CID: a common remaining pain point is the missing "graceful DTLS restart". Without it, a restart of the server endpoint (e.g. for a server update) requires a new handshake.

    I'm still not sure which server you are using. If it's Eclipse/Californium, the dtls-graceful-restart is available.

  • We will try PION DTLS 3.0 in the coming days, which should have gained CID support, at least according to the release notes.
