nRFCloud COAP send error ETIMEDOUT

Hello,

I have firmware that sends a packet to nRFCloud every 15 minutes using nrf_cloud_coap_json_message_send().

The code works well most of the time: hundreds of devices are deployed, and only a few of them show the issue reported here.

Sometimes, calling nrf_cloud_coap_json_message_send() returns error -116 ETIMEDOUT. I was unable to find any details on this error and how to fix it.
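
For reference, the two codes seen in this thread are negated errno values. A minimal sketch of a decoder follows, assuming the numeric values from the Zephyr libc errno.h (ETIMEDOUT = 116, ECONNREFUSED = 111). The constants are hard-coded because a host toolchain may use other numbers (Linux uses 110 for ETIMEDOUT). The helper name cloud_err_name() is hypothetical and not part of any SDK.

```c
#include <stddef.h>

/* Values as defined by the Zephyr libc errno.h (assumption stated above).
 * Hard-coded so the sketch behaves the same on a host build. */
#define ZEPHYR_ECONNREFUSED 111
#define ZEPHYR_ETIMEDOUT    116

/* Hypothetical helper: map a negative return code to a readable name. */
static const char *cloud_err_name(int err)
{
	switch (-err) {
	case 0:
		return "OK";
	case ZEPHYR_ECONNREFUSED:
		return "ECONNREFUSED";
	case ZEPHYR_ETIMEDOUT:
		return "ETIMEDOUT";
	default:
		return "unknown";
	}
}
```

Logging cloud_err_name(err) next to the raw number makes field logs easier to read, but it does not explain why the timeout happened.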

It seems to occur at random moments, with nothing noticeably different from the normal case: no network issue, the devices use PSM, authentication to nRFCloud is not reset, ...

Following this error, any attempt to reach nRFCloud fails, with error -111. I believe this simply means that the firmware should react to the initial ETIMEDOUT error and fix something before trying to send any other data.
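
One way to "fix something" without knowing the root cause is to escalate the recovery step as failures accumulate. The sketch below is only an assumption about a possible policy, not a confirmed fix. All names and thresholds are hypothetical. On a real device, the modem restart step might map to lte_lc_offline() followed by lte_lc_normal(), and the last step to sys_reboot(), which is the only action known so far to restore the connection.

```c
/* Recovery steps, ordered from cheapest to most drastic. */
enum recovery_action {
	RECOVERY_NONE,          /* last exchange succeeded */
	RECOVERY_RECONNECT,     /* close and re-open the CoAP session */
	RECOVERY_MODEM_RESTART, /* take the modem offline, then online again */
	RECOVERY_REBOOT,        /* last resort: reset the device */
};

struct recovery_state {
	unsigned int consecutive_failures;
};

/* Hypothetical thresholds, to be tuned for the application. */
#define RECONNECT_ATTEMPTS_MAX     2
#define MODEM_RESTART_ATTEMPTS_MAX 4

/* Feed the result of each connect/send attempt; get the next step. */
static enum recovery_action recovery_next(struct recovery_state *st, int err)
{
	if (err == 0) {
		st->consecutive_failures = 0;
		return RECOVERY_NONE;
	}

	st->consecutive_failures++;

	if (st->consecutive_failures <= RECONNECT_ATTEMPTS_MAX) {
		return RECOVERY_RECONNECT;
	}
	if (st->consecutive_failures <= MODEM_RESTART_ATTEMPTS_MAX) {
		return RECOVERY_MODEM_RESTART;
	}
	return RECOVERY_REBOOT;
}
```

With a 15 minute period and these thresholds, a device stuck on -111 would reboot on the fifth consecutive failure, roughly one hour after the first error, instead of staying offline indefinitely.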

Thanks.

Vincent

  • Hi,

    Which version of the SDK do you use?

    Following this error, any attempt to reach nRFCloud fails, with error -111.

    Are you able to get a modem trace from this?

    Does anything help to reach nRF Cloud again, such as a device reset?

    Regards,
    Sigurd Hellesvik

  • Hello,

    This issue occurred on devices that are already deployed, from which I was able to retrieve some logs.

    I have never experienced the same issue when debugging the firmware, so I couldn't get a modem trace.

    When the device resets, it goes back to working fine.

    Note that these are agricultural sensors, which are installed in fields and don't move. So LTE-M/NB-IoT coverage shouldn't be a problem, since the devices were working before and after the issue.

    Thanks.

  • Which version of the SDK do you use?

    I will check if we can get some logs on the -111 (ECONNREFUSED) error. Do you agree that this is the main error we should fix, because the ETIMEDOUT is not that bad if the device can reconnect afterwards?

    Sometimes, calling nrf_cloud_coap_json_message_send() returns error -116 ETIMEDOUT. I was unable to find any details on this error and how to fix it.

    Maybe you can try with a test device to force nrf_cloud_coap_json_message_send() to get a timeout in some way. If this can reproduce the ETIMEDOUT error, then we can get a modem trace. What do you think about this?

  • I am using SDK 2.8.0 with modem 1.3.6.

    I agree that the most important here is ETIMEDOUT. Getting an error code is ok, as long as there are explanations on the error and how to fix it.

    For many nrf_cloud functions, the documentation only says this about the return value: "0 if initialization was successful, otherwise, a negative error number."

    The main issue here is that I can't reproduce the error when debugging, so I have no hint on how to solve it.

    To be clearer, here is a snippet of what my firmware does:

    bool needReconnectTonRFCloud = true ;
    while(1)
    {
    	// Take a sensor measurement for a sensor
    	[...]
    	
    	
    	// Ensure the device is connected to network. If not, re-connect
    	if( lte_lc_nw_reg_status_get(&status) == 0)
    	{
    		if( (status == LTE_LC_NW_REG_REGISTERED_HOME) || (status == LTE_LC_NW_REG_REGISTERED_ROAMING) )
    		{
    			// Device is connected, we can continue
    		}
    		else
    		{
    			// Restart a connection and wait for the device to be registered on network
    			[...]
    			needReconnectTonRFCloud = true ;
    		}
    	}
    		
    
    	// Reconnect to nRFCloud, if needed (after reboot, network connection lost, ...)
    	if(needReconnectTonRFCloud)
    	{
    		nrf_cloud_coap_disconnect() ;	// Just to be sure
    		err = nrf_cloud_coap_connect(NULL) ;
    		if(err == 0)
    		{
    			needReconnectTonRFCloud = false ;
    		}
    		else
    		{
    			printk("Error %d", err) ;	 //  << After receiving the ETIMEDOUT error (see below), all calls to nrf_cloud_coap_connect fail with error ECONNREFUSED.
    		}
    	}
    	
    	// Send the sensor measurement
    	err = nrf_cloud_coap_json_message_send(...)
    	if(err == 0)
    	{
    		printk("Data sent successfully") ;
    	}
    	else
    	{
    		printk("Error sending %d", err) ;	// << This is where I get the initial ETIMEDOUT error
    		needReconnectTonRFCloud = true ;
    	}
    
    
    	// Ensure PSM/eDRX settings are ok for my application
    	[...]
    	
    	// Sleep for 15 min
    	k_timer_start(&wakeupTimer, Z_TIMEOUT_MS(15*60*1000), Z_FOREVER ) ;
    	k_cpu_idle() ;
    }

    Here is the log from the device. It works as expected until line 55 (the "Send error" entry).

    24/04/2025 13:14:21
    Measurement: 15.0°C 3755 Ohms ~21 
    Modem already ready
    1 pending frame
    Send: {"F":{"1745500461":"190110EAB898"}}
    [MODEM] Search done
    [RRC] Connected
    Send ok
    
    24/04/2025 13:14:22
    Delta read nRF shadow > No new data
    Next READ: 60 min
    
    24/04/2025 13:14:23
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    [RCC] Idle
    
    24/04/2025 13:29:21
    Measurement: 15.2°C 3755 Ohms ~21 
    Modem already ready
    1 pending frame
    Send: {"F":{"1745501361":"190110EAB8A0"}}
    [MODEM] Search done
    [RRC] Connected
    Send ok
    
    24/04/2025 13:29:24
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    [RCC] Idle
    
    24/04/2025 13:44:21
    Measurement: 15.5°C 3784 Ohms ~21 
    Modem already ready
    1 pending frame
    Send: {"F":{"1745502261":"190110EC88AC"}}
    [MODEM] Search done
    [RRC] Connected
    Send ok
    
    24/04/2025 13:44:24
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    [RCC] Idle
    
    24/04/2025 13:59:21
    Measurement: 15.7°C 3761 Ohms ~21 
    Modem already ready
    1 pending frame
    Send: {"F":{"1745503161":"190110EB18B0"}}
    Send error (Device err: -116)
    Frame(s) stored back to the queue.
    
    24/04/2025 13:59:26
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    
    24/04/2025 14:14:21
    Measurement: 15.9°C 3704 Ohms ~21 
    Modem already ready
    nRFCloud connect > [MODEM] Search done
    [RRC] Connected
    Error (-111)
    
    24/04/2025 14:15:21
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    [RCC] Idle
    
    24/04/2025 14:29:22
    Measurement: 16.0°C 3732 Ohms ~21 
    Modem already ready
    nRFCloud connect > [MODEM] Search done
    [RRC] Connected
    Error (-111)
    
    24/04/2025 14:30:22
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    [RCC] Idle
    
    24/04/2025 14:44:23
    Measurement: 16.2°C 3744 Ohms ~21 
    Modem already ready
    nRFCloud connect > [MODEM] Search done
    [RRC] Connected
    Error (-111)
    
    24/04/2025 14:45:23
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    [RCC] Idle
    
    24/04/2025 14:59:24
    Measurement: 16.4°C 3698 Ohms ~21 
    Modem already ready
    nRFCloud connect > [MODEM] Search done
    [RRC] Connected
    Error (-111)
    
    24/04/2025 15:00:24
    Next frame: 900s
    PSM TAU=5040 AT=0 eDRX=2621: Ok
    Sleep
    [RCC] Idle
    
    
    [ Same error -111 continuously, until the device is rebooted ]

  • Vincent44 said:
    I agree that the most important here is ETIMEDOUT.

    I mean the other way around.
    The first disconnect is from ETIMEDOUT.
    The "can never connect again" is from ECONNREFUSED.
    Sporadic errors, such as ETIMEDOUT, are not that bad if normal operation can be resumed automatically.
    Therefore ECONNREFUSED is the most important.  Do you agree with this?

    Let's have a look at your code:

    	// Send the sensor measurement
    	err = nrf_cloud_coap_json_message_send(...)
    	if(err == 0)
    	{
    		printk("Data sent successfully") ;
    	}
    	else
    	{
    		printk("Error sending %d", err) ;	// << This is where I get the initial ETIMEDOUT error
    		needReconnectTonRFCloud = true ;
    	}

    What happens if, on the test device, you add the line "err = 1", like this:

    	// Send the sensor measurement
    	err = nrf_cloud_coap_json_message_send(...)
    	err = 1; // NEW LINE TO TEST RECONNECTION 
    	if(err == 0)
    	{
    		printk("Data sent successfully") ;
    	}
    	else
    	{
    		printk("Error sending %d", err) ;	// << This is where I get the initial ETIMEDOUT error
    		needReconnectTonRFCloud = true ;
    	}

    Then perhaps this can force the ECONNREFUSED error?
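
A slightly more controlled variant of the same test is to fail only every Nth send, so that the reconnection path and the normal path are both exercised. This is a minimal sketch with hypothetical names (fault_injector, send_with_fault). The real send function is passed in as a pointer, and fake_send_ok() is only a stand-in for it. Like the "err = 1" line, the real send still runs and only its return value is overridden, here with -ETIMEDOUT (116 in the Zephyr libc).

```c
#define ZEPHYR_ETIMEDOUT 116 /* value from the Zephyr libc errno.h */

typedef int (*send_fn_t)(const char *json);

struct fault_injector {
	send_fn_t real_send;     /* function that actually sends the data */
	unsigned int fail_every; /* report a timeout on every Nth call, 0 = never */
	unsigned int calls;      /* number of sends attempted so far */
};

/* Call the real send, then optionally replace its result by a timeout. */
static int send_with_fault(struct fault_injector *fi, const char *json)
{
	int err = fi->real_send(json);

	fi->calls++;
	if (fi->fail_every != 0 && (fi->calls % fi->fail_every) == 0) {
		return -ZEPHYR_ETIMEDOUT; /* pretend the send timed out */
	}
	return err;
}

/* Stand-in for the real send call, always succeeds. */
static int fake_send_ok(const char *json)
{
	(void)json;
	return 0;
}
```

If the forced timeout is followed by -111 on the next nrf_cloud_coap_connect() call, the problem is reproducible on the bench and a modem trace can be captured.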
