Sending nRFCloud CoAP while network not available

Hello,

I use nRFCloud to send CoAP messages, and it's working fine under normal conditions.
I am now trying to simulate a network loss, while sending some data.
To do that, I simply remove the antenna of my device when sending.

Here is my test flow:
- Device has successfully sent some CoAP messages. Modem enters PSM and firmware sleeps.
- After 1 minute, the device wakes up. I remove the antenna.
- The firmware tries to send a CoAP message using nrf_cloud_coap_json_message_send().
  After a few seconds, I receive lte_lc_evt LTE_LC_NW_REG_UNKNOWN.
  After a few minutes, it successfully reconnects to the network (even though the antenna is removed).
 


Here is a snippet of my log:

[...] Device running properly.
[...] Modem in PSM, firmware in sleep.

* Wakeup *

11/04/2024 07:18:43
Cellular enable ... Ok (already ready).
Sending CoAP Message.
Current PSM [TAU=420 ; AT=2] don't match desired [TAU>1080 ; 2<AT<20].
Request PSM [TAU=1080 ; AT=2]
[MODEM] PSM Update [TAU=1080 ; AT=2]
[MODEM] RCC Idle

* Sleep for 1 minute *

* Wakeup *
[ I REMOVE THE ANTENNA FROM THE DEVICE ]

11/04/2024 07:18:58
Cellular enable ... Ok (already ready).
Sending CoAP Message.
[MODEM] Network registration - LTE_LC_NW_REG_UNKNOWN
[MODEM] Mode: Off
[MODEM] Searching network
[MODEM] Cell - id:138275874, tac:22582
[MODEM] Mode NB-IoT
[MODEM] RRC Connected
* REBOOT FROM WATCHDOG * 




From what I understand, the function nrf_cloud_coap_json_message_send() is blocking. Since the device can't reach the network, the firmware waits until the connection is re-established.
Is there a way to abort sending if the modem is no longer connected? Note that all of this runs from main(); I don't use any threads for this.

What I find strange is that the watchdog is not triggered even though nrf_cloud_coap_json_message_send() hangs for several minutes. My watchdog is set up with a 10-second window, and I know it works.

Am I missing something?

Thanks.

  • This still requires having a timer/WDT to detect that nrf_cloud_coap_json_message_send() is stuck.

    I guess it also implies using different threads.

    Being able to pass a timeout as a parameter to nrf_cloud_coap_json_message_send() would be a nice feature.
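The "timeout from a separate thread" idea above can be sketched roughly as follows. This is a host-side illustration using POSIX threads, not NCS code: `send_with_timeout()`, `blocking_send_fn`, `fake_send_ok()` and `fake_send_stuck()` are hypothetical names, and the fake senders only stand in for a blocking call such as nrf_cloud_coap_json_message_send(). On the device, the same pattern would presumably use a Zephyr thread or work queue and a semaphore with a timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature standing in for a blocking send call. */
typedef int (*blocking_send_fn)(const char *msg);

struct send_job {
	pthread_mutex_t lock;
	pthread_cond_t done_cond;
	blocking_send_fn send;
	char *msg;      /* private copy, so the caller's buffer may go away */
	int result;
	bool done;      /* set by the worker once send() has returned */
	bool abandoned; /* set by the caller once it gave up waiting */
};

static void job_free(struct send_job *job)
{
	pthread_cond_destroy(&job->done_cond);
	pthread_mutex_destroy(&job->lock);
	free(job->msg);
	free(job);
}

static void *send_worker(void *arg)
{
	struct send_job *job = arg;
	int result = job->send(job->msg); /* may block for a long time */

	pthread_mutex_lock(&job->lock);
	job->result = result;
	job->done = true;
	bool abandoned = job->abandoned;
	pthread_cond_signal(&job->done_cond);
	pthread_mutex_unlock(&job->lock);
	if (abandoned) {
		job_free(job); /* nobody is waiting any more: clean up here */
	}
	return NULL;
}

/* Returns the send result, or -ETIMEDOUT if it did not finish in time. */
int send_with_timeout(blocking_send_fn send, const char *msg, int timeout_ms)
{
	struct send_job *job = calloc(1, sizeof(*job));
	char *copy = malloc(strlen(msg) + 1);
	if (job == NULL || copy == NULL) {
		free(job);
		free(copy);
		return -ENOMEM;
	}
	strcpy(copy, msg);
	job->msg = copy;
	job->send = send;
	pthread_mutex_init(&job->lock, NULL);
	pthread_cond_init(&job->done_cond, NULL);

	pthread_t tid;
	if (pthread_create(&tid, NULL, send_worker, job) != 0) {
		job_free(job);
		return -EAGAIN;
	}
	pthread_detach(tid);

	/* Absolute deadline for pthread_cond_timedwait (CLOCK_REALTIME). */
	struct timespec deadline;
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
	deadline.tv_sec += deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&job->lock);
	int rc = 0;
	while (!job->done && rc == 0) {
		rc = pthread_cond_timedwait(&job->done_cond, &job->lock, &deadline);
	}
	bool finished = job->done;
	int result = finished ? job->result : -ETIMEDOUT;
	job->abandoned = !finished; /* the worker frees the job if we gave up */
	pthread_mutex_unlock(&job->lock);
	if (finished) {
		job_free(job);
	}
	return result;
}

/* Hypothetical stand-ins used only to exercise the wrapper. */
int fake_send_ok(const char *msg) { (void)msg; return 0; }

int fake_send_stuck(const char *msg)
{
	(void)msg;
	struct timespec two_seconds = { .tv_sec = 2, .tv_nsec = 0 };
	nanosleep(&two_seconds, NULL); /* simulates a send that hangs */
	return 0;
}
```

Note that this only lets the caller move on (for example to feed the watchdog or retry later); the worker itself stays blocked until the underlying send returns, so it does not abort the transfer.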

  • So, I tried adding a timer in order to stop the process in case it is stuck.

    The idea is to be able to pause/stop the sending, so that nrf_cloud_coap_json_message_send() returns an error code.

    The code looks like this:

    // Guard timer. Its initialization (k_timer_init() with the callback below)
    // is not shown in this snippet.
    struct k_timer preventBeingStuckTimer ;

    // Expiry function of the guard timer: tries to pause the CoAP connection.
    // Note: k_timer expiry functions run in interrupt context.
    void preventBeingStuck_TimerCallback( struct k_timer * a)
    {
    	nrf_cloud_coap_pause() ;
    	k_timer_stop(&preventBeingStuckTimer) ;
    }


    bool COAP_Send(char * msg, uint16_t msgLen)
    {
    	bool success ;
    	int err ;

    	// One-shot timer: fires once after 5 s if the send has not returned.
    	k_timer_start(&preventBeingStuckTimer, K_MSEC(5000), K_NO_WAIT) ;

    	// Blocking call.
    	err = nrf_cloud_coap_json_message_send(msg) ;

    	if(err == 0){
    		success = true ;
    	}
    	else
    	{
    		// Negative values are device-side errors defined in errno.h.
    		// Positive values are cloud-side errors (CoAP result codes) defined in zephyr/net/coap.h.
    		if(err < 0){
    			DEBUG_PRINT("Error while sending (Device err: %d) !\r\n", err) ;
    		}else{
    			DEBUG_PRINT("Error while sending (Cloud err: %d) !\r\n", err) ;
    		}
    		success = false ;
    	}

    	// The send returned in time: cancel the guard timer.
    	k_timer_stop(&preventBeingStuckTimer) ;

    	return success ;
    }

    The timer triggers, but the execution hangs at nrf_cloud_coap_pause().

    My guess is that nrf_cloud_coap operations can't be paused while the library is trying to send/receive data.

    I also tried nrf_cloud_coap_disconnect(), which I expected to be more brutal ... but I got the same results.

    During my tests, I also had a similar issue with the nRF Cloud FOTA functions. I am not 100% sure it was exactly the same issue, as I was not in debug mode, but I noticed some watchdog reboots while my device was checking for FOTA with nrf_cloud_rest_fota_job_get() at locations where network quality was low.

    How can I abort the process once it has started?

    As an extra safety measure, I can check that the device is still connected to the network before sending data. However, this won't cover all cases, as a disconnect can occur at any time.
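The pre-send check mentioned above could look roughly like this. It is a minimal sketch in plain C under stated assumptions: `enum net_state`, `net_state_update()`, `send_if_registered()` and `fake_send()` are hypothetical names, not part of NCS. In real firmware, `net_state_update()` would presumably be called from the LTE link control event handler when the registration status changes (e.g. LTE_LC_NW_REG_UNKNOWN as in the log above), and the function pointer would wrap nrf_cloud_coap_json_message_send().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified mirror of the modem registration status. */
enum net_state {
	NET_UNKNOWN,
	NET_SEARCHING,
	NET_REGISTERED_HOME,
	NET_REGISTERED_ROAMING,
};

/* Written from the event handler, read from the sending code. */
static atomic_int current_state = NET_UNKNOWN;

/* Assumed to be called from the LTE event handler on each status change. */
void net_state_update(enum net_state state)
{
	atomic_store(&current_state, (int)state);
}

bool net_is_registered(void)
{
	int state = atomic_load(&current_state);

	return state == NET_REGISTERED_HOME || state == NET_REGISTERED_ROAMING;
}

/* Hypothetical signature standing in for the blocking send call. */
typedef int (*send_fn)(const char *msg);

/* Refuses to start a send while the device is not registered. */
int send_if_registered(send_fn send, const char *msg)
{
	if (!net_is_registered()) {
		return -ENOTCONN; /* fail fast instead of blocking */
	}
	return send(msg);
}

/* Hypothetical stand-in for the real send function, for demonstration. */
int fake_send(const char *msg)
{
	(void)msg;
	return 0;
}
```

This only narrows the window: the registration state can still change between the check and the send, so it complements a timeout or abort mechanism rather than replacing it.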

  • Hi,

    I got some input from the developers on this issue. We are aware of it, and many improvements are about to be merged into NCS to improve reliability, including the ability to detect and recover from network outages. These improvements will be part of the upcoming NCS v2.7.0.

  • Hello Sigurd,

    Thanks for your reply.

    Do you have an approximate release date for v2.7.0? Are we talking days, weeks, or months?

  • Vincent44 said:
    Do you have an approximate release date for v2.7.0? Are we talking days, weeks, or months?

    ~End of June timeframe
