Sending nRFCloud CoAP while network not available

Hello,

I use nRFCloud to send CoAP messages, and it's working fine under normal conditions.
I am now trying to simulate a network loss, while sending some data.
To do that, I simply remove the antenna of my device when sending.

Here is my test flow:
- The device has successfully sent some CoAP messages. The modem enters PSM and the firmware sleeps.
- After 1 minute, the device wakes up. I remove the antenna.
- The firmware tries to send a CoAP message using nrf_cloud_coap_json_message_send().
  After a few seconds, I receive an lte_lc_evt with LTE_LC_NW_REG_UNKNOWN.
  After a few minutes, it successfully reconnects to the network (even though the antenna is removed).
 


Here is a snippet of my log:

[...] Device running properly.
[...] Modem in PSM, firmware in sleep.

* Wakeup *

11/04/2024 07:18:43
Cellular enable ... Ok (already ready).
Sending CoAP Message.
Current PSM [TAU=420 ; AT=2] don't match desired [TAU>1080 ; 2<AT<20].
Request PSM [TAU=1080 ; AT=2]
[MODEM] PSM Update [TAU=1080 ; AT=2]
[MODEM] RCC Idle

* Sleep for 1 minute *

* Wakeup *
[ I REMOVE THE ANTENNA FROM THE DEVICE ]

11/04/2024 07:18:58
Cellular enable ... Ok (already ready).
Sending CoAP Message.
[MODEM] Network registration - LTE_LC_NW_REG_UNKNOWN
[MODEM] Mode: Off
[MODEM] Searching network
[MODEM] Cell - id:138275874, tac:22582
[MODEM] Mode NB-IoT
[MODEM] RRC Connected
* REBOOT FROM WATCHDOG * 




From what I understand, the function nrf_cloud_coap_json_message_send() is blocking. Since the device can't reach the network, the firmware waits until the connection is re-established.
Is there a way to abort sending if the modem is no longer connected? Note that all of this runs from main; I don't use any threads for this.

What I find strange is that the watchdog is not triggered even if nrf_cloud_coap_json_message_send() hangs for several minutes. My watchdog is set up with a 10-second window, and I know it works.

Am I missing something?

Thanks.

  • Concerning the WDT, it might be because I used the WDT_OPT_PAUSE_IN_SLEEP option. When I remove it, the watchdog does indeed reboot the device when it hangs at nrf_cloud_coap_json_message_send() with no network.

  • Vincent44 said:
    Concerning the WDT, it might be because I used the WDT_OPT_PAUSE_IN_SLEEP option. When I remove it, the watchdog does indeed reboot the device when it hangs at nrf_cloud_coap_json_message_send() with no network.

    Yes, that would explain it. As for aborting the send, maybe you could try nrf_cloud_coap_pause().

  • This still requires having a timer/WDT to detect that nrf_cloud_coap_json_message_send() is stuck.

    I guess it also implies using different threads.

    Being able to pass a timeout as a parameter to nrf_cloud_coap_json_message_send() would be a nice feature.

  • So, I tried adding a timer in order to stop the process in case it is stuck.

    The idea is to be able to pause/stop the sending, so that nrf_cloud_coap_json_message_send() returns an error code.

    The code looks like this:

    /* Assumed to be initialized elsewhere with k_timer_init(), using
     * preventBeingStuck_TimerCallback as the expiry function. */
    struct k_timer preventBeingStuckTimer;

    /* Expiry function: tries to pause the CoAP connection so that the blocked send returns. */
    void preventBeingStuck_TimerCallback(struct k_timer *timer)
    {
    	nrf_cloud_coap_pause();
    	k_timer_stop(&preventBeingStuckTimer);
    }

    bool COAP_Send(char *msg, uint16_t msgLen)
    {
    	bool success;
    	int err;

    	/* One-shot timer: expires once, 5 s from now, unless stopped first. */
    	k_timer_start(&preventBeingStuckTimer, K_MSEC(5000), K_NO_WAIT);

    	err = nrf_cloud_coap_json_message_send(msg);

    	if (err == 0) {
    		success = true;
    	} else {
    		// Negative values are device-side errors defined in errno.h.
    		// Positive values are cloud-side errors (CoAP result codes) defined in zephyr/net/coap.h.
    		if (err < 0) {
    			DEBUG_PRINT("Error while sending (Device err: %d) !\r\n", err);
    		} else {
    			DEBUG_PRINT("Error while sending (Cloud err: %d) !\r\n", err);
    		}
    		success = false;
    	}

    	/* The send returned in time: cancel the pending timer. */
    	k_timer_stop(&preventBeingStuckTimer);

    	return success;
    }

    The timer triggers, but the execution hangs at nrf_cloud_coap_pause().

    My guess is that the nrf_cloud_coap operations can't be paused while the library is trying to send/receive data.

    I also tried nrf_cloud_coap_disconnect(), which I expected to be more brutal, but I got the same result.

    During my tests, I also had a similar issue with the nRF Cloud FOTA functions. I am not 100% sure that it was the exact same issue, as I was not in debug mode, but I noticed some watchdog reboots while my device was checking for FOTA with nrf_cloud_rest_fota_job_get() at a location where network quality was low.

    How can I abort the process once it has started?

    As an extra safety measure, I can check that the device is still connected to the network before sending data. However, this won't cover all cases, as a disconnect can occur at any time.
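
One pattern that might help here, offered as a minimal sketch rather than a confirmed fix: Zephyr runs k_timer expiry functions in interrupt context, where blocking calls are generally not allowed, and that may contribute to the hang seen when nrf_cloud_coap_pause() is called from the timer callback. The thread-based idea mentioned earlier avoids that by running the blocking send in a worker thread and bounding how long the caller waits for it. The code below is portable POSIX C, not nRF Connect SDK code: send_with_timeout(), send_fn_t, struct send_job, fast_send() and stuck_send() are all hypothetical names, and stuck_send() only simulates a stalled send by sleeping. On Zephyr the same idea would presumably map to a dedicated thread or work item plus k_sem_take() with a timeout.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature for a blocking send call. */
typedef int (*send_fn_t)(const char *msg);

/* Shared by caller and worker; freed by whichever of the two finishes last. */
struct send_job {
	pthread_mutex_t lock;
	pthread_cond_t done;
	send_fn_t send;
	const char *msg;  /* must stay valid until the worker returns */
	int result;
	bool finished;    /* set by the worker once send() has returned */
	bool abandoned;   /* set by the caller once it gave up waiting */
};

static void job_free(struct send_job *job)
{
	pthread_cond_destroy(&job->done);
	pthread_mutex_destroy(&job->lock);
	free(job);
}

static void *send_worker(void *arg)
{
	struct send_job *job = arg;
	int result = job->send(job->msg); /* may block for a long time */

	pthread_mutex_lock(&job->lock);
	job->result = result;
	job->finished = true;
	bool abandoned = job->abandoned;
	pthread_cond_signal(&job->done);
	pthread_mutex_unlock(&job->lock);

	if (abandoned) {
		job_free(job); /* the caller already left, so clean up here */
	}
	return NULL;
}

/* Returns the send result, -ETIMEDOUT if the wait expired, or another
 * negative errno value if the worker could not be started. */
int send_with_timeout(send_fn_t send, const char *msg, long timeout_ms)
{
	struct send_job *job = calloc(1, sizeof(*job));
	if (job == NULL) {
		return -ENOMEM;
	}
	pthread_mutex_init(&job->lock, NULL);
	pthread_cond_init(&job->done, NULL);
	job->send = send;
	job->msg = msg;

	/* Absolute deadline, as required by pthread_cond_timedwait(). */
	struct timespec deadline;
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_t tid;
	int rc = pthread_create(&tid, NULL, send_worker, job);
	if (rc != 0) {
		job_free(job);
		return -rc;
	}
	pthread_detach(tid);

	pthread_mutex_lock(&job->lock);
	rc = 0;
	while (!job->finished && rc == 0) {
		rc = pthread_cond_timedwait(&job->done, &job->lock, &deadline);
	}
	bool finished = job->finished;
	int result = finished ? job->result : -ETIMEDOUT;
	if (!finished) {
		job->abandoned = true; /* the worker now owns the job */
	}
	pthread_mutex_unlock(&job->lock);

	if (finished) {
		job_free(job);
	}
	return result;
}

/* Hypothetical stand-ins for the real send call. */
int fast_send(const char *msg) { (void)msg; return 0; }

int stuck_send(const char *msg)
{
	(void)msg;
	struct timespec stall = { .tv_sec = 2, .tv_nsec = 0 };
	nanosleep(&stall, NULL); /* simulate a send stalled on the network */
	return 0;
}
```

With these stand-ins, send_with_timeout(stuck_send, "{}", 100) gives control back after roughly 100 ms with -ETIMEDOUT, while send_with_timeout(fast_send, "{}", 500) returns the send result directly. Note that the timeout does not abort the stalled call: the worker stays blocked until the underlying function returns. It only lets the main loop keep running, feed the watchdog, and decide on a recovery action.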
