nRFCloud CoAP connection timeout

Hello DevZone,

I am currently trying to fix an issue I have on deployed devices, using nRF9160 with nRFCloud. Code is based on SDK 2.8.0.

In most cases, my code works fine and there is no issue. However, I sometimes have the following issue.

When I try to connect to nRFCloud using nrf_cloud_coap_connect(), the connection can take a long time, long enough to trigger my watchdog, which is set to 75 seconds!

In the field, I identified two cases when this occurs:

  • When the device has bad network coverage. In that case, my theory is that some data packets get lost during the process, and a retry mechanism takes a long time.
  • When nRFCloud is down. This occurred, for example, on October 20th 2025, causing thousands of my devices to reboot every 75 seconds (due to the watchdog) for several hours. That was not good for the batteries ...

The first approach to solve this is to simply increase the watchdog timeout. However:

 1 - It already uses a rather long duration, due to other specific actions. My overall goal is actually to reduce this value.
 2 - I can't find any information about the maximum execution time of nrf_cloud_coap_connect(). Is it possible to get this information?

The second way to solve this issue is to be able to abort a connection attempt.
I already tried setting up a timer and starting it just before nrf_cloud_coap_connect().
When the timer fires (for example, after 60 seconds), it calls nrf_cloud_coap_disconnect() and/or nrf_cloud_coap_init() (I tried both). But this doesn't abort the connection.

Is there a way to abort a connection attempt? If so, how can I do it?

Thanks.

  • Hi Vincent!

    Thanks for reaching out.

    First of all: I think you should try to refactor your code so that the watchdog isn't starved when the device cannot connect. As you have observed, connection issues aren't necessarily caused by problems that require a reboot, so when such problems occur it's better to handle them through other timeout mechanisms. Which leads me to your actual question: as far as I know, the nrf_cloud_coap_connect() function cannot be aborted through the API, but it should eventually time out if a connection cannot be established. This can take quite some time because of the retry mechanisms, as you say, but these should be configurable. You can read about some of the different options here. Initially, you could for example try lowering CONFIG_COAP_MAX_RETRANSMIT. Its default value is 4.

    Alternatively, you could try running nrf_cloud_coap_connect() in its own thread and aborting that thread after a timeout of your choosing. This will probably require some cleanup afterwards, such as closing sockets.

    Finally, to avoid trying to connect when the coverage is bad, it's considered good practice to use the %CONEVAL AT command to get an evaluation of the LTE connection, which can be used to decide whether it's worth attempting a transmission or not. You could use this before calling nrf_cloud_coap_connect().

    I hope some of this helps!

    Best regards,
    Carl Richard
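As a rough illustration of the %CONEVAL gating suggested above, here is a minimal sketch in plain C. It assumes the response begins with three comma-separated integers (result, RRC state, energy estimate) and that the energy estimate ranges from 5 (worst) to 9 (best), which is how I understand the nRF91 AT command reference; please verify this against the documentation for your modem firmware. The helper name coneval_allows_connect() and the ENERGY_NORMAL threshold are hypothetical, not part of any SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed energy estimate scale: 5 = excessive ... 9 = efficient.
 * This is an assumption to check against the AT command reference. */
#define ENERGY_NORMAL 7

/* Hypothetical helper: returns true when a %CONEVAL response indicates a
 * successful evaluation (result == 0) and an energy estimate of at least
 * min_energy. Any response that cannot be parsed is treated as "do not
 * connect". */
static bool coneval_allows_connect(const char *resp, int min_energy)
{
    int result;
    int rrc_state;
    int energy;

    assert(resp != NULL);

    /* "%%" in the format matches the literal '%' that starts the response. */
    if (sscanf(resp, "%%CONEVAL: %d,%d,%d", &result, &rrc_state, &energy) != 3) {
        return false;
    }

    return result == 0 && energy >= min_energy;
}
```

On the device, the response string would come from sending AT%CONEVAL to the modem. The application would skip the nrf_cloud_coap_connect() call and retry later whenever this check returns false.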

  • Thanks for the quick and detailed response.

    These are more or less the options I also had in mind.

    Is there a way to determine the maximum timeout duration of nrf_cloud_coap_connect()?

    So far, I have implemented a timer that periodically feeds the watchdog during the connection process. But nothing prevents the connection attempt from hanging forever ...

  • Hi again Vincent!

    You are welcome.

    I don't know of any straightforward way to determine the maximum timeout duration of nrf_cloud_coap_connect(), but it should be tied to the configs I mentioned in my previous message. If you are able to test the device in an environment where the nrf_cloud_coap_connect() function hangs, it should be possible to experiment a bit and see what determines the timeout.

    I will see if I can manage to run some tests here as well.

    Best regards,
    Carl Richard
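For a first estimate of how the retransmission setting influences the timeout, the CoAP specification (RFC 7252, section 4.8) gives a worst-case wait for a single confirmable message: the initial timeout is at most ACK_TIMEOUT × ACK_RANDOM_FACTOR and doubles after every retransmission. The sketch below computes that bound in plain C. It assumes the RFC defaults (2 s ACK timeout, random factor 1.5, doubling backoff); whether the SDK uses exactly these values is an assumption to check in its Kconfig. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: worst-case time, in milliseconds, before a single
 * CoAP confirmable message is given up on, following RFC 7252:
 *   ACK_TIMEOUT * ACK_RANDOM_FACTOR * (2^(MAX_RETRANSMIT + 1) - 1)
 * random_percent expresses ACK_RANDOM_FACTOR as a percentage (150 = 1.5). */
static int64_t coap_max_transmit_wait_ms(int64_t ack_timeout_ms,
                                         int random_percent,
                                         int max_retransmit)
{
    assert(ack_timeout_ms > 0 && random_percent >= 100 && max_retransmit >= 0);

    /* Longest possible initial timeout after randomisation. */
    int64_t timeout_ms = ack_timeout_ms * random_percent / 100;
    int64_t total_ms = 0;

    /* One wait for the initial transmission plus one per retransmission,
     * with the timeout doubling each time. */
    for (int i = 0; i <= max_retransmit; i++) {
        total_ms += timeout_ms;
        timeout_ms *= 2;
    }

    return total_ms;
}
```

Under these assumptions, coap_max_transmit_wait_ms(2000, 150, 4) gives 93000 ms, which already exceeds a 75-second watchdog, while lowering the retransmit count to 2 gives 21000 ms. This is a bound for one message exchange only; nrf_cloud_coap_connect() also performs a DTLS handshake and may involve more than one exchange, so its total duration can be longer.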
