nRFCloud CoAP messages not received

Hello DevZone


I am currently facing an issue with devices already installed at a customer site.
The devices are based on the nRF9160, with firmware built on SDK 2.8.0 and modem firmware 1.3.6.
Once installed, the devices do not move.

Device works like this:
Sleep 15 min
Wakeup
Take a measurement
If the modem was switched off (no PSM):
    -> Reconnect to nRFCloud:

nrf_cloud_coap_disconnect();	// From any previous session, just to be sure
nrf_cloud_coap_init();
err = nrf_cloud_coap_connect(FIRMWARE_VERSION);

Send sensor measurement to nRFCloud via CoAP
If PSM is available:
    -> Go IDLE / PSM
If not:
    -> Switch off the modem to disconnect from the network
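The cycle above could be sketched as host-testable C roughly like this. It is only an illustration under assumptions: `struct cycle_ops`, `run_cycle()` and the `fake_*` helpers are hypothetical names, not SDK APIs. On the device, `connect` would wrap the nrf_cloud_coap_disconnect() / nrf_cloud_coap_init() / nrf_cloud_coap_connect() sequence (checking the init result too) and `send` would wrap the CoAP send call. The point of the sketch is that every return code, including the connect, is logged on every cycle instead of being assumed to be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical hooks; on the device these would wrap the nRF Cloud CoAP
 * and modem calls. None of these names come from the SDK. */
struct cycle_ops {
	int (*measure)(char *buf, size_t len);
	int (*connect)(void);  /* would wrap disconnect + init + connect */
	int (*send)(const char *msg);
	bool (*psm_available)(void);
	void (*enter_psm)(void);
	void (*modem_off)(void);
};

enum cycle_result {
	CYCLE_OK = 0,
	CYCLE_MEASURE_FAIL,
	CYCLE_CONNECT_FAIL,
	CYCLE_SEND_FAIL,
};

/* One wake cycle: measure, reconnect if the modem was off, send, then
 * idle in PSM if available, otherwise power the modem off. */
enum cycle_result run_cycle(const struct cycle_ops *ops, bool modem_was_off)
{
	char msg[32];
	enum cycle_result res = CYCLE_OK;
	int err = ops->measure(msg, sizeof(msg));

	if (err) {
		printf("measure: %d\n", err);
		return CYCLE_MEASURE_FAIL;  /* nothing to send, modem untouched */
	}
	if (modem_was_off) {
		err = ops->connect();
		printf("connect: %d\n", err);  /* logged every cycle, not only on failure */
		if (err) {
			res = CYCLE_CONNECT_FAIL;
			goto done;
		}
	}
	err = ops->send(msg);
	printf("send: %d\n", err);
	if (err) {
		res = CYCLE_SEND_FAIL;
	}
done:
	if (ops->psm_available()) {
		ops->enter_psm();
	} else {
		ops->modem_off();
	}
	return res;
}

/* Host-side fakes (hypothetical) so the flow can be run off-target. */
static int fake_connect_result;
static int fake_modem_off_calls;

static int fake_measure(char *buf, size_t len)
{
	snprintf(buf, len, "{\"t\":21.5}");
	return 0;
}
static int fake_connect(void) { return fake_connect_result; }
static int fake_send(const char *msg) { return msg[0] != '\0' ? 0 : -1; }
static bool fake_no_psm(void) { return false; }
static void fake_enter_psm(void) { }
static void fake_modem_off(void) { fake_modem_off_calls++; }

static const struct cycle_ops fake_ops = {
	.measure = fake_measure,
	.connect = fake_connect,
	.send = fake_send,
	.psm_available = fake_no_psm,
	.enter_psm = fake_enter_psm,
	.modem_off = fake_modem_off,
};
```

The fakes only exercise the decision logic off-target; on the device they would be replaced by wrappers around the real calls.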


I have about 4000 units deployed with similar hardware and firmware, and they all work as expected.
However, two devices are facing the same problem: only part of the messages sent by the devices are received on nRFCloud.
(There may be more affected devices that have not been detected yet.)


The two reported cases have one thing in common: the MNO does not offer PSM with timings acceptable for the device's operation.
Therefore, the device disconnects from the network after every send.


One device is located in Martinique (French Antilles) and has access to two LTE-M networks: Orange and SFR. Both have good coverage (ConEval and RSRP are good), and neither provides PSM.
If the device connects to SFR, everything works fine. However, if it connects to Orange, many messages are lost, and the loss seems to get worse over time.
Initially, I thought it was a "bad network", a "bad antenna", or some restriction between my MVNO and the local provider.
Then the second case occurred, in Italy.
The device connects to Vodafone in LTE-M, no PSM.
Looking at the device log, all messages are sent properly.
On the SIM card side, the amount of data consumed and the connect/disconnect also show a normal behavior.
However, not all messages are received on nRFCloud.
This time, I have several other devices connected to the same network (but not the same cell), so I am quite confident that there is no MVNO/local operator issue.



CoAP messages are sent using

nrf_cloud_coap_json_message_send(msg, false, false)

which always returns 0. The payload is small: msg is a string of approximately 20 characters.
I understand that using the confirmable feature would be more reliable. This has not been done so far because the firmware was previously based on SDK 2.5.0 and 2.6.0, where this feature was not available.

What could be causing this?
I understand that without confirmable, some messages might be lost. However, we are talking about more than 50% of messages lost, even in good network conditions.

In the attached image, I expect a continuous flow of 4 messages per hour (and a few more during the night).
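To put numbers on the gap: at 4 messages per hour, a day should produce 96 messages, so "more than 50% lost" means fewer than 48 arriving per day. A minimal sketch of that arithmetic, assuming a fixed reporting rate and ignoring the extra night-time messages (the helper names are hypothetical):

```c
#include <assert.h>

/* Expected message count for a window at a fixed reporting rate.
 * Ignores the extra night-time messages. */
unsigned expected_messages(unsigned hours, unsigned per_hour)
{
	return hours * per_hour;
}

/* Loss in percent (integer, truncated). Returns 0 if nothing was expected
 * or if at least as many messages arrived as expected. */
unsigned loss_percent(unsigned expected, unsigned received)
{
	if (expected == 0 || received >= expected) {
		return 0;
	}
	return (expected - received) * 100u / expected;
}
```

For instance, a hypothetical day with 40 messages received out of 96 expected gives `loss_percent(96, 40) == 58` (truncated from about 58.3%).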


My first idea was that the modem gets disconnected before the message is fully sent. However, this is not consistent with the data consumed by the SIM card (at least, not obviously).

Is there any explanation for this? Are there reported "bad network cells" that could explain this behavior?

Thanks for your help.

  • Hi,

    Thanks for the clear description and the plot. I think good RSRP and connection evaluation values combined with messages missing in nRF Cloud usually mean we should look beyond signal quality alone. You mentioned that you are using non-confirmable CoAP, so there is no ACK from the server and the stack has no way to know whether the message was received. A return value of 0 from nrf_cloud_coap_json_message_send only means the UDP packet left the device stack, not that nRF Cloud received or stored it. Could you try switching to confirmable CoAP (confirmable=true) on a test device and check the return values?
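    A minimal sketch of a confirmable send with a bounded retry count, assuming the three-argument form (message, bulk, confirmable) used earlier in this thread. `json_send_fn`, `send_confirmed()` and `fake_send()` are hypothetical; on the device the function pointer would wrap nrf_cloud_coap_json_message_send(). The fake's -ETIMEDOUT is only a placeholder, so check the SDK documentation for what the real call returns when no ACK arrives. Capping the attempts also caps how long the modem stays on when the server never answers, which matters for the power budget.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Same shape as the send call in this thread: (message, bulk, confirmable).
     * On target this would wrap nrf_cloud_coap_json_message_send(). */
    typedef int (*json_send_fn)(const char *msg, bool bulk, bool confirmable);

    /* Send as confirmable CoAP, retrying at most max_attempts times.
     * Returns 0 on success, otherwise the last error code. */
    int send_confirmed(json_send_fn send_fn, const char *msg,
                       unsigned max_attempts, unsigned *attempts_used)
    {
        int err = -EINVAL;  /* returned if max_attempts == 0 */
        unsigned attempt = 0;

        while (attempt < max_attempts) {
            attempt++;
            err = send_fn(msg, false, true);
            if (err == 0) {
                break;
            }
        }
        if (attempts_used != NULL) {
            *attempts_used = attempt;
        }
        return err;
    }

    /* Host-side fake (hypothetical): fails the first fake_failures calls
     * with a placeholder error, then succeeds. */
    static unsigned fake_failures;
    static bool fake_saw_confirmable;

    static int fake_send(const char *msg, bool bulk, bool confirmable)
    {
        (void)msg;
        (void)bulk;
        fake_saw_confirmable = confirmable;
        if (fake_failures > 0) {
            fake_failures--;
            return -ETIMEDOUT;
        }
        return 0;
    }
    ```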

    Also, since the modem is powered off after each cycle, a full DTLS handshake and reauthentication with nRF Cloud is usually required on every reconnect. Could you also log the return value of nrf_cloud_coap_connect() each cycle to confirm it always succeeds before the send? 

    Could you provide a modem trace from an affected device during a failed send cycle? This will help us see whether the DTLS handshake, PDN attach, or UDP transmission is failing at the network level. Thanks

    Best Regards,
    Syed Maysum

  • Do you use DTLS 1.2 CID and save/restore the session for cfun=0/cfun=1?

    If so, such high loss rates are sometimes caused by a DNS load balancer.

    During the initial exchange, a DTLS session is established with only one node. That's fine as long as that IP address keeps being used.

    But it may fail if a follow-up DNS lookup returns the IP address of another node.

  • Thanks for your response.

    Trying confirmable=true will of course be my next test.
    However, I'll have to check the impact on power consumption: I assume the modem will stay on longer to be able to receive the acknowledgment.

    The full handshake is performed at every reconnection, and the result is always 0.

    I'll see what I can do about a modem trace (I am not familiar with this), since the device is currently in use by the customer.

    I am not sure I fully understand, but I think the answer is no.
    My device uses the nRF9160, which is not capable of pausing/restoring the session with nRFCloud.

  • OK, I guess then it's not the case.

    -------------------------------------------------------------------

    Just in case you're curious:

    nslookup nrfcloud.com

    Name:    nrfcloud.com
    Address: 3.160.150.112
    Name:    nrfcloud.com
    Address: 3.160.150.106
    Name:    nrfcloud.com
    Address: 3.160.150.87
    Name:    nrfcloud.com
    Address: 3.160.150.5

    That DNS name resolves to 4 different IP addresses.

    If the device gets one of those 4 on its first DNS lookup, it establishes a DTLS session with that node. But if, after a restart, a later DNS lookup returns a different node, even DTLS CID stops working. Therefore, on restart the device must stick to the same IP address and must not perform a DNS lookup again.
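    The "resolve once, then stick to the address" idea could be sketched like this. It is only an illustration: `struct addr_cache`, `server_addr_get()`, `server_addr_invalidate()` and the rotating `fake_resolve()` are hypothetical, the address pool is copied from the nslookup output above, and the sketch does not hook into the nRF Cloud CoAP library, which sets up its own connection. It is also only relevant where the DTLS session is actually resumed across restarts, and persisting the cache across a modem power cycle is left out.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical resolver hook: writes one IPv4 address string into out. */
    typedef int (*resolve_fn)(const char *host, char *out, size_t len);

    /* Cached server address (would need to survive restarts on a device). */
    struct addr_cache {
        bool valid;
        char ip[16];  /* "255.255.255.255" + NUL */
    };

    /* Return the cached address if present, otherwise resolve once and keep it. */
    int server_addr_get(struct addr_cache *cache, resolve_fn resolve,
                        const char *host, const char **ip_out)
    {
        if (!cache->valid) {
            int err = resolve(host, cache->ip, sizeof(cache->ip));
            if (err) {
                return err;
            }
            cache->valid = true;
        }
        *ip_out = cache->ip;
        return 0;
    }

    /* Drop the cached address (e.g. after the server rejects the session)
     * so the next call performs a fresh lookup. */
    void server_addr_invalidate(struct addr_cache *cache)
    {
        cache->valid = false;
    }

    /* Host-side fake rotating through the four addresses from the nslookup
     * output, mimicking a DNS load balancer. */
    static const char *const fake_pool[] = {
        "3.160.150.112", "3.160.150.106", "3.160.150.87", "3.160.150.5",
    };
    static unsigned fake_lookups;

    static int fake_resolve(const char *host, char *out, size_t len)
    {
        (void)host;
        snprintf(out, len, "%s", fake_pool[fake_lookups % 4]);
        fake_lookups++;
        return 0;
    }
    ```

    The explicit invalidate is a design choice in this sketch: a new lookup happens only after a failure, not on every restart.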
