DTLS causes re-registration on LwM2M using mobile network

Hello Everyone,

Summary

Chip:

nRF52840

OS:

nRF Connect / Zephyr

Problem:

Mobile network connections cause LwM2M (with DTLS) to perform re-registration if the update interval is longer than ~3 minutes. 

Details

We're using LwM2M (with DTLS) to monitor / control our nRF52840 uC (connected via an OpenThread [OT] network).

Working Condition

When the OT border router is connected via a fixed connection (within a building) we can set an LwM2M update interval of 5 minutes with no problems. Registration occurs once and only updates occur after that point.

Error Condition:

When the OT border router is connected via a mobile connection (i.e. a SIM) we can't set the LwM2M update interval to more than ~2-3 minutes. If we do set a longer interval, all LwM2M update requests time out.

This causes the device to perform re-registrations, which has the following effects:

  • Increased data usage
  • The device drops in and out on the LwM2M server, because the interval between successful contacts is longer than the expected lifetime.

Additionally, if I disable DTLS encryption, LwM2M works with longer update intervals.

Assumption of the issue

I'm assuming the issue is that the mobile operator's network closes / deletes the NAT entry after 2-3 minutes without traffic. This means the LwM2M server can no longer identify the client by its IP+port, forcing the device to re-register / renegotiate the DTLS session.
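To make the assumption concrete, here is a minimal sketch of how a NAT with an idle timeout would behave. Everything in it is hypothetical (the `nat_binding` struct, the 120 s timeout, the port numbers); real carrier NATs differ and their timeouts are usually not published.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the NAT drops a UDP mapping after this many idle seconds. */
#define NAT_IDLE_TIMEOUT_S 120u

/* Hypothetical model of one UDP mapping inside the operator's NAT. */
struct nat_binding {
    uint16_t public_port; /* source port the LwM2M server sees */
    uint32_t last_used_s; /* time of the last packet through the mapping */
    bool     valid;       /* false once the mapping has been dropped */
};

static uint16_t next_free_port = 40000;

/* Returns the public source port the server would see for a packet
 * sent by the device at time now_s (seconds). */
uint16_t nat_send(struct nat_binding *b, uint32_t now_s)
{
    /* The mapping expires if it has been idle for longer than the timeout. */
    if (b->valid && (now_s - b->last_used_s) > NAT_IDLE_TIMEOUT_S) {
        b->valid = false;
    }
    /* A packet without a valid mapping gets a fresh public port. */
    if (!b->valid) {
        b->public_port = next_free_port++;
        b->valid = true;
    }
    b->last_used_s = now_s;
    return b->public_port;
}
```

With this model, an update sent 100 s after the previous packet keeps the same public port, while one sent 300 s later arrives from a new port, so a server that keys its DTLS session on IP+port no longer recognises the device.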

From what I've read, the following solutions are plausible:

  • Replace DTLS encryption with OSCORE.
    • Zephyr doesn't seem to support OSCORE yet; there is a module for it, but it's not in the LwM2M stack at least.
  • Use DTLS 1.2 on the device and server. This allows the connection to be identified by the connection ID (CID).
    • I'm not sure what version of DTLS Zephyr uses.
  • Send empty requests every 2 minutes to keep the port open.

Any help or advice on this issue would be great.
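For the keep-alive option, the trade-off is extra traffic. A rough sketch of the arithmetic, assuming the NAT timeout is known (it usually has to be measured) and using hypothetical helper names:

```c
#include <stdint.h>

/* Picks how often some packet must be sent so that the NAT mapping never
 * goes idle: the desired update interval, capped at the NAT timeout minus
 * a safety margin. All three inputs are assumptions supplied by the caller. */
uint32_t keepalive_period_s(uint32_t update_interval_s,
                            uint32_t nat_timeout_s,
                            uint32_t margin_s)
{
    uint32_t limit = (nat_timeout_s > margin_s) ? nat_timeout_s - margin_s : 1u;

    return (update_interval_s < limit) ? update_interval_s : limit;
}

/* Number of packets per day that a given period implies (0 if period is 0). */
uint32_t packets_per_day(uint32_t period_s)
{
    return (period_s == 0u) ? 0u : 86400u / period_s;
}
```

For example, a desired 300 s update interval with an assumed 120 s NAT timeout and a 20 s margin gives a 100 s period, i.e. 864 packets per day instead of 288.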

Thanks for your time!

  • DTLS 1.2 is RFC 6347, which does not generally include DTLS 1.2 CID (RFC 9146).

    Currently I know of three DTLS 1.2 CID implementations: Eclipse/Californium (Java, server/client), Eclipse/tinydtls (C, feature branch, client only), and mbedTLS (released at the beginning of this year, C, client/server). AFAIK Zephyr uses mbedTLS, but I'm not sure which version or whether CID is enabled. I set up a demo with tinydtls, zephyr-coaps-client (nRF9160, CoAP only, not LwM2M), which works pretty well.

    Unfortunately it requires more than just using DTLS 1.2 CID, because some upper-layer code also uses the IP address to identify the peer, and that must be adapted as well.

  • Cheers for the above. nRF Connect SDK v2.3.0 seems to use mbedtls v3.1.0, which was released on Dec 17, 2021. However, the release notes do state:

    The identifier of the CID TLS extension can be configured by defining MBEDTLS_TLS_EXT_CID at compile time.

    But it doesn't seem like Zephyr has enabled it yet:

    --

    Reading what you said does this mean CID may improve the situation but doesn't guarantee to fix it? 

    If this is true, then using OSCORE instead of DTLS would be the only real fix? 

  • > Reading what you said does this mean CID may improve the situation but doesn't guarantee to fix it? 

    My answer was meant to make you aware that the upper layers may still use the IP address as the peer's endpoint identifier; that is how CoAP defines it (RFC 7252, section 4.1, "Messages and Endpoints"). The implementation of the layer above may already have chosen something else (e.g. for DTLS, the session ID or the principal), but that depends on the implementation.

    > If this is true, then using OSCORE instead of DTLS would be the only real fix?

    As long as OSCORE is running over CoAP, the same endpoint definition is used and you will run into the very same issues. If changing IP endpoints are not considered in the implementation of the layers above, it will not work.

    Very simple: DTLS 1.2 CID and OSCORE decouple the encryption from the IP address/endpoint, but not the processing above it. If the LwM2M server hasn't been adapted for that, it doesn't work.

    > The identifier of the CID TLS extension can be configured by defining MBEDTLS_TLS_EXT_CID at compile time.

    That mbedtls version is not compliant with the final version of RFC 9146. If you don't implement/run your own LwM2M server, it depends on whether that server also supports the deprecated variant.
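The point about the upper layers can be illustrated with a small sketch of a server-side registration table. This is not how any particular LwM2M server works; the structs, the helper names and the use of a PSK identity as "principal" are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical peer address as seen by the server (after NAT). */
struct peer_addr {
    uint32_t ip;
    uint16_t port;
};

/* Hypothetical registration entry kept by an LwM2M server. */
struct registration {
    char principal[32];    /* authenticated identity, e.g. a PSK identity */
    struct peer_addr addr; /* last address the client was seen at */
};

/* Lookup as implied by the CoAP endpoint definition: address only. */
struct registration *find_by_addr(struct registration *regs, size_t n,
                                  struct peer_addr from)
{
    for (size_t i = 0; i < n; i++) {
        if (regs[i].addr.ip == from.ip && regs[i].addr.port == from.port) {
            return &regs[i];
        }
    }
    return NULL;
}

/* Lookup by authenticated principal; the stored address follows the peer. */
struct registration *find_by_principal(struct registration *regs, size_t n,
                                       const char *principal,
                                       struct peer_addr from)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(regs[i].principal, principal) == 0) {
            regs[i].addr = from; /* accept the new address after NAT rebinding */
            return &regs[i];
        }
    }
    return NULL;
}
```

After a NAT rebinding the source port changes, so `find_by_addr` misses and the update is rejected, while `find_by_principal` still finds the registration and records the new address. Only a server written in the second style benefits from DTLS 1.2 CID or OSCORE.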

  • Had the weekend to let your comments sink in. We're using a commercial SaaS LwM2M server, so I'm unsure of the exact implementation.

    But the problem could lie with the LwM2M server. If the server identifies clients by IP/port, then this issue will appear.

    However, if the server uses, say, the DTLS session ID to identify the client, then this could get around the issue of a changing IP/port. In that case DTLS can help identify devices when the IP/port changes.

    So even if Zephyr did support DTLS CIDs, the DTLS implementation would have to take advantage of it.

    If DTLS provides a session ID, what do we gain from CIDs?

    How far am I off the mark?

  • > We're using a commercial SaaS LwM2M server,

    I guess they will know how it works.

    The DTLS session ID is only used by the client in its "ClientHello" during a handshake. Without DTLS 1.2 CID you will need frequent resumption handshakes. That was more or less the situation 5 years ago. At that time I introduced something called the "auto resumption timeout" into Californium. Anyway, even a resumption handshake is a handshake and therefore more overhead, and you need to ensure that both sides support it. It's more common that servers which are aware of the IP address change use the DTLS principal. That works even if the resumption handshake falls back to a full handshake, and it also works for DTLS 1.2 CID.

    Unlike the DTLS session ID, the DTLS 1.2 CID is sent in every encrypted message, so it works instantly and doesn't require new (resumption) handshakes.
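As a sketch of why the CID works without a new handshake: in the RFC 9146 record layout the CID follows the 11 header bytes (content type, version, epoch, sequence number), so a receiver can pick the session from the record itself instead of from the source address. The session table, the helper names and the fixed 4-byte CID length are assumptions here; the CID length is negotiated per connection and not carried in the record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DTLS_CT_TLS12_CID 25u /* record content type "tls12_cid" (RFC 9146) */
#define DTLS_CID_OFFSET   11u /* type(1) + version(2) + epoch(2) + seq(6) */
#define CID_LEN           4u  /* assumption: fixed CID length chosen by the receiver */

/* Hypothetical per-client DTLS session state on the server. */
struct session {
    uint8_t cid[CID_LEN];
    int     id;
};

/* Returns a pointer to the CID inside a CID-bearing record, or NULL. */
const uint8_t *dtls_record_cid(const uint8_t *rec, size_t len)
{
    /* Need the fixed header, the CID and the 2-byte length field. */
    if (len < DTLS_CID_OFFSET + CID_LEN + 2u) {
        return NULL;
    }
    if (rec[0] != DTLS_CT_TLS12_CID) {
        return NULL;
    }
    return rec + DTLS_CID_OFFSET;
}

/* Finds the session by CID, independent of the sender's IP address or port. */
struct session *find_session(struct session *tab, size_t n,
                             const uint8_t *rec, size_t len)
{
    const uint8_t *cid = dtls_record_cid(rec, len);

    if (cid == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tab[i].cid, cid, CID_LEN) == 0) {
            return &tab[i];
        }
    }
    return NULL;
}
```

A record without the `tls12_cid` content type (for example plain application data, type 23) is not matched this way and would still have to be looked up by address.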
