DTLS minimum handshake timeout

Hi

It seems that the default minimum DTLS handshake timeout is about 1 second. That means there is a retransmission if a response isn't received within this timeout. In NB-IoT/LTE-M this leads to many unnecessary retransmissions.

The server is configured with a higher timeout, but how can I configure the nRF9151 to be more patient during the handshake? Since NB-IoT has latencies of up to 10 seconds, retransmitting every second barely makes sense.

I have tried a lot of socket options so far, but none seem to work. Zephyr itself offers TLS_DTLS_HANDSHAKE_TIMEOUT_MIN, but this one is not supported on offloaded sockets. Am I missing something? It would be a pity if I had to switch to mbedTLS because of a missing handshake timeout option.

NCS version: v3.0.2

Best regards
Samuel

  • Hi Samuel, 

    Could you please provide more information on the DTLS solution that you are using? You are not using mbedTLS? What modem FW are you running on your device?

    Looking through our documentation, there should be a socket option called SO_KEEPOPEN that keeps DTLS CID-enabled sessions alive during network outages. From nrfxlib/nrf_modem/include/nrf_socket.h: "Keep socket open when its PDN connection is lost."

    Does this provide any solution to your issue?

    Kind regards,
    Øyvind

  • Hi

    I am using modem firmware version mfw_nrf91x1_2.0.2 and a modified coap sample with CONFIG_NET_SOCKETS_OFFLOAD=y.

    In my understanding, mbedTLS is linked and used with the application, but works in conjunction with the modem. I don't use the mbedTLS implementation provided by Zephyr, which is fully configurable but does not use any hardware acceleration.

    To clarify the problem: There is no problem with CID or established connections at all. But when I open the socket and connect to the server, the device is not patient and retransmits messages every second:

    The handshake starts with:

    Client Hello

    Hello Verify Request

    Client Hello

    So far so good. After 1.5 seconds the server responds with Server Hello. Meanwhile, the client has assumed that the packet was lost and has retransmitted the Client Hello message. That's why you see two Server Hello messages, and also why there are three Certificate, Client Key Exchange... and three Change Cipher Spec... messages. The reason is that the client retransmits messages after a 1 second timeout in the DTLS handshake.

    In mbedTLS this can be configured using the mbedtls_ssl_conf_handshake_timeout function. In Zephyr you can use setsockopt(sock, SOL_TLS, TLS_DTLS_HANDSHAKE_TIMEOUT_MIN, &min_hs_timeout_ms, sizeof(min_hs_timeout_ms)). But I can't find any such option for offloaded sockets. I could use Zephyr's mbedTLS with CONFIG_MBEDTLS_BUILTIN and do DTLS in software, but it will probably be much slower.

    We will use CID in the end and there will be only a few handshakes, but retransmitting delayed (but successfully transmitted) packets leads to even more packets and round trips in the handshake. On the one hand this consumes more energy; on the other hand it increases the risk of handshake failure in bad cellular coverage. Therefore there should be an option for the minimum retransmission timeout, like mbedTLS and Californium offer.
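
To put a number on the effect described above, here is a minimal sketch in plain C that counts how many times one handshake flight is retransmitted before its reply arrives. It assumes the retransmission timer doubles after each expiry, as RFC 6347 recommends; the modem's actual schedule may differ. The function name and the `max_retries` parameter are hypothetical and only serve the illustration.

```c
#include <assert.h>

/*
 * Count the retransmissions of one flight that fire before the reply
 * arrives. Assumption: the timer starts at initial_ms and doubles after
 * every expiry (RFC 6347 style); this is a model, not modem behaviour.
 */
unsigned spurious_retransmissions(unsigned rtt_ms, unsigned initial_ms,
                                  unsigned max_retries)
{
    assert(initial_ms > 0);

    unsigned elapsed_ms = 0;
    unsigned timeout_ms = initial_ms;
    unsigned count = 0;

    while (count < max_retries) {
        elapsed_ms += timeout_ms;   /* instant at which the timer expires */
        if (elapsed_ms >= rtt_ms) {
            break;                  /* reply is already there, no resend */
        }
        count++;                    /* timer fired first: flight is resent */
        timeout_ms *= 2;
    }
    return count;
}
```

Under this model, `spurious_retransmissions(1500, 1000, 4)` gives 1 (the 1.5 second Server Hello case above) and `spurious_retransmissions(10000, 1000, 4)` gives 3, while a 4 second initial timer reduces the 10 second case to a single retransmission.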

    Best regards
    Samuel

  • "work week 32"

     

    Just in case you're interested in testing whether other timings change the behavior in a relevant way, you can run some tests with my client in the meantime.

    Thanks for your patience. Our team provided the following answer:

    NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO is a configurable timeout for the whole handshake. 

    For example if DTLS total timeout is 15 s then timeouts are 1 s, 2 s, 4 s, and 8 s.
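
The relationship between the total timeout and the individual timers can be sketched as follows. This is an illustration only: it assumes the timers double and that retransmission stops once the next timer would exceed the configured total, which reproduces the 15 s example above but is not confirmed as the modem's exact algorithm. The function name and the `initial_s` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill out[] with the per-retry timeouts (in seconds) that fit into a
 * total handshake timeout. Assumption: timers start at initial_s and
 * double; the sequence ends when the sum would exceed total_s.
 * Returns the number of entries written.
 */
size_t dtls_retransmit_schedule(unsigned total_s, unsigned initial_s,
                                unsigned *out, size_t out_len)
{
    assert(initial_s > 0 && out != NULL);

    size_t n = 0;
    unsigned sum_s = 0;
    unsigned timeout_s = initial_s;

    while (n < out_len && sum_s + timeout_s <= total_s) {
        out[n++] = timeout_s;
        sum_s += timeout_s;
        timeout_s *= 2;
    }
    return n;
}
```

Calling it with a total of 15 and an initial timer of 1 yields four entries, 1, 2, 4 and 8, which add up to exactly 15 s.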

     

    We are also aware that 1s timeout for re-transmission in DTLS handshake is not optimal and leads to re-transmissions. There just hasn't been time to optimize it. Also the RFC doesn't take into account low bandwidth technologies such as NB-IoT. 

    Kind regards,
    Øyvind

  • > We are also aware that 1s timeout for re-transmission in DTLS handshake is not optimal and leads to re-transmissions.

    Yes.

    > There just hasn't been time to optimize it.

    The configuration parameters are already available in mbedTLS, so the work to do is mainly to add them to the socket option API.

    > Also the RFC doesn't take into account low bandwidth technologies such as NB-IoT.

    The RFC is from 2012; AFAIK, NB-IoT simply wasn't available in 2012.

    What's more, the IETF TLS working group focuses mainly on "web usage", and that leads to recommendations such as this 1s. The RFC refers to congestion considerations and concludes that, for web usage, 1s is better suited than 3s.

    Under normal signal conditions, and maybe when using PSK rather than x509, that may also apply to modem communication. That is, it causes some retransmissions but does no further harm. But with weak signals (that's the domain of NB-IoT), that may change and even cause the handshake to fail, especially with the larger amount of data that x509 requires.

    So all in all, maybe someone from your modem/DTLS team can have a look at that and estimate how much that additional socket option will cost. And then we will see if it gets implemented some day.

  • I also think that the effort isn't too high. You already have a hard-coded 1s minimum timeout that would have to be made configurable. A further reason to change this is that you could align with Zephyr's socket API. Right now, there is the nRF socket option

    NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO

    and Zephyr's socket options

    TLS_DTLS_HANDSHAKE_TIMEOUT_MIN
    TLS_DTLS_HANDSHAKE_TIMEOUT_MAX

    which is pretty annoying when the same code base is used with different hardware architectures.

  • With the satellite ambitions this topic will get important as well. Unless Nordic's plan is not to support DTLS for NTN.

  • The secure socket (TLS/DTLS) API is part of the LTE modem. This handles re-transmissions inside the modem. From our documentation:

    What is TLS_DTLS_HANDSHAKE_TIMEOUT_MIN compared to NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO?
    • TLS_DTLS_HANDSHAKE_TIMEOUT_MIN is a Zephyr generic TLS socket option that sets the minimum DTLS handshake retransmission timeout (in milliseconds). It works with TLS_DTLS_HANDSHAKE_TIMEOUT_MAX; the handshake timeout starts at min and doubles on each retry until max is reached.
    • NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO is an nRF modem-specific socket option (NRF_SOL_SECURE) that sets the total DTLS handshake timeout (including retransmissions) using fixed, allowed values in seconds: 0, 1, 3, 7, 15, 31, 63, 123 (Zephyr alias: TLS_DTLS_HANDSHAKE_TIMEO).
    • TLS_DTLS_HANDSHAKE_TIMEOUT_MAX is the Zephyr generic TLS socket option that sets the maximum DTLS handshake retransmission timeout (in milliseconds). Together with TLS_DTLS_HANDSHAKE_TIMEOUT_MIN, the timeout starts at MIN and doubles on each retransmission until MAX is reached.
      This contrasts with the nRF modem-specific NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO (Zephyr alias TLS_DTLS_HANDSHAKE_TIMEO, defined in socket_ncs.h), which sets a single total handshake timeout in seconds with the fixed allowed values listed above.
    In short: TLS_DTLS_HANDSHAKE_TIMEOUT_MIN/MAX control per-retry backoff (ms) in Zephyr's generic TLS; NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO sets a single total handshake timeout (s) for nRF modem sockets with predefined values.
    The nRF Connect SDK Networking Sockets, designed for the nRF91 Series, are part of the Modem Library.
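
For comparison, the min/max backoff of the generic Zephyr options can be sketched in plain C. This is a model of the rule "start at MIN, double on each retransmission until MAX is reached", not Zephyr or mbedTLS code; clamping the last step to MAX when doubling would overshoot is an assumption, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill out[] with the per-retry timeouts (in milliseconds) for a
 * min/max backoff. Assumption: the value doubles each retry and is
 * clamped to max_ms; the sequence ends once max_ms has been emitted.
 * Returns the number of entries written.
 */
size_t backoff_min_max(uint32_t min_ms, uint32_t max_ms,
                       uint32_t *out, size_t out_len)
{
    assert(min_ms > 0 && min_ms <= max_ms && out != NULL);

    size_t n = 0;
    uint32_t t = min_ms;

    while (n < out_len) {
        out[n++] = t;
        if (t >= max_ms) {
            break;                          /* maximum reached */
        }
        /* Compare against max_ms / 2 so that doubling cannot overflow. */
        t = (t > max_ms / 2) ? max_ms : t * 2;
    }
    return n;
}
```

With a minimum of 4000 ms and a maximum of 32000 ms this yields 4000, 8000, 16000 and 32000 ms. Unlike the modem option, the first timer here is chosen by the application rather than derived from a total.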

    The values in https://docs.nordicsemi.com/bundle/nrfxlib-apis-latest/page/group_nrf_socket_so_sec_handshake_timeo… are the valid configuration values for the socket option NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO. Each value is the timeout for the complete DTLS handshake.

    Supported DTLS total handshake timeouts are 1 s, 3 s, 7 s, 15 s, 31 s, 63 s and 123 s. The DTLS handshake timeout is a requirement from the Verizon network.

    In LTE-M:

    If the application configures the maximum DTLS handshake timeout to 15 s, the MFW uses the DTLS re-transmission timeouts 1 s, 2 s, 4 s, 8 s (15 s in total).

    In NB-IoT:

    If the application configures the maximum DTLS handshake timeout to 63 s, the MFW uses the DTLS re-transmission timeouts 4 s, 8 s, 16 s, 32 s (60 s in total).
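
Because only fixed totals are accepted, an application that has a desired handshake budget needs to round it to a supported value first. A minimal sketch in plain C, assuming rounding up is the wanted policy; the helper name is hypothetical, and the value 0 from the documented list is left out because its meaning is not described in this thread.

```c
#include <stddef.h>

/* Total handshake timeouts (seconds) listed as supported above. */
static const unsigned allowed_total_s[] = {1, 3, 7, 15, 31, 63, 123};

/*
 * Return the smallest supported total timeout that is >= wanted_s,
 * or -1 if wanted_s is larger than every supported value.
 */
int smallest_allowed_handshake_timeout(unsigned wanted_s)
{
    size_t count = sizeof(allowed_total_s) / sizeof(allowed_total_s[0]);

    for (size_t i = 0; i < count; i++) {
        if (allowed_total_s[i] >= wanted_s) {
            return (int)allowed_total_s[i];
        }
    }
    return -1;
}
```

For a budget of 40 s this returns 63, which per the NB-IoT example above corresponds to the 4 s, 8 s, 16 s, 32 s schedule. The result would then be passed to the socket option NRF_SO_SEC_DTLS_HANDSHAKE_TIMEO.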


    Let me know if anything is unclear based on this.