nRF54L15 DK: Zephyr POSIX UDP socket over OpenThread/NAT64 sends CoAP successfully, but poll()/recv() does not receive returned packet

Hello Nordic team,

I am testing a plain CoAP telemetry upload from an nRF54L15 DK over Thread using a Raspberry Pi OTBR and NAT64 to ThingsBoard Cloud.

Hardware / software:

  • Board: nRF54L15 DK

  • nRF Connect SDK: v3.2.4-4c3fc0d44534

  • Zephyr: v4.2.99-9673eec75908

  • OpenThread with a Raspberry Pi OTBR

  • Destination: ThingsBoard Cloud CoAP endpoint

  • NAT64 IPv6 address used by the device: fd57:cacb:e8cf:2::343a:6f12

  • Plain CoAP port: 5683
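As a cross-check of the NAT64 address above: assuming the OTBR uses a /96 NAT64 prefix in the RFC 6052 layout, the last four bytes of the IPv6 address are the IPv4 destination. The helper name `nat64_embedded_ipv4` below is hypothetical and only meant as a host-side sketch:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper. Assumes a /96 NAT64 prefix (RFC 6052), so the
 * IPv4 address sits in bytes 12..15 of the IPv6 address.
 * Writes the dotted-quad text into out; returns 0 on success, -1 on error.
 */
int nat64_embedded_ipv4(const char *ipv6_text, char *out, size_t out_len)
{
    struct in6_addr a6;
    struct in_addr a4;

    if (inet_pton(AF_INET6, ipv6_text, &a6) != 1) {
        return -1; /* not a valid IPv6 literal */
    }

    /* Copy the last 32 bits; they are already in network byte order. */
    memcpy(&a4, &a6.s6_addr[12], sizeof(a4));

    return inet_ntop(AF_INET, &a4, out, (socklen_t)out_len) != NULL ? 0 : -1;
}
```

For fd57:cacb:e8cf:2::343a:6f12 the trailing 343a:6f12 corresponds to 52.58.111.18, which is the ThingsBoard address visible in the OTBR tcpdump further down.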

The application uses Zephyr/POSIX UDP sockets together with Zephyr’s CoAP packet builder.

The relevant socket flow is:

sock = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);

memset(&server_addr, 0, sizeof(server_addr));
server_addr.sin6_family = AF_INET6;
server_addr.sin6_port = htons(5683);
server_addr.sin6_scope_id = 0U;
inet_pton(AF_INET6, "fd57:cacb:e8cf:2::343a:6f12", &server_addr.sin6_addr);

connect(sock, (struct sockaddr *)&server_addr, sizeof(server_addr));

send(sock, request.data, request.offset, 0);

poll(..., 5000);
recv(sock, response_buf, sizeof(response_buf), MSG_DONTWAIT);
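
For clarity, the receive step above can be sketched roughly as follows, with return values checked. This is a simplified sketch, not the exact code from my application: `wait_for_datagram` is a hypothetical helper name, and the header names assume a POSIX environment (on Zephyr they depend on CONFIG_POSIX_API and may differ).

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: wait up to timeout_ms for one datagram on sock.
 * Returns the number of bytes received, 0 on timeout, or a negative
 * errno value on failure. A zero-length datagram cannot be told apart
 * from a timeout here, which is acceptable for CoAP (minimum 4 bytes).
 */
ssize_t wait_for_datagram(int sock, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = sock, .events = POLLIN, .revents = 0 };

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0) {
        return -errno;          /* poll() itself failed */
    }
    if (ready == 0) {
        return 0;               /* timeout: nothing arrived */
    }
    if ((pfd.revents & POLLIN) == 0) {
        return -EIO;            /* POLLERR/POLLHUP/POLLNVAL without data */
    }

    /* Data is pending, so a non-blocking recv() should return at once. */
    ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
    return (n < 0) ? -errno : n;
}
```

In my tests the equivalent of this helper always takes the timeout branch on the nRF54L15 DK, so the failure is before recv(), not in it.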

The CoAP message is a Confirmable POST to:

/api/v1/<ThingsBoard device access token>/telemetry

with JSON payload, for example:

{"temperature":25,"counter":0,"source":"zephyr-coap-ack"}

The generated CoAP packet length is 113 bytes.

Sending works reliably. ThingsBoard receives the telemetry. The OTBR tcpdump shows that ThingsBoard sends a 12-byte CoAP response back to the same UDP source port.

Example OTBR tcpdump:

wpan0 In  IP 192.168.255.3.48998 > 52.58.111.18.5683: UDP, length 113
eth0  Out IP 192.168.178.43.48998 > 52.58.111.18.5683: UDP, length 113
eth0  In  IP 52.58.111.18.5683 > 192.168.178.43.48998: UDP, length 12
wpan0 Out IP 52.58.111.18.5683 > 192.168.255.3.48998: UDP, length 12

The response is then retransmitted by ThingsBoard, which is expected because the device does not ACK the separate Confirmable CoAP response:

eth0  In  IP 52.58.111.18.5683 > 192.168.178.43.48998: UDP, length 12
wpan0 Out IP 52.58.111.18.5683 > 192.168.255.3.48998: UDP, length 12
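
For reference, the ACK that would stop these retransmissions is a 4-byte Empty message (RFC 7252): version 1, type ACK, token length 0, code 0.00, and the Message ID copied from the Confirmable response. A minimal sketch, with the hypothetical helper name `coap_build_empty_ack`:

```c
#include <stddef.h>
#include <stdint.h>

#define COAP_VERSION  1u
#define COAP_TYPE_CON 0u
#define COAP_TYPE_ACK 2u

/*
 * Hypothetical helper: build the Empty ACK for a received Confirmable
 * message. Returns 4 (the ACK length) on success, or -1 if resp is too
 * short, has the wrong version, or is not Confirmable.
 */
int coap_build_empty_ack(const uint8_t *resp, size_t resp_len, uint8_t ack[4])
{
    if (resp == NULL || ack == NULL || resp_len < 4) {
        return -1;
    }
    if ((resp[0] >> 6) != COAP_VERSION) {
        return -1;
    }
    if (((resp[0] >> 4) & 0x03u) != COAP_TYPE_CON) {
        return -1; /* only CON messages are acknowledged */
    }

    ack[0] = (uint8_t)((COAP_VERSION << 6) | (COAP_TYPE_ACK << 4)); /* 0x60, TKL 0 */
    ack[1] = 0x00;     /* code 0.00 = Empty */
    ack[2] = resp[2];  /* Message ID, high byte */
    ack[3] = resp[3];  /* Message ID, low byte */
    return 4;
}
```

This only becomes relevant once the response actually reaches the application socket, which is the part that currently fails.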

With tcpdump -X, the CoAP response was decoded as:

48 41 ... <8 byte token>

So this is a separate Confirmable 2.01 Created response.

I also tested a Non-confirmable POST. In that case ThingsBoard responded with:

58 41 ... <8 byte token>

So ThingsBoard also sends a Non-confirmable 2.01 Created response for a NON request. This confirms that the issue is not only related to missing ACK handling.
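
The way I read the two header bytes can be sketched as below. The layout follows the fixed CoAP header in RFC 7252; the struct and function names are hypothetical and only for illustration. With an 8-byte token and no options or payload, the message is 4 + 8 = 12 bytes, which matches the UDP length in the tcpdump output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed CoAP header fields. */
struct coap_hdr_info {
    uint8_t version;     /* 2 bits, 1 for RFC 7252 */
    uint8_t type;        /* 0 = CON, 1 = NON, 2 = ACK, 3 = RST */
    uint8_t token_len;   /* 0..8 */
    uint8_t code_class;  /* 2 for 2.xx */
    uint8_t code_detail; /* 1 for x.01 */
    uint16_t message_id;
};

/* Returns 0 on success, -1 if the buffer is too short or TKL is invalid. */
int coap_decode_header(const uint8_t *pkt, size_t len, struct coap_hdr_info *out)
{
    if (pkt == NULL || out == NULL || len < 4) {
        return -1;
    }

    out->version     = (uint8_t)(pkt[0] >> 6);
    out->type        = (uint8_t)((pkt[0] >> 4) & 0x03u);
    out->token_len   = (uint8_t)(pkt[0] & 0x0Fu);
    out->code_class  = (uint8_t)(pkt[1] >> 5);
    out->code_detail = (uint8_t)(pkt[1] & 0x1Fu);
    out->message_id  = (uint16_t)((pkt[2] << 8) | pkt[3]);

    /* Token lengths 9..15 are reserved; the token must fit in the buffer. */
    if (out->token_len > 8 || len < 4u + out->token_len) {
        return -1;
    }
    return 0;
}
```

So 0x48 decodes to version 1, type CON, token length 8, while 0x58 differs only in the type bits (NON). In both cases 0x41 is code 2.01 (Created).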

The problem:

  • The outgoing packet works.

  • ThingsBoard receives the telemetry.

  • ThingsBoard replies.

  • The response is visible on the OTBR.

  • The response is sent back to the same UDP source port.

  • But on the nRF54L15 DK, poll() times out and recv() does not receive the response.

Additional tests already tried:

  • connect() + send() + poll() + recv()

  • sendto() without connect(), with receive path unchanged

  • explicit bind() to in6addr_any and port 0

  • poll() with timeout and with infinite timeout

  • recv(MSG_DONTWAIT) after poll()

  • logging reduced and switched from printf() to Zephyr LOG to avoid UART timing effects

The result remains that the response is visible on the OTBR but not delivered to the application socket.

As additional context: I already tested the native OpenThread CoAP API for plain CoAP, and sending telemetry with it worked. So using OpenThread’s native CoAP path may be a viable option for the unencrypted case.

However, for the encrypted case I would like to use CoAPS with X.509 client certificates against ThingsBoard Cloud. With OpenThread’s CoAP Secure API I was not able to establish the X.509/DTLS connection. otCoapSecureConnect() failed locally before any DTLS packet was visible on the OTBR. PSK-based CoAP Secure worked, but X.509 did not.

The client certificate and private key worked from a CoAPS client on the Raspberry Pi against ThingsBoard Cloud. My current suspicion is that the ThingsBoard Cloud server certificate chain or selected cipher suite may require RSA-based authentication or a configuration that is not supported by the OpenThread CoAP Secure setup I am using. My OpenThread/NCS build has ECDHE-ECDSA enabled, and ECJPAKE could not be disabled because the prebuilt Nordic OpenThread library requires it.

Questions:

  1. Is using Zephyr/POSIX UDP sockets over OpenThread/NAT64 expected to work for this use case on nRF54L15 DK / NCS v3.2.4?

  2. Is the OTBR tcpdump view with IPv4-like addresses on wpan0 expected in this NAT64 setup, and should the end device still receive the packet through an AF_INET6 UDP socket connected to the synthetic NAT64 IPv6 address?

  3. Are there known limitations or required Kconfig options for receiving UDP responses through POSIX sockets over OpenThread/NAT64?

  4. Would you recommend using native OpenThread UDP/CoAP APIs instead of Zephyr/POSIX sockets for this use case?

  5. If native OpenThread CoAP is the recommended path: what would be the recommended approach for CoAPS with X.509 client certificates against a public cloud endpoint such as ThingsBoard Cloud?

  6. Is OpenThread CoAP Secure with X.509 expected to work against such a cloud endpoint, provided the certificates and Kconfig options are correct?

  7. What would be the best way to verify whether the returned packet reaches the OpenThread IPv6/UDP layer on the device but is not delivered to the Zephyr socket?

Relevant prj.conf options include:

CONFIG_POSIX_API=y
CONFIG_ZVFS_POLL_MAX=4
CONFIG_LOG=y
CONFIG_LOG_MODE_DEFERRED=y
CONFIG_LOG_DEFAULT_LEVEL=3
CONFIG_OPENTHREAD_THREAD_STACK_SIZE=8192

I can provide the complete minimal source file, prj.conf, UART logs, and OTBR tcpdump logs. The ThingsBoard device access token and credentials are replaced by placeholders.

Best regards,
Markus

CoAP_Zephyr_API_Nordic.zip

Parents
  • > My current suspicion is that the ThingsBoard Cloud server certificate chain or selected cipher suite may require RSA-based authentication or a configuration that is not supported by the OpenThread CoAP Secure setup I am using.

    In most cases, an IP capture of the handshake helps to narrow down the incompatibility and may help to overcome it. So if you are able to post an IP capture here, I will have a look at it.

  • Hi,

    thank you for your reply.

    I have done some more tests and captured the DTLS handshake. The situation is now a bit clearer.

    I tested two different approaches:

    1. Zephyr secure sockets with DTLS over OpenThread/NAT64

    2. Native OpenThread CoAP Secure using otCoapSecureConnect()

    For the Zephyr secure socket approach, the handshake gets quite far. The client sends the ClientHello, the server responds with HelloVerifyRequest, the client sends the second ClientHello with the cookie, and the server then sends ServerHello, Certificate, ServerKeyExchange, CertificateRequest and ServerHelloDone.

    After that, the client does not send the next handshake flight and the server retransmits its flight. The connection finally fails on the client side.

    In this case, the server selected:

    TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

    So the server side does support and select a cipher suite that is available in the Zephyr/mbedTLS configuration.

    With the native OpenThread CoAP Secure approach, the behavior is different. The DTLS HelloVerifyRequest exchange works, so NAT64 and basic DTLS packet exchange are working. However, the OpenThread CoAP Secure ClientHello only offers:

    TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8

    The ThingsBoard Cloud server then immediately responds with a fatal DTLS alert:

    handshake_failure

    So at least for the native OpenThread CoAP Secure API, the problem seems to be a cipher suite mismatch. The OpenThread CoAPS implementation in my current nRF Connect SDK / OpenThread setup appears to offer only AES_128_CCM_8, while the ThingsBoard Cloud endpoint does not seem to accept it.

    I also tested the native OpenThread CoAPS setup with client certificate enabled and server certificate verification disabled. In that configuration, otCoapSecureConnect() starts successfully, so the local client certificate/key setup seems to be accepted. The failure then happens during the DTLS handshake with the server.

    For reference, the ThingsBoard Cloud endpoint I am testing against is:

    coap.eu.thingsboard.cloud:5684

    and the resolved IPv4 address in my capture was:

    52.58.111.18

    The NAT64 address used by the Thread device was:

    fd2a:4782:c31a:2::343a:6f12

    From my current understanding, there seem to be two separate issues:

    • Zephyr secure sockets: the server accepts a suitable GCM cipher suite and sends the full server flight, but the client does not continue after ServerHelloDone.

    • Native OpenThread CoAP Secure: the client only offers TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8, and the server rejects it with fatal handshake_failure.

    Is it expected that native OpenThread CoAP Secure in nRF Connect SDK only offers TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8?

    Is there a supported way in nRF Connect SDK / OpenThread to enable additional DTLS 1.2 cipher suites for otCoapSecureConnect(), for example TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 or TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256?

    At this point I am stuck. I currently do not have a working path for secure X.509 CoAPS communication with ThingsBoard Cloud.

    The Zephyr secure socket approach was also quite tricky to configure together with OpenThread. In my setup, enabling OpenThread caused MBEDTLS_SSL_TLS_C and MBEDTLS_SSL_CLI_C not to be enabled automatically, so I had to override this in Kconfig to get the Zephyr secure socket DTLS client to build.

    Regards,
    Markus

  • Thanks, your information helped a lot.

    I went through the documentation again and also did some more tests. The topic turned out to be more complex than I initially thought, and I agree that there is more involved than just the cipher suite.

    I did some additional OpenSSL tests against the same endpoint.

    This does not work for me:

    openssl s_client -dtls1_2 \
      -connect coap.eu.thingsboard.cloud:5684 \
      -cipher 'ECDHE-ECDSA-AES128-CCM8:@SECLEVEL=0' \
      -curves X25519:secp384r1 \
      -state -msg
    

    But this does work:

    openssl s_client -dtls1_2 \
      -connect coap.eu.thingsboard.cloud:5684 \
      -cipher 'ECDHE-ECDSA-AES128-CCM8:@SECLEVEL=0' \
      -curves prime256v1:secp384r1 \
      -state -msg
    

    So my previous conclusion that ThingsBoard Cloud simply does not accept TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 was not correct.

    It looks more subtle: the server does accept TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8, but in my successful OpenSSL test secp384r1 had to be present in supported_groups. Interestingly, the actual ECDHE key exchange later used prime256v1.

    So at least for coap.eu.thingsboard.cloud:5684, the interoperability problem seems to depend not only on the offered cipher suite, but also on the advertised supported groups / elliptic curves.

    This may also explain why the native OpenThread CoAP Secure case fails even though it offers TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8: if its ClientHello does not advertise a compatible set of supported groups, the ThingsBoard Cloud endpoint may still reject it with handshake_failure.

    For the Zephyr secure socket approach, I was able to get the ClientHello closer to the working OpenSSL case by restricting the cipher suite to TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 and making sure both prime256v1 and secp384r1 are present in supported_groups. In that case, ThingsBoard accepts the ClientHello and sends the full server flight. The remaining problem then seems to be on the embedded client side: it does not continue after ServerHelloDone.

  • As you can see, there is more to it than the cipher suite, and an IP capture would reveal still more details.

  • > Interestingly, the actual ECDHE key exchange later used prime256v1.

    The server's certificates use SHA256withECDSA, SHA384withECDSA and SHA384withRSA.

    If I remember correctly, the SHA384withECDSA certificate then uses secp384r1.

    RFC 8422 then states:

    NOTE: A server participating in an ECDHE_ECDSA key exchange may use
       different curves for the ECDSA or EdDSA key in its certificate and
       for the ephemeral ECDH key in the ServerKeyExchange message.  The
       server MUST consider the extensions in both cases.

    So the advertised curves must cover both uses: the ECDSA key in the certificate and the ephemeral ECDHE key exchange.

    > The remaining problem then seems to be on the embedded client side: it does not continue after ServerHelloDone.

    The server's certificate chain consists of 3 certificates and has a total size of 3196 bytes.

    It uses a wildcard in the CN, "*.eu.thingsboard.cloud".

    So I am not sure what makes the client reject the certificate. It may be the overall size, or it may be the wildcard, if the client follows the RFC 7252 rules for X.509 certificates:

       ...If there
       is no SubjectAltName in the certificate, then the authority of the
       request URI MUST match the Common Name (CN) found in the certificate
       using the matching rules defined in [RFC3280] with the exception that
       certificates with wildcards are not allowed.

      

  • Thanks a lot, that helps. The RFC8422 point makes sense and matches my OpenSSL tests. It explains why the client needs to advertise both curves: one for the ECDSA certificate/key and one for the ephemeral ECDHE key exchange.

    In my Zephyr secure socket test the ClientHello now contains both secp256r1 and secp384r1, and the ThingsBoard server accepts it and sends the full server flight. So the supported_groups issue seems solved for that path.

    The remaining problem is probably later, while processing the server flight. Your note about the certificate chain is interesting: 3 certificates, about 3196 bytes, wildcard CN, and the server also sends CertificateRequest. Even with peer verification disabled, this is a rather complex DTLS server flight for an embedded client.

    So I think I will stop debugging directly against ThingsBoard for the moment and set up a controlled local CoAPS test server first, with a short self-signed EC certificate, no wildcard, no client certificate request, and the same AES128-CCM8 cipher suite. If that works, then the remaining issue is likely specific to the ThingsBoard certificate flight / DTLS profile rather than the basic Zephyr/OpenThread DTLS transport.

  • > server also sends CertificateRequest ... Even with peer verification disabled

    According to the ThingsBoard documentation, both anonymous and client-certificate authentication are possible. To support both on the server side, the server sends a CertificateRequest, and the client may then send either a client Certificate or an empty Certificate.

    In my test I used the anonymous approach on the client side, and since Eclipse/Californium follows RFC 7252, I also disabled verification of the certificate's CN.

    I guess that in your case "peer verification disabled" is also set on the client side. You may then need to check whether that means the client does not verify the server certificate, or that the client sends an empty certificate.

    > set up a controlled local CoAPS test server first

    That's always a good idea ;-).
