nRF54L15 DK: Zephyr POSIX UDP socket over OpenThread/NAT64 sends CoAP successfully, but poll()/recv() does not receive returned packet

Hello Nordic team,

I am testing a plain CoAP telemetry upload from an nRF54L15 DK over Thread using a Raspberry Pi OTBR and NAT64 to ThingsBoard Cloud.

Hardware / software:

  • Board: nRF54L15 DK

  • nRF Connect SDK: v3.2.4-4c3fc0d44534

  • Zephyr: v4.2.99-9673eec75908

  • OpenThread with a Raspberry Pi OTBR

  • Destination: ThingsBoard Cloud CoAP endpoint

  • NAT64 IPv6 address used by the device: fd57:cacb:e8cf:2::343a:6f12

  • Plain CoAP port: 5683

The application uses Zephyr/POSIX UDP sockets together with Zephyr’s CoAP packet builder.

The relevant socket flow is:

sock = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);

memset(&server_addr, 0, sizeof(server_addr));
server_addr.sin6_family = AF_INET6;
server_addr.sin6_port = htons(5683);
server_addr.sin6_scope_id = 0U;
inet_pton(AF_INET6, "fd57:cacb:e8cf:2::343a:6f12", &server_addr.sin6_addr);

connect(sock, (struct sockaddr *)&server_addr, sizeof(server_addr));

send(sock, request.data, request.offset, 0);

poll(..., 5000);
recv(sock, response_buf, sizeof(response_buf), MSG_DONTWAIT);
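For reference, the poll()/recv() step above could be written out roughly as follows. This is a minimal sketch that assumes a POSIX-compatible socket API (as enabled by CONFIG_POSIX_API). The helper name wait_for_coap_response() is hypothetical. Checking revents explicitly separates "nothing arrived on this socket" from "the socket reported an error", and that distinction matters when deciding whether the response ever reached the socket layer.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for one datagram on a
 * (connected) UDP socket and copy it into buf.
 * Returns the datagram length, 0 on timeout, or -errno on failure.
 * A CoAP message is at least 4 bytes, so 0 cannot be a valid message. */
static ssize_t wait_for_coap_response(int sock, void *buf, size_t len,
                                      int timeout_ms)
{
    struct pollfd fds[1] = { { .fd = sock, .events = POLLIN } };

    int ret = poll(fds, 1, timeout_ms);
    if (ret < 0) {
        return -errno;          /* poll() itself failed */
    }
    if (ret == 0) {
        return 0;               /* timeout: no datagram queued on this socket */
    }
    if (fds[0].revents & (POLLERR | POLLNVAL)) {
        return -EIO;            /* socket error reported instead of data */
    }

    /* POLLIN is set, so a non-blocking read returns the queued datagram. */
    ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
    return (n < 0) ? -errno : n;
}
```

Used as wait_for_coap_response(sock, response_buf, sizeof(response_buf), 5000), a return value of 0 means the datagram never reached the socket's receive queue.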

The CoAP message is a Confirmable POST to:

/api/v1/<ThingsBoard device access token>/telemetry

with JSON payload, for example:

{"temperature":25,"counter":0,"source":"zephyr-coap-ack"}

The generated CoAP packet length is 113 bytes.

Sending works reliably. ThingsBoard receives the telemetry. The OTBR tcpdump shows that ThingsBoard sends a 12-byte CoAP response back to the same UDP source port.

Example OTBR tcpdump:

wpan0 In  IP 192.168.255.3.48998 > 52.58.111.18.5683: UDP, length 113
eth0  Out IP 192.168.178.43.48998 > 52.58.111.18.5683: UDP, length 113
eth0  In  IP 52.58.111.18.5683 > 192.168.178.43.48998: UDP, length 12
wpan0 Out IP 52.58.111.18.5683 > 192.168.255.3.48998: UDP, length 12

The response is then retransmitted by ThingsBoard, which is expected because the device does not ACK the separate Confirmable CoAP response:

eth0  In  IP 52.58.111.18.5683 > 192.168.178.43.48998: UDP, length 12
wpan0 Out IP 52.58.111.18.5683 > 192.168.255.3.48998: UDP, length 12
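Once the response does reach the socket, the client is expected to answer a separate Confirmable response with an empty ACK that echoes its Message ID (RFC 7252, Sections 4.2 and 5.2.2). Otherwise the server keeps retransmitting, as in the capture above. The helper below is a hypothetical, library-independent sketch of that 4-byte message. In the application itself, Zephyr's CoAP packet API could be used to build it instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build the 4-byte empty ACK for a received
 * Confirmable CoAP message. Returns 4 on success, or 0 if the input is
 * not a CoAP version 1 CON message or either buffer is too short. */
static size_t coap_build_empty_ack(const uint8_t *msg, size_t msg_len,
                                   uint8_t *ack, size_t ack_len)
{
    if (msg_len < 4 || ack_len < 4) {
        return 0;
    }
    if ((msg[0] >> 6) != 1) {          /* version must be 1 */
        return 0;
    }
    if (((msg[0] >> 4) & 0x3) != 0) {  /* type must be CON (0) */
        return 0;
    }

    ack[0] = 0x60;    /* ver=1, type=ACK (2), TKL=0 */
    ack[1] = 0x00;    /* code 0.00 = empty message */
    ack[2] = msg[2];  /* Message ID copied from the CON message */
    ack[3] = msg[3];
    return 4;
}
```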

With tcpdump -X, the CoAP response was decoded as:

48 41 ... <8 byte token>

So this is a separate Confirmable 2.01 Created response.

I also tested a Non-confirmable POST. In that case ThingsBoard responded with:

58 41 ... <8 byte token>

So ThingsBoard also sends a Non-confirmable 2.01 Created response for a NON request. This confirms that the issue is not only related to missing ACK handling.
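The first two bytes quoted above can be checked against the fixed CoAP header layout (RFC 7252, Section 3). The first byte holds a 2-bit version, a 2-bit type, and a 4-bit token length. The code byte splits into a 3-bit class and a 5-bit detail. A short sketch (function names are hypothetical) confirms this reading: 0x48 is version 1, CON, TKL 8; 0x58 is version 1, NON, TKL 8; and 0x41 is 2.01. A 4-byte header plus an 8-byte token accounts for the full 12-byte UDP payload in the capture, so the response carries no options and no payload.

```c
#include <stdint.h>
#include <stdio.h>

/* CoAP message types, RFC 7252 Section 3. */
static const char *const coap_type_names[4] = { "CON", "NON", "ACK", "RST" };

struct coap_hdr_info {
    unsigned version;      /* must be 1 */
    unsigned type;         /* 0=CON, 1=NON, 2=ACK, 3=RST */
    unsigned token_len;    /* TKL, 0..8 */
    unsigned code_class;   /* e.g. 2 for 2.xx */
    unsigned code_detail;  /* e.g. 1 for 2.01 */
};

/* Decode byte 0 (ver/type/TKL) and byte 1 (code) of a CoAP header. */
static struct coap_hdr_info coap_decode_hdr(uint8_t b0, uint8_t b1)
{
    struct coap_hdr_info h = {
        .version = b0 >> 6,
        .type = (b0 >> 4) & 0x3,
        .token_len = b0 & 0x0F,
        .code_class = b1 >> 5,
        .code_detail = b1 & 0x1F,
    };
    return h;
}

static void coap_print_hdr(uint8_t b0, uint8_t b1)
{
    struct coap_hdr_info h = coap_decode_hdr(b0, b1);
    printf("ver=%u type=%s tkl=%u code=%u.%02u\n", h.version,
           coap_type_names[h.type], h.token_len, h.code_class, h.code_detail);
}
```

For example, coap_print_hdr(0x48, 0x41) prints "ver=1 type=CON tkl=8 code=2.01".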

The problem:

  • The outgoing packet works.

  • ThingsBoard receives the telemetry.

  • ThingsBoard replies.

  • The response is visible on the OTBR.

  • The response is sent back to the same UDP source port.

  • But on the nRF54L15 DK, poll() times out and recv() does not receive the response.

Additional tests already tried:

  • connect() + send() + poll() + recv()

  • sendto() without connect(), with receive path unchanged

  • explicit bind() to in6addr_any and port 0

  • poll() with timeout and with infinite timeout

  • recv(MSG_DONTWAIT) after poll()

  • logging reduced and switched from printf() to Zephyr LOG to avoid UART timing effects

The result remains that the response is visible on the OTBR but not delivered to the application socket.

As additional context: I already tested the native OpenThread CoAP API for plain CoAP, and sending telemetry with it worked. So using OpenThread’s native CoAP path may be a viable option for the unencrypted case.

However, for the encrypted case I would like to use CoAPS with X.509 client certificates against ThingsBoard Cloud. With OpenThread’s CoAP Secure API I was not able to establish the X.509/DTLS connection. otCoapSecureConnect() failed locally before any DTLS packet was visible on the OTBR. PSK-based CoAP Secure worked, but X.509 did not.

The client certificate and private key worked from a CoAPS client on the Raspberry Pi against ThingsBoard Cloud. My current suspicion is that the ThingsBoard Cloud server certificate chain or selected cipher suite may require RSA-based authentication or a configuration that is not supported by the OpenThread CoAP Secure setup I am using. My OpenThread/NCS build has ECDHE-ECDSA enabled, and ECJPAKE could not be disabled because the prebuilt Nordic OpenThread library requires it.

Questions:

  1. Is using Zephyr/POSIX UDP sockets over OpenThread/NAT64 expected to work for this use case on nRF54L15 DK / NCS v3.2.4?

  2. Is the OTBR tcpdump view with IPv4-like addresses on wpan0 expected in this NAT64 setup, and should the end device still receive the packet through an AF_INET6 UDP socket connected to the synthetic NAT64 IPv6 address?

  3. Are there known limitations or required Kconfig options for receiving UDP responses through POSIX sockets over OpenThread/NAT64?

  4. Would you recommend using native OpenThread UDP/CoAP APIs instead of Zephyr/POSIX sockets for this use case?

  5. If native OpenThread CoAP is the recommended path: what would be the recommended approach for CoAPS with X.509 client certificates against a public cloud endpoint such as ThingsBoard Cloud?

  6. Is OpenThread CoAP Secure with X.509 expected to work against such a cloud endpoint, provided the certificates and Kconfig options are correct?

  7. What would be the best way to verify whether the returned packet reaches the OpenThread IPv6/UDP layer on the device but is not delivered to the Zephyr socket?

Relevant prj.conf options include:

CONFIG_POSIX_API=y
CONFIG_ZVFS_POLL_MAX=4
CONFIG_LOG=y
CONFIG_LOG_MODE_DEFERRED=y
CONFIG_LOG_DEFAULT_LEVEL=3
CONFIG_OPENTHREAD_THREAD_STACK_SIZE=8192

I can provide the complete minimal source file, prj.conf, UART logs, and OTBR tcpdump logs. The ThingsBoard device access token and credentials are replaced by placeholders.

Best regards,
Markus

CoAP_Zephyr_API_Nordic.zip

  • > My current suspicion is that the ThingsBoard Cloud server certificate chain or selected cipher suite may require RSA-based authentication or a configuration that is not supported by the OpenThread CoAP Secure setup I am using.

    In most cases, an IP capture of the handshake helps narrow down the incompatibility and may help to overcome it. So if you are able to post an IP capture here, I will have a look at it.

  • Hi,

    thank you for your reply.

    I have done some more tests and captured the DTLS handshake. The situation is now a bit clearer.

    I tested two different approaches:

    1. Zephyr secure sockets with DTLS over OpenThread/NAT64

    2. Native OpenThread CoAP Secure using otCoapSecureConnect()

    For the Zephyr secure socket approach, the handshake gets quite far. The client sends the ClientHello, the server responds with HelloVerifyRequest, the client sends the second ClientHello with the cookie, and the server then sends ServerHello, Certificate, ServerKeyExchange, CertificateRequest and ServerHelloDone.

    After that, the client does not send the next handshake flight and the server retransmits its flight. The connection finally fails on the client side.

    In this case, the server selected:

    TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

    So the server side does support and select a cipher suite that is available in the Zephyr/mbedTLS configuration.

    With the native OpenThread CoAP Secure approach, the behavior is different. The DTLS HelloVerifyRequest exchange works, so NAT64 and basic DTLS packet exchange are working. However, the OpenThread CoAP Secure ClientHello only offers:

    TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8

    The ThingsBoard Cloud server then immediately responds with a fatal DTLS alert:

    handshake_failure

    So at least for the native OpenThread CoAP Secure API, the problem seems to be a cipher suite mismatch. The OpenThread CoAPS implementation in my current nRF Connect SDK / OpenThread setup appears to offer only AES_128_CCM_8, while the ThingsBoard Cloud endpoint does not seem to accept it.
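A TLS/DTLS 1.2 server typically sends handshake_failure when the client offers no cipher suite that the server accepts. The sketch below shows that check using the IANA cipher suite IDs involved: 0xC0AE for TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8, and 0xC02B and 0xC02C for the two ECDHE-ECDSA GCM suites. The server list is an assumption for illustration only. The captures show only that ThingsBoard selected 0xC02C and rejected a CCM_8-only offer. On the Zephyr secure socket side, a list of IANA IDs such as client_gcm can be passed as an int array with setsockopt(sock, SOL_TLS, TLS_CIPHERSUITE_LIST, ...), provided the suites are also enabled in the mbedTLS build.

```c
#include <stddef.h>

/* IANA cipher suite identifiers, as ints. */
enum {
    SUITE_ECDHE_ECDSA_AES_128_CCM_8      = 0xC0AE,
    SUITE_ECDHE_ECDSA_AES_128_GCM_SHA256 = 0xC02B,
    SUITE_ECDHE_ECDSA_AES_256_GCM_SHA384 = 0xC02C,
};

/* Return the first suite in the server's preference order that the client
 * also offers, or -1 if there is none (the handshake_failure case). */
static int negotiate_suite(const int *client, size_t n_client,
                           const int *server, size_t n_server)
{
    for (size_t i = 0; i < n_server; i++) {
        for (size_t j = 0; j < n_client; j++) {
            if (server[i] == client[j]) {
                return server[i];
            }
        }
    }
    return -1;
}

/* Client offers as seen in the captures. */
static const int client_openthread[] = { SUITE_ECDHE_ECDSA_AES_128_CCM_8 };
static const int client_gcm[] = { SUITE_ECDHE_ECDSA_AES_256_GCM_SHA384,
                                  SUITE_ECDHE_ECDSA_AES_128_GCM_SHA256 };

/* Assumed server list (illustration only, not taken from ThingsBoard). */
static const int server_assumed[] = { SUITE_ECDHE_ECDSA_AES_256_GCM_SHA384,
                                      SUITE_ECDHE_ECDSA_AES_128_GCM_SHA256 };
```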

    I also tested the native OpenThread CoAPS setup with client certificate enabled and server certificate verification disabled. In that configuration, otCoapSecureConnect() starts successfully, so the local client certificate/key setup seems to be accepted. The failure then happens during the DTLS handshake with the server.

    For reference, the ThingsBoard Cloud endpoint I am testing against is:

    coap.eu.thingsboard.cloud:5684

    and the resolved IPv4 address in my capture was:

    52.58.111.18

    The NAT64 address used by the Thread device was:

    fd2a:4782:c31a:2::343a:6f12
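Both NAT64 addresses in this thread end in 343a:6f12, which is 52.58.111.18 written in hex (0x34 = 52, 0x3a = 58, 0x6f = 111, 0x12 = 18). This matches the RFC 6052 layout for a /96 NAT64 prefix, where the IPv4 address occupies the last 32 bits. That the OTBR uses a /96 prefix here is an assumption inferred from the address shape. The minimal sketch below (the helper name is hypothetical) reproduces the device's destination address from the prefix and the resolved IPv4 address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: embed an IPv4 address into a /96 NAT64 prefix
 * (RFC 6052: for /96 the IPv4 address fills bits 96..127).
 * Returns 0 on success, -1 if an input does not parse or out is too small. */
static int nat64_synthesize(const char *prefix96, const char *ipv4,
                            char *out, size_t out_len)
{
    struct in6_addr v6;
    struct in_addr v4;

    if (inet_pton(AF_INET6, prefix96, &v6) != 1 ||
        inet_pton(AF_INET, ipv4, &v4) != 1) {
        return -1;
    }
    /* s_addr is already in network byte order, so copy the 4 bytes as-is. */
    memcpy(&v6.s6_addr[12], &v4.s_addr, 4);

    return inet_ntop(AF_INET6, &v6, out, (socklen_t)out_len) ? 0 : -1;
}
```

With prefix "fd2a:4782:c31a:2::" and "52.58.111.18", this yields fd2a:4782:c31a:2::343a:6f12. With the prefix from the first post, it yields fd57:cacb:e8cf:2::343a:6f12.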

    From my current understanding, there seem to be two separate issues:

    • Zephyr secure sockets: the server accepts a suitable GCM cipher suite and sends the full server flight, but the client does not continue after ServerHelloDone.

    • Native OpenThread CoAP Secure: the client only offers TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8, and the server rejects it with fatal handshake_failure.

    Is it expected that native OpenThread CoAP Secure in nRF Connect SDK only offers TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8?

    Is there a supported way in nRF Connect SDK / OpenThread to enable additional DTLS 1.2 cipher suites for otCoapSecureConnect(), for example TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 or TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256?

    At this point I am stuck. I currently do not have a working path for secure X.509 CoAPS communication with ThingsBoard Cloud.

    The Zephyr secure socket approach was also quite tricky to configure together with OpenThread. In my setup, enabling OpenThread caused MBEDTLS_SSL_TLS_C and MBEDTLS_SSL_CLI_C not to be enabled automatically, so I had to override this in Kconfig to get the Zephyr secure socket DTLS client to build.

    Regards,
    Markus

  • As you can see, there are more details now, and there would be even more if an IP capture were provided.

  • > Interestingly, the actual ECDHE key exchange later used prime256v1.

    The server certificates use SHA256withECDSA, SHA384withECDSA and SHA384withRSA.

    If I remember correctly, SHA384withECDSA then uses secp384r1.

    And RFC 8422 states:

    NOTE: A server participating in an ECDHE_ECDSA key exchange may use
       different curves for the ECDSA or EdDSA key in its certificate and
       for the ephemeral ECDH key in the ServerKeyExchange message.  The
       server MUST consider the extensions in both cases.

    So the curves offered by the client must cover both: the curve of the ECDSA certificate key and the curve of the ephemeral ECDHE key.
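The rule can be expressed as a simple check. This is a sketch, not how any particular TLS library implements it. The supported_groups extension must list both the curve of the server's ECDSA certificate key and the curve the server uses for ECDHE. The values are IANA named-group IDs: 23 for secp256r1, 24 for secp384r1, 29 for x25519. The test case uses secp384r1 for the certificate, which rests on the recollection above and is not verified here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IANA TLS named-group identifiers. */
enum {
    GROUP_SECP256R1 = 23,
    GROUP_SECP384R1 = 24,
    GROUP_X25519    = 29,
};

static bool group_offered(const uint16_t *groups, size_t n, uint16_t g)
{
    for (size_t i = 0; i < n; i++) {
        if (groups[i] == g) {
            return true;
        }
    }
    return false;
}

/* RFC 8422: for ECDHE_ECDSA the server must respect the client's
 * supported_groups for the certificate's ECDSA key as well as for the
 * ephemeral ECDH key, so both curves have to be offered. */
static bool ecdhe_ecdsa_possible(const uint16_t *groups, size_t n,
                                 uint16_t cert_curve, uint16_t ecdhe_curve)
{
    return group_offered(groups, n, cert_curve) &&
           group_offered(groups, n, ecdhe_curve);
}
```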

    > The remaining problem then seems to be on the embedded client side: it does not continue after ServerHelloDone.

    The server's certificate chain uses 3 certificates and has a size of 3196 bytes.

    It uses a wildcard in the CN, "*.eu.thingsboard.cloud". 

    Therefore I am not sure what makes the client reject the certificate. Maybe the overall size, or maybe the wildcard, if the client complies with RFC 7252 regarding X.509 certificates:

       ...If there
       is no SubjectAltName in the certificate, then the authority of the
       request URI MUST match the Common Name (CN) found in the certificate
       using the matching rules defined in [RFC3280] with the exception that
       certificates with wildcards are not allowed.
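The difference between this RFC 7252 rule and common wildcard matching can be sketched as follows. The helper is hypothetical, and real TLS stacks check SubjectAltName first, which this sketch ignores. With wildcards disallowed, coap.eu.thingsboard.cloud does not match *.eu.thingsboard.cloud. With single-label wildcard matching, it does.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: compare a host name with a certificate CN,
 * case-insensitively. With allow_wildcard == false, a "*." CN never
 * matches (the RFC 7252 rule when no SubjectAltName is present).
 * With allow_wildcard == true, "*" matches exactly one leading label. */
static bool cn_matches(const char *host, const char *cn, bool allow_wildcard)
{
    if (strncmp(cn, "*.", 2) != 0) {
        return strcasecmp(host, cn) == 0;
    }
    if (!allow_wildcard) {
        return false;
    }
    const char *dot = strchr(host, '.');
    if (dot == NULL || dot == host) {
        return false;   /* the wildcard must cover a non-empty first label */
    }
    /* Compare ".eu.thingsboard.cloud" style suffixes. */
    return strcasecmp(dot, cn + 1) == 0;
}
```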

      

  • Thanks a lot, that helps. The RFC8422 point makes sense and matches my OpenSSL tests. It explains why the client needs to advertise both curves: one for the ECDSA certificate/key and one for the ephemeral ECDHE key exchange.

    In my Zephyr secure socket test the ClientHello now contains both secp256r1 and secp384r1, and the ThingsBoard server accepts it and sends the full server flight. So the supported_groups issue seems solved for that path.

    The remaining problem is probably later, while processing the server flight. Your note about the certificate chain is interesting: 3 certificates, about 3196 bytes, wildcard CN, and the server also sends CertificateRequest. Even with peer verification disabled, this is a rather complex DTLS server flight for an embedded client.

    So I think I will stop debugging directly against ThingsBoard for the moment and set up a controlled local CoAPS test server first, with a short self-signed EC certificate, no wildcard, no client certificate request, and the same AES128-CCM8 cipher suite. If that works, then the remaining issue is likely specific to the ThingsBoard certificate flight / DTLS profile rather than the basic Zephyr/OpenThread DTLS transport.

  • > server also sends CertificateRequest ... Even with peer verification disabled

    According to the ThingsBoard documentation, both anonymous and client-certificate authentication are possible. To support both on the server side, the server sends a CertificateRequest, and the client may then send either a client Certificate or an empty Certificate.

    In my test I used the anonymous approach on the client side, and because Eclipse/Californium follows RFC 7252, I also disabled the verification of the certificate's CN.

    I guess that in your case "peer verification disabled" is also set on the client side. Then you may also need to check whether that means the client doesn't verify the server certificate, or that the client sends an empty certificate.
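For comparison with a capture, the "empty Certificate" answer to a CertificateRequest is a Certificate handshake message whose certificate_list has length 0 (RFC 5246, Section 7.4.6). In DTLS 1.2 it also carries the 12-byte DTLS handshake header (RFC 6347, Section 4.2.2), so the whole message is 15 bytes. The TLS library builds this message itself. The hypothetical helper below only shows the expected encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a DTLS 1.2 Certificate handshake message with
 * an empty certificate_list. Returns the length written (15), or 0 if buf
 * is too small. */
static size_t dtls_empty_certificate(uint8_t *buf, size_t len, uint16_t msg_seq)
{
    const uint8_t body_len = 3;   /* body = uint24 certificate_list length (0) */

    if (len < 12u + body_len) {
        return 0;
    }
    buf[0] = 11;                                  /* HandshakeType: certificate */
    buf[1] = 0; buf[2] = 0; buf[3] = body_len;    /* length (uint24) */
    buf[4] = (uint8_t)(msg_seq >> 8);             /* message_seq (uint16) */
    buf[5] = (uint8_t)(msg_seq & 0xFF);
    buf[6] = 0; buf[7] = 0; buf[8] = 0;           /* fragment_offset = 0 */
    buf[9] = 0; buf[10] = 0; buf[11] = body_len;  /* fragment_length = whole body */
    buf[12] = 0; buf[13] = 0; buf[14] = 0;        /* certificate_list length = 0 */
    return 15;
}
```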

    > set up a controlled local CoAPS test server first

    That's always a good idea ;-).

  • Hi Achim,

    thanks again for your input. I now have a cleaner capture for the Zephyr secure socket / DTLS case against ThingsBoard Cloud.

    The original plain CoAP receive issue is resolved. This is only an update for the CoAPS/DTLS follow-up discussed above.

    Current setup:
    - nRF54L15 DK
    - NCS 3.4.0
    - Zephyr secure sockets over OpenThread/NAT64
    - Endpoint: coap.eu.thingsboard.cloud:5684
    - TLS_PEER_VERIFY_NONE for this test
    - Authentication intended via the ThingsBoard access token in the CoAP URI, not via client certificate

    The ClientHello now contains:
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - supported_groups: x25519, secp256r1, secp384r1
    - signature_algorithms: ecdsa_secp384r1_sha384, rsa_pkcs1_sha384, ecdsa_secp256r1_sha256, rsa_pkcs1_sha256
    - SNI: coap.eu.thingsboard.cloud

    With this ClientHello, ThingsBoard no longer responds with handshake_failure. The server accepts the ClientHello and sends:
    - ServerHello
    - Certificate
    - ServerKeyExchange
    - CertificateRequest
    - ServerHelloDone

    After ServerHelloDone, the Zephyr client does not send the next handshake flight. There is no ClientCertificate, ClientKeyExchange, ChangeCipherSpec or Finished from the device.

    The application log shows:

    net_sock_tls: TLS handshake error: -0x8d
    zsock_connect() fails with errno 113

    I attached the clean PCAP for this reproduction.

    As an additional comparison, I also tested the same Zephyr secure socket / CoAP client against my own local CoAPS test server. That server also uses mbedTLS and the handshake succeeds there. The client can complete the DTLS handshake and send CoAP data successfully. However, that local handshake is much simpler than the ThingsBoard Cloud handshake: it uses a shorter certificate setup and does not involve the same complex server certificate chain / CertificateRequest flight.

    So from my current understanding, Zephyr secure sockets with DTLS and CoAP basically work in this setup. The remaining issue seems specific to processing the ThingsBoard server flight, possibly around the CertificateRequest handling or the larger / fragmented certificate flight.

    OpenSSL succeeds against the same ThingsBoard endpoint with equivalent cipher/group/signature settings.

    Do you have an idea what TLS handshake error -0x8d means in this Zephyr secure socket path, or how I could get the original mbedTLS error code behind it?

    Regards,
    Markus

    coaps-zephyr-socket-thingsboard-clean-repro.pcap
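One way to narrow down -0x8d without further logs is to check which numbering scheme it belongs to. Classic mbedTLS error codes are either low-level codes (-0x0001 to -0x007F) or high-level codes carrying a module ID in bits 12 to 14 (between -0x1000 and -0x7FFF). By that range, -0x8d (-141) is not a classic MBEDTLS_ERR_* value. It does equal PSA_ERROR_INSUFFICIENT_MEMORY in the PSA Crypto API, which would fit a large certificate flight, but that reading is unverified. Note also that in Zephyr's errno.h, 113 is ECONNABORTED (on Linux, 113 would be EHOSTUNREACH). The helper below is hypothetical and lists only a few PSA codes. On the device, mbedtls_strerror() can translate classic codes if mbedTLS is built with MBEDTLS_ERROR_C.

```c
/* Hypothetical helper: classify the value from a
 * "TLS handshake error: -0x.." log line (pass it negative, e.g. -0x8d). */
static const char *classify_tls_error(int err)
{
    switch (err) {
    /* A few PSA Crypto API status codes (psa_status_t values). */
    case -134: return "PSA_ERROR_NOT_SUPPORTED";
    case -135: return "PSA_ERROR_INVALID_ARGUMENT";
    case -138: return "PSA_ERROR_BUFFER_TOO_SMALL";
    case -141: return "PSA_ERROR_INSUFFICIENT_MEMORY";
    case -149: return "PSA_ERROR_INVALID_SIGNATURE";
    default: break;
    }
    if (err >= -0x007F && err <= -0x0001) {
        return "mbedTLS low-level error (use mbedtls_strerror())";
    }
    if (err >= -0x7FFF && err <= -0x1000) {
        return "mbedTLS high-level error (use mbedtls_strerror())";
    }
    return "unrecognized";
}
```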
