MQTT Connect fails with error -116 when using TLS

I'm building the `nrf/samples/net/mqtt` sample for the nrf9160dk, specifying a custom certificate inside `src/modules/transport/credentials` as directed in the Kconfig file.

The CA and client certificate / key are specified in the "correct" PEM format, formatted as a C string with newline characters. Sending `at AT%CMNG=2,24,0` correctly reads back the CA certificate.

I'm building with the following flags:

west build -t flash -b nrf9160dk_nrf9160_ns nrf/samples/net/mqtt -DCONFIG_MQTT_HELPER_PORT=8883 -DCONFIG_MQTT_SAMPLE_TRANSPORT_BROKER_HOSTNAME=\"mqtt.preview.playerdata.co.uk\" -DCONFIG_SHELL_BACKEND_RTT=y -DCONFIG_USE_SEGGER_RTT=y -DCONFIG_SHELL=y -DCONFIG_MQTT_HELPER_LOG_LEVEL_DBG=y -DCONFIG_MQTT_LOG_LEVEL_DBG=y -DCONFIG_NRF_MODEM_LIB_LOG_LEVEL_DBG=y -DCONFIG_NRF_MODEM_LOG=y -DCONFIG_MODEM_KEY_MGMT=y -DCONFIG_MQTT_HELPER_SEC_TAG=24 -DCONFIG_MQTT_LIB_TLS=y -DCONFIG_MQTT_LIB=y -DCONFIG_NRF_MODEM_LIB_TRACE=y -DCONFIG_MODEM_KEY_MGMT_LOG_LEVEL_DBG=y -DCONFIG_AT_SHELL=y

The result is that `mqtt_connect` fails with error code -116:

rtt:~$

*** Booting Zephyr OS build v3.3.99-ncs1-1 ***
[00:00:00.327,972] <inf> network: Bringing network interface up
[00:00:00.328,002] <inf> nrf_modem: Initializing libmodem 2.4.1-lte-7821d3f67ae9
[00:00:00.328,430] <dbg> nrf_modem: Control region at 0x20008000 (0x4e8 bytes)
[00:00:00.328,460] <dbg> nrf_modem: TX region at 0x200084e8 (0x2000 bytes)
[00:00:00.328,491] <dbg> nrf_modem: RX region at 0x2000a568 (0x2000 bytes)
[00:00:00.328,521] <dbg> nrf_modem: Trace region at 0x2000c568 (0x4000 bytes)
[00:00:00.329,132] <dbg> mqtt_helper: mqtt_state_set: State transition: MQTT_STATE_UNINIT --> MQTT_STATE_DISCONNECTED
[00:00:00.330,352] <dbg> mqtt_helper: mqtt_helper_poll_loop: Waiting for connection_poll_sem
--- 14 messages dropped ---
[00:00:00.565,216] <dbg> nrf_modem: AT+CEDRXS=3
[00:00:00.565,551] <dbg> nrf_modem: _nrf_modem_lib_init: Modem init callback: 0x30eb7
[00:00:00.565,643] <dbg> nrf_modem: AT%XEPCO=1
[00:00:00.566,070] <dbg> nrf_modem: _nrf_modem_lib_init: Modem init callback: 0xa7cd
[00:00:00.566,131] <dbg> nrf_modem: AT+CMEE?
[00:00:00.566,589] <dbg> nrf_modem: AT+CMEE=1
[00:00:00.820,465] <dbg> nrf_modem: AT+CMEE=0
[00:00:00.820,861] <dbg> nrf_modem: AT+CMEE?
[00:00:00.821,380] <dbg> nrf_modem: AT+CMEE=1
[00:00:02.239,044] <dbg> nrf_modem: AT+CMEE=0
[00:00:02.239,471] <dbg> nrf_modem: AT+CMEE?
[00:00:02.239,990] <dbg> nrf_modem: AT+CMEE=1
[00:00:02.297,821] <dbg> nrf_modem: AT+CMEE=0
[00:00:02.298,156] <dbg> nrf_modem: _nrf_modem_lib_init: Modem init callback: 0x30b61
[00:00:02.298,217] <inf> nrf_modem_lib_trace: Trace thread ready
[00:00:02.298,339] <dbg> nrf_modem: AT%XMODEMTRACE=1,2
[00:00:02.299,713] <inf> nrf_modem_lib_trace: Trace level override: 2
[00:00:02.299,835] <dbg> nrf_modem: AT%XSYSTEMMODE?
[00:00:02.300,384] <dbg> nrf_modem: AT+CEREG=5
[00:00:02.306,823] <dbg> nrf_modem: AT+CSCON=1
[00:00:02.307,250] <inf> network: Connecting...
[00:00:02.307,373] <dbg> nrf_modem: AT+CEREG=5
[00:00:02.307,769] <dbg> nrf_modem: AT+CSCON=1
[00:00:02.308,197] <dbg> nrf_modem: AT+CFUN=21
[00:00:02.339,050] <dbg> nrf_modem: AT+CNEC=16
[00:00:02.339,538] <dbg> nrf_modem: AT+CGEREP=1
[00:00:04.756,774] <dbg> nrf_modem: AT+CGPADDR=0
[00:00:04.858,642] <inf> network: IP Up
[00:00:09.858,978] <dbg> nrf_modem: AT+CGSN
[00:00:09.859,649] <dbg> mqtt_helper: broker_init: Resolving IP address for mqtt.preview.playerdata.co.uk
[00:00:09.859,741] <inf> nrf_modem: getaddrinfo() fam: 0x0, type 0x1, proto 0x6
[00:00:10.010,253] <inf> nrf_modem: RPC_IP_GETADDRINFO_RES, result RPC_IP_ERR_OK
[00:00:10.010,345] <dbg> nrf_modem: ai_length 0x39, ai_family 0x1, ai_addrlen 0x4, ai_namelen 0x1d
[00:00:10.010,437] <inf> nrf_modem: Resolved 34.105.196.147
[00:00:10.010,681] <dbg> mqtt_helper: broker_init: IPv4 Address found 34.105.196.147 (AF_INET)
[00:00:10.010,742] <dbg> mqtt_helper: mqtt_state_set: State transition: MQTT_STATE_DISCONNECTED --> MQTT_STATE_TRANSPORT_CONNECTING
[00:00:10.010,803] <inf> nrf_modem: socket() fam 0x1, type 0x1, proto 0x102
[00:00:10.011,108] <inf> nrf_modem: RPC_IP_OPEN_RES, result RPC_IP_ERR_OK
[00:00:10.011,230] <dbg> nrf_modem: TLS peer verification 2, fd 0x0
[00:00:10.011,291] <dbg> nrf_modem: TLS tags: 1, fd 0x0
[00:00:10.011,413] <dbg> nrf_modem: TLS hostname: mqtt.preview.playerdata.co.uk, fd 0x0
[00:00:10.011,444] <inf> nrf_modem: connect() fd 0x0
[00:00:10.011,505] <inf> nrf_modem: sa_family 0x1, destaddr_len 0x4, destport 8883
[00:00:10.071,563] <inf> nrf_modem: RPC_IP_CONNECT_RES fd 0x0, result RPC_IP_ERR_OK
[00:00:10.071,655] <dbg> nrf_modem: Waiting for handshake semaphore
[00:00:10.071,746] <inf> nrf_modem: Attaching sock fd 0x0
[00:00:10.071,777] <dbg> nrf_modem: Hostname: mqtt.preview.playerdata.co.uk
[00:00:10.071,838] <dbg> nrf_modem: role 0x2, verify 0x2, cache 0x0, tags count 1
[00:00:10.071,868] <dbg> nrf_modem: tag[0]: 24
[00:00:10.115,631] <inf> nrf_modem: RPC_IP_TLS_ATTACH_RES fd 0x0, result RPC_IP_ERR_OK
[00:00:27.930,206] <inf> nrf_modem: RPC_IP_CLOSED_NTF fd 0x0
[00:00:27.930,236] <inf> nrf_modem: Close reason RPC_IP_ERR_CONN_RESET
[00:00:27.931,030] <inf> nrf_modem: RPC_IP_TLS_HANDSHAKE_COMPLETE_NTF fd 0x0, result RPC_IP_ERR_TIMEOUT
[00:00:27.931,121] <inf> nrf_modem: close() fd 0x0
[00:00:27.932,403] <inf> nrf_modem: RPC_IP_CLOSE_RES fd 0x0, result RPC_IP_ERR_OK
[00:00:27.932,525] <err> mqtt_helper: mqtt_connect, error: -116
[00:00:27.932,556] <dbg> mqtt_helper: mqtt_state_set: State transition: MQTT_STATE_TRANSPORT_CONNECTING --> MQTT_STATE_DISCONNECTED
[00:00:27.932,586] <err> transport: Failed connecting to MQTT, error code: -116

Checking the server logs, it doesn't appear that the board has attempted to make a connection with the MQTT server.

Parents Reply Children
  • Hi,

    Looking at your trace file, there might be something related to the modem that we would potentially need to investigate. I have transferred your trace file to the team which could provide further insight into the issue.

    Meanwhile, it might be useful if I could get some additional information.

    Could you provide more information about your application? In which way did you change nrf/samples/net/mqtt sample? Does unmodified mqtt sample work as expected on the same nrf9160-dk board?

    Do you have another nrf9160-dk board to test your application on?

    Did this issue appear suddenly?

    Best regards,
    Dejan

  • Hi Dejan,

    Thank you for passing it on.

    The build parameters used for the mqtt sample are in the original question. The only additional change was to set our own certificates in https://github.com/nrfconnect/sdk-nrf/tree/main/samples/net/mqtt/src/modules/transport/credentials.

    We've actually had a bit of a breakthrough and managed to get the sample working.

    I'd originally used a 4096 bit RSA key, which seemed to trigger the behaviour. I did try a 2048 bit key - this didn't resolve the issue on the original board, even after sending the factory reset command.

    I've now tried on a different 9160-dk board with a 2048 bit RSA key, and everything is working properly.

    My theory is that using a 4096 bit key has caused an overflow somewhere that the modem hasn't been able to recover from?

    Regards,

    Hayden

  • Hi,

    Thank you for the update.

    Could you please provide full Wireshark trace from the server side?

    Could you provide serial/version numbers of both working and failing nrf9160-dk board?

    Best regards,
    Dejan

  • Hi Dejan,

    The failing board is PCA10090 v1.1.0 (2023.11), SN: 960069516.

    The working board is PCA10090 v1.1.0 (2023.11), SN: 960019782.

    Please find below a capture from the other week while debugging. This was done through a tunnelled connection (hence the loopback addresses) as I don't have the ability to capture packets on the live server.

    The full setup was:

    * Laptop running openssl s_server, configured with a CAfile

    * ngrok

    * The MQTT sample above, with the host and port set to the address and port specified by ngrok

    * Wireshark running against the loopback interface

    You can see in the trace traffic coming from the board on port 50645 going to 8883, and responding to the board in the opposite direction. 

    For awareness, I now don't have easy access to the failing board - it is now with a colleague.

    9160-DK.pcapng

  • Hi,

    Thank you for providing additional information and pcap file.

    While working with a failing board, did you possibly list and then clean all PSKs/certificates from the board using AT%CMNG?

    Best regards,
    Dejan

Related