Trying to get Azure IoT Hub sample working with MQTT over WebSockets

Greetings!

Our device will use a Wi-Fi network to send data to Azure IoT Hub, from where it will be forwarded to our cloud. As the network will be controlled by a third party rather than by us, we cannot assume that the usual MQTT ports will be open. We must therefore resort to MQTT over WebSockets.


As a minimal working setup, I am trying to get the nrf/samples/net/azure_iot_hub sample running over WebSockets. The default setup, which uses MQTT, works, but I have trouble getting WebSockets running. My suspicion is that something related to Mbed TLS or the certificates fails and the connection is therefore closed, but I lack the knowledge to pinpoint the problem further.

I am using the nRF7002DK, running NCS v2.7.0.


I will first describe the steps I took in my debugging journey and then end with the current Mbedtls log output that I am trying to understand.


Support for MQTT over WebSockets

Although the IoT Hub supports MQTT over WebSockets (see the first note in the linked article) and Zephyr supports WebSockets, the mqtt_helper.c library in NCS doesn't provide support for it. So, to get MQTT over WebSockets working, I wrote a small patch for it, taking inspiration from Zephyr's mqtt_publisher example.



diff --git a/subsys/net/lib/mqtt_helper/mqtt_helper.c b/subsys/net/lib/mqtt_helper/mqtt_helper.c
index e9755309b0..001119ddaf 100644
--- a/subsys/net/lib/mqtt_helper/mqtt_helper.c
+++ b/subsys/net/lib/mqtt_helper/mqtt_helper.c
@@ -36,6 +36,15 @@ BUILD_ASSERT((CONFIG_MQTT_HELPER_SEC_TAG != -1), "Security tag must be configure
#define MQTT_HELPER_STATIC static
#endif
+#if defined(CONFIG_MQTT_LIB_WEBSOCKET)
+/* Making RX buffer large enough that the full IPv6 packet can fit into it */
+#define MQTT_LIB_WEBSOCKET_RECV_BUF_LEN 1280
+
+/* Websocket needs temporary buffer to store partial packets */
+static uint8_t temp_ws_rx_buf[MQTT_LIB_WEBSOCKET_RECV_BUF_LEN];
+#endif
+
+
MQTT_HELPER_STATIC struct mqtt_client mqtt_client;
static struct sockaddr_storage broker;
static char rx_buffer[CONFIG_MQTT_HELPER_RX_TX_BUFFER_SIZE];
@@ -494,10 +503,28 @@ static int client_connect(struct mqtt_helper_conn_params *conn_params)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

To confirm that the above patch was functioning as intended, I used the nrf/samples/net/mqtt sample. I configured it to connect to Mosquitto's port 8081 (MQTT over WebSockets, encrypted, unauthenticated). As a different certificate was needed for WebSockets (compared to the one provided for port 8883: MQTT, encrypted, unauthenticated), I had to adjust some Mbed TLS-related Kconfig options:

CONFIG_MBEDTLS_MPI_MAX_SIZE=512
CONFIG_POSIX_MAX_FDS=32
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

With the above setup I could successfully connect to two public brokers: test.mosquitto.org and broker.emqx.io.


Debugging azure_iot_hub sample

As a starting point I had a working connection with the IoT hub, using the default setup. I am using the DPS to provision my device.

I then added the Kconfig options below to my prj.conf to enable WebSockets; I accumulated them as I was debugging. Keep in mind that to get functional Mbed TLS logs on NCS v2.7.0 you need to apply the patch that fixes them.

CONFIG_MQTT_LIB_WEBSOCKET=y
CONFIG_WEBSOCKET_CLIENT=y
CONFIG_MQTT_HELPER_PORT=443
# CONFIG_MQTT_HELPER_LOG_LEVEL_DBG=y
# CONFIG_MQTT_LOG_LEVEL_DBG=y
CONFIG_NET_LOG=y
CONFIG_NET_WEBSOCKET_LOG_LEVEL_DBG=y
CONFIG_NET_HTTP_LOG_LEVEL_DBG=y
# Enable Mbed TLS logs
CONFIG_MBEDTLS_DEBUG=y
CONFIG_MBEDTLS_DEBUG_C=y
CONFIG_MBEDTLS_DEBUG_LEVEL=3
CONFIG_NET_BUF_RX_COUNT=72
CONFIG_NET_BUF_TX_COUNT=72
CONFIG_MBEDTLS_SSL_MAX_CONTENT_LEN=4096
CONFIG_MBEDTLS_MPI_MAX_SIZE=512
CONFIG_POSIX_MAX_FDS=32
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The symbol that really made a difference was CONFIG_MBEDTLS_SSL_RENEGOTIATION. Without it there was a clear warning in the log, and the connection was then closed by the server, as indicated by the line "mbedtls_ssl_fetch_input() returned -29312 (-0x7280)". Below is the captured log without the mentioned flag:

*** Booting My Application v2.1.0-dev-daf2946a0f07 ***
*** Using nRF Connect SDK v2.7.0-5cb85570ca43 ***
*** Using Zephyr OS v3.6.99-100befc70c74 ***
I: Starting bootloader
I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: none
I: Bootloader chainload address offset: 0x10000
I: Jumping to the first image slot
*** Booting nRF Connect SDK v2.7.0-5cb85570ca43 ***
*** Using Zephyr OS v3.6.99-100befc70c74 ***
[00:00:00.247,131] <inf> azure_iot_hub_sample: Azure IoT Hub sample started
[00:00:00.256,622] <inf> azure_iot_hub_sample: Bringing network interface up and connecting to the network
[00:00:01.832,275] <inf> wifi_mgmt_ext: Connection requested
[00:00:01.842,651] <inf> azure_iot_hub_sample: Device ID: nrf7002dk_010
[00:00:05.912,261] <inf> net_dhcpv4: Received: 192.168.76.247
[00:00:05.920,867] <inf> azure_iot_hub_sample: Network connectivity established and IP address assigned
[00:00:05.933,532] <inf> azure_iot_hub_sample: Connected to network
[00:00:05.942,321] <inf> azure_iot_hub_sample: Starting DPS
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
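
A side note for reading return values like the -29312 mentioned above: Mbed TLS error codes are negative integers that are conventionally written in hex, so -29312 is -0x7280, which mbedtls/ssl.h defines as MBEDTLS_ERR_SSL_CONN_EOF (the connection indicated an EOF). Below is a minimal host-side sketch for decoding such values. The helper names ssl_err_name and ssl_err_format are hypothetical and not part of Mbed TLS or NCS, and the table is deliberately incomplete; it only lists a few codes taken from mbedtls/ssl.h.

```c
#include <stddef.h>
#include <stdio.h>

/* Small, incomplete lookup table of Mbed TLS SSL error codes.
 * Values are taken from mbedtls/ssl.h; extend as needed. */
struct ssl_err_entry {
	int code;
	const char *name;
};

static const struct ssl_err_entry ssl_err_table[] = {
	{ -0x7280, "MBEDTLS_ERR_SSL_CONN_EOF" },          /* peer closed the connection */
	{ -0x7880, "MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY" }, /* peer sent close_notify */
	{ -0x6900, "MBEDTLS_ERR_SSL_WANT_READ" },         /* non-fatal, retry read */
	{ -0x6880, "MBEDTLS_ERR_SSL_WANT_WRITE" },        /* non-fatal, retry write */
};

/* Hypothetical helper: map a return value to its symbolic name,
 * or "unknown" if it is not in the table above. */
const char *ssl_err_name(int ret)
{
	for (size_t i = 0; i < sizeof(ssl_err_table) / sizeof(ssl_err_table[0]); i++) {
		if (ssl_err_table[i].code == ret) {
			return ssl_err_table[i].name;
		}
	}
	return "unknown";
}

/* Hypothetical helper: format a return value as signed hex plus name,
 * e.g. "-0x7280 (MBEDTLS_ERR_SSL_CONN_EOF)". Returns snprintf()'s result. */
int ssl_err_format(int ret, char *buf, size_t len)
{
	/* Compute the magnitude in unsigned arithmetic to avoid overflow. */
	unsigned int mag = (ret < 0) ? (0u - (unsigned int)ret) : (unsigned int)ret;

	return snprintf(buf, len, "%s0x%04X (%s)", (ret < 0) ? "-" : "", mag,
			ssl_err_name(ret));
}
```

Calling ssl_err_format(-29312, buf, sizeof(buf)) would fill buf with "-0x7280 (MBEDTLS_ERR_SSL_CONN_EOF)". The code only says that the peer ended the connection during the handshake; it does not say why.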

When renegotiation is enabled, the traffic continues for much longer. The last thing that happens is that the connection times out at the net_http_client level:

*** Booting nRF Connect SDK v2.7.0-5cb85570ca43 ***
*** Using Zephyr OS v3.6.99-100befc70c74 ***
[00:00:00.246,917] <inf> azure_iot_hub_sample: Azure IoT Hub sample started
[00:00:00.256,408] <inf> azure_iot_hub_sample: Bringing network interface up and connecting to the network
[00:00:01.832,244] <inf> wifi_mgmt_ext: Connection requested
[00:00:01.842,620] <inf> azure_iot_hub_sample: Device ID: nrf7002dk_010
[00:00:05.910,064] <inf> net_dhcpv4: Received: 192.168.76.247
[00:00:05.918,609] <inf> azure_iot_hub_sample: Network connectivity established and IP address assigned
[00:00:05.931,396] <inf> azure_iot_hub_sample: Connected to network
[00:00:05.940,216] <inf> azure_iot_hub_sample: Starting DPS
[00:00:05.950,195] <inf> azure_iot_hub_sample: DPS registration status: AZURE_IOT_HUB_DPS_REG_STATUS_NOT_STARTED
[00:00:05.962,982] <inf> azure_iot_hub_sample: Already assigned to an IoT hub, skipping DPS
[00:00:05.973,876] <inf> azure_iot_hub_sample: Device ID "nrf7002dk_010" assigned to IoT hub with hostname "bbc-grp-qst-sbx-iothub001.azure-devices.net"
[00:00:05.991,394] <inf> azure_fota: Current firmware version: 0.0.0-dev
[00:00:06.000,579] <inf> azure_iot_hub_sample: Azure IoT Hub library initialized
[00:00:06.010,528] <inf> azure_iot_hub_sample: AZURE_IOT_HUB_EVT_CONNECTING
[00:00:06.056,823] <err> net_dns_resolve: DNS recv error (-103)
[00:00:06.584,930] <err> net_dns_resolve: DNS recv error (-4)
[00:00:06.692,779] <wrn> mbedtls: ssl_tls.c:3914: => handshake
[00:00:06.701,416] <wrn> mbedtls: ssl_msg.c:2358: => flush output
[00:00:06.710,327] <wrn> mbedtls: ssl_msg.c:2367: <= flush output
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

I have run out of next steps for debugging this further. The Azure documentation doesn't state that anything different must be done to make MQTT over WebSockets work. It really only says: "MQTT over WebSockets works on port 443."

Since I confirmed that secure Websockets can work (when testing them with the mqtt example) I think that either there is something wrong with the device's certificates or that Azure has some undocumented behavior.

I know that none of this is directly Nordic's responsibility, but I would be happy if someone on Nordic's side with some knowledge of Mbed TLS and the TLS protocol would look over the logs and check whether there are any obvious errors that I am currently missing.

Best,
Marko