Azure IoT Hub (MQTT) disconnect when using UART in

Hi,

I'm running into a strange issue with an application based on the azure_iot_hub sample using ncs v2.5.0 running on the nRF9160DK. Everything works fine when I run the code for connecting to the Azure IoT Hub and sending messages in isolation. However, when I include all the additional code of my application, the device connects just fine and sends the first message correctly, but then I get a LOG_WRN indicating that Azure IoT Hub is disconnected (result = -128). When debugging, I see that this event happens during mqtt_input in the mqtt_helper_poll_loop function on line 650.

I have been able to isolate the problem to a module of my code that uses UART2 (on the nRF9160) with the Async API to communicate with the nRF52840 and receive responses. I'm not sure if it is relevant, but received messages are processed in a dedicated work_queue with a priority set to -1. If I don't send anything over UART (and hence don't receive anything), it works as expected. After sending something over the UART (and receiving a response), the next message (the sample is set up to send the same message every 20 seconds) is sent successfully (Event was successfully sent), but then I get the DISCONNECTED message shortly after. I was wondering if there is some known 'interference' between using a UART (Async API) and MQTT, or if this could be related to the work_queue starving the MQTT thread? I tried running the code without calling uart_rx_enable, so that the work_queue is never used, but this doesn't seem to resolve it, so it seems to be mostly related to the UART functionality.

I browsed for similar issues on here and most of them seem to be related to setting up certificates, but all of that is fine, since everything works as long as I don't send something over UART.

Best,

Wout

  • Hi Wout,

    Thanks for checking with us about this issue.

    It is hard to see the cause just from your description. You could try running the UART part of your application from a separate thread, which will help isolate it from the MQTT module.

    Best regards,

    Charlie

  • Hi Charlie, 

    Thank you for getting back to me. I appreciate that it's not easy to help find the cause of this without having the full context. I'll see if I can isolate the UART part more or find a more minimal example that I can share so you can have a go at it yourself.

    I thought that setting up a dedicated work_queue (which I did) already uses a separate thread and that the work_queue object just helps to abstract the workings of the thread a bit. Am I wrong in thinking that? Would setting up an actual separate thread with its own main function behave differently than having a dedicated work_queue?

    Best,

    Wout

  • Do you run something time-consuming or blocking in the work handler? This may cause other time-sensitive threads, like the MQTT communication stack, to fail.

    https://docs.nordicsemi.com/bundle/ncs-2.7.0/page/zephyr/kernel/services/threads/workqueue.html#workqueue_best_practices may give you some inspiration.

    Best regards,

    Charlie

  • Hi Charlie,

    Sorry for the delay, I bunkered in to drill down on the issue and try to find a root cause. I believe I'm pretty close, but it's still not making much sense to me.

    It turns out the issue above is not related to UART, but the disconnect can be triggered by anything coming after connecting to the Azure server. It just happened to be that the uart calls came after connecting to the server. After spending hours comparing with the azure_iot_hub example and trying out different things, these are my current findings.

    In my own application, I created a separate file to handle all the networking (called network.c). I had moved all the functionality from the azure_iot_hub example to that file and called a network_init() function to set up the connection and another network_connect_to_server() function to connect to the Azure server.

    It turns out that the culprit is moving the azure_iot_hub_connect(&cfg) (and initializing the cfg struct) outside of the main function, even though I called network_connect_to_server() (which contains this functionality) from the main function in exactly the same way.

    So to put it in code, this doesn't work

    /** main.c **/
    
    int main(void)
    {
        /* ...*/
        network_connect_to_server();
        
        /* Doing something else after this and sending periodic messages results in MQTT error and Azure disconnect */
    }
    
    /** network.c **/
    int network_connect_to_server()
    {
        int err;
        char hostname[128] = CONFIG_AZURE_IOT_HUB_HOSTNAME;
        char device_id[128] = CONFIG_AZURE_IOT_HUB_DEVICE_ID;
        struct azure_iot_hub_config cfg = {
            .device_id = {
                .ptr = device_id,
                .size = strlen(device_id),
            },
            .hostname = {
                .ptr = hostname,
                .size = strlen(hostname),
            },
            // .use_dps = true,
        };
        LOG_INF("Device ID: %s", device_id);
        LOG_INF("Host name: %s", hostname);
    
        err = azure_iot_hub_init(azure_event_handler);
        if (err)
        {
            LOG_ERR("Azure IoT Hub could not be initialized, error: %d", err);
            return err;
        }
    
        LOG_INF("Azure IoT Hub library initialized");
    
        err = azure_iot_hub_connect(&cfg);
        if (err < 0)
        {
            LOG_ERR("azure_iot_hub_connect failed: %d", err);
            return err;
        }
    
        LOG_INF("Connection request sent to IoT Hub");
        return err;
    }

    But this does

    /** main.c **/
    
    int main(void)
    {
        /* ...*/
        network_init_iot_hub();
        
        char hostname[128] = CONFIG_AZURE_IOT_HUB_HOSTNAME;
        char device_id[128] = CONFIG_AZURE_IOT_HUB_DEVICE_ID;
        struct azure_iot_hub_config cfg = {
            .device_id = {
                .ptr = device_id,
                .size = strlen(device_id),
            },
            .hostname = {
                .ptr = hostname,
                .size = strlen(hostname),
            },
            // .use_dps = true,
        };
        LOG_INF("Device ID: %s", device_id);
        LOG_INF("Host name: %s", hostname);
        
        err = azure_iot_hub_connect(&cfg);
        if (err < 0)
        {
            LOG_ERR("azure_iot_hub_connect failed: %d", err);
            return err;
        }
        
        LOG_INF("Connection request sent to IoT Hub");
        
        /* Doing something else after this and sending periodic messages works fine */
    }
    
    /** network.c **/
    int network_init_iot_hub()
    {
        int err;
    
        err = azure_iot_hub_init(azure_event_handler);
        if (err)
        {
            LOG_ERR("Azure IoT Hub could not be initialized, error: %d", err);
            return err;
        }
    
        LOG_INF("Azure IoT Hub library initialized");
        return err;
    }

    Do you have any idea why that could be?

  • Hi Wout,

    Sorry for the late reply. I did not see a clear difference between the two versions of the code.

    It seems to me that the difference is whether azure_iot_hub_init runs before or after cfg is created, but the official Azure IoT Hub sample does use the first approach.

    I suggest you put all the code flat inside main and verify the order. The Azure IoT Hub sample could be a good reference for you to compare against.

    Let me know if you need more help or have any update. 

    Best regards,

    Charlie
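
A possible explanation, not confirmed anywhere in this thread: in the failing version, hostname, device_id, and cfg are local to network_connect_to_server(), so they stop existing when that function returns, whereas in the working version they live in main() for as long as the application runs. If the library keeps the pointers it was given instead of copying the strings (an assumption, not verified against the azure_iot_hub source), later messages would read stack memory that other code has since overwritten. The sketch below shows one way to rule this out by giving the buffers static storage duration. The demo_* types and functions and the placeholder strings are hypothetical stand-ins, not the real library API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the library's buffer and config types. */
struct demo_buf {
    char *ptr;
    size_t size;
};

struct demo_config {
    struct demo_buf hostname;
    struct demo_buf device_id;
};

/* Assumption: the connect call remembers the pointers instead of copying. */
static struct demo_config saved_cfg;

static int demo_connect(const struct demo_config *cfg)
{
    assert(cfg != NULL);
    /* Shallow copy: the stored pointers still refer to the caller's buffers. */
    saved_cfg = *cfg;
    return 0;
}

/* Static storage: these buffers stay valid after the function below returns.
 * The string values are placeholders for the Kconfig options in the post. */
static char hostname[128] = "my-hub.azure-devices.net";
static char device_id[128] = "my-device";

int network_connect_to_server(void)
{
    /* cfg itself may stay on the stack; only the buffers it points to
     * need to outlive this function under the assumption above. */
    struct demo_config cfg = {
        .hostname = { .ptr = hostname, .size = strlen(hostname) },
        .device_id = { .ptr = device_id, .size = strlen(device_id) },
    };

    return demo_connect(&cfg);
}

/* Later use, for example when building a topic for a periodic message. */
const char *demo_saved_device_id(void)
{
    return saved_cfg.device_id.ptr;
}
```

In the real application, the equivalent experiment would be to declare hostname and device_id as static (or at file scope) in network.c, keep everything else unchanged, and check whether the disconnect still occurs.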

     
