This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Potential memory leak when publishing to MQTT with multiple threads.

Hello,

I have created an application that collects data from sensors and GPS in two separate threads and publishes via MQTT. The two thread priorities are set to 10 and 11. After a few minutes of running and properly sending data to AWS, the application freezes inside a mqtt_publish call, and seemingly deadlocks the system. By debugging I can see that execution is stuck in mem_alloc.constprop.5(). I have seen this same issue at the following link from a month ago with no resolution: 

https://devzone.nordicsemi.com/f/nordic-q-a/62134/possible-memory-leak-when-using-mqtt-in-discontinuous-fashion

I am wondering if there is a solution from Nordic regarding this, or if there is a known reentrancy issue with the MQTT stack. I don't believe this is a reentrancy issue or race condition on my end, but have attached my relevant code in case. Two threads call this function periodically:

s32_t s32_app_cloud_dataPublish(u8_t *data, size_t len) 
{
  struct mqtt_publish_param param;

  param.message.topic.qos = MQTT_QOS_0_AT_MOST_ONCE;
  param.message.topic.topic.utf8 = CONFIG_MQTT_PUB_TOPIC;
  param.message.topic.topic.size = strlen(CONFIG_MQTT_PUB_TOPIC);
  param.message.payload.data = data;
  param.message.payload.len = len;
  param.message_id = sys_rand32_get(); 
  param.dup_flag = 0;
  param.retain_flag = 0;

  LOG_DBG("Publishing");
  LOG_DBG("to topic: %s length: %u bytes", CONFIG_MQTT_PUB_TOPIC, len);

  return mqtt_publish(&mqtt_s_client, &param);
}

mqtt_publish() is the NCS API, and so it then locks the mutex on the client.  Below is LTE link monitor output, showing the last debug message printed before the terminal becomes unresponsive:

2020-07-27T15:26:36.438Z DEBUG modem << [00:02:09.344,604] [0m<dbg> nrf9160_gps.gps_thread: Waiting for time window to operate[0m
2020-07-27T15:26:36.501Z DEBUG modem << [00:02:09.344,635] [0m<dbg> app_gps.vd_app_gps_handler: GPS_EVT_OPERATION_BLOCKED[0m
2020-07-27T15:26:36.506Z DEBUG modem << [00:02:09.976,715] [0m<dbg> net_mqtt.mqtt_input: (0x20020f90): state:0x00000006[0m
2020-07-27T15:26:36.508Z DEBUG modem << [00:02:09.976,745] [0m<dbg> net_mqtt_dec.unpack_uint8: (0x20020f90): >> cur:0x20022917, end:0x20022919[0m
2020-07-27T15:26:36.509Z DEBUG modem << [00:02:09.976,745] [0m<dbg> net_mqtt_dec.unpack_uint8: (0x20020f90): << val:40[0m
2020-07-27T15:26:36.511Z DEBUG modem << [00:02:09.976,745] [0m<dbg> net_mqtt_dec.packet_length_decode: (0x20020f90): length:0x00000002[0m
2020-07-27T15:26:36.512Z DEBUG modem << [00:02:09.976,806] [0m<dbg> net_mqtt_rx.mqtt_handle_packet: (0x20020f90): [CID 0x20021050]: Received MQTT_PKT_TYPE_PUBACK![0m
2020-07-27T15:26:36.513Z DEBUG modem << [00:02:09.976,806] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): >> cur:0x20022919, end:0x2002291b[0m
2020-07-27T15:26:36.587Z DEBUG modem << [00:02:09.976,837] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): << val:8d8e[0m
2020-07-27T15:26:36.591Z DEBUG modem << [00:02:09.976,837] [0m<dbg> app_cloud.vd_app_cloud_evtHandler: [vd_app_cloud_evtHandler:586] PUBACK packet id: 36238[0m
2020-07-27T15:26:36.594Z DEBUG modem << [00:02:10.090,270] [0m<dbg> net_mqtt.mqtt_input: (0x20020f90): state:0x00000006[0m
2020-07-27T15:26:36.596Z DEBUG modem << [1;31m--- 2 messages dropped ---
2020-07-27T15:26:36.597Z DEBUG modem << [0m[00:02:10.090,332] [0m<dbg> net_mqtt_dec.packet_length_decode: (0x20020f90): length:0x00000002[0m
2020-07-27T15:26:36.662Z DEBUG modem << [00:02:10.090,362] [0m<dbg> net_mqtt_rx.mqtt_handle_packet: (0x20020f90): [CID 0x20021050]: Received MQTT_PKT_TYPE_PUBACK![0m
2020-07-27T15:26:36.727Z DEBUG modem << [00:02:10.090,393] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): >> cur:0x20022919, end:0x2002291b[0m
2020-07-27T15:26:36.729Z DEBUG modem << [00:02:10.090,393] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): << val:20be[0m
2020-07-27T15:26:36.731Z DEBUG modem << [00:02:10.090,393] [0m<dbg> app_cloud.vd_app_cloud_evtHandler: [vd_app_cloud_evtHandler:586] PUBACK packet id: 8382[0m
2020-07-27T15:26:36.733Z DEBUG modem << [00:02:10.112,487] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:36.736Z DEBUG modem << [00:02:10.112,487] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 174 bytes[0m
2020-07-27T15:26:36.737Z DEBUG modem << [00:02:10.112,518] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000ae[0m
2020-07-27T15:26:36.739Z DEBUG modem << [00:02:10.112,548] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x200210e4): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:36.804Z DEBUG modem << [00:02:10.112,548] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:36.806Z DEBUG modem << [00:02:10.112,579] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0e66 cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:36.809Z DEBUG modem << [00:02:10.112,579] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): << msg type:0x32 length:0x000000c2[0m
2020-07-27T15:26:36.813Z DEBUG modem << [00:02:10.112,579] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:36.815Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): Fixed header length = 03[0m
2020-07-27T15:26:36.816Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt_enc.pack_uint8: (0x200210e4): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:36.817Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:36.883Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:36.885Z DEBUG modem << [00:02:10.114,318] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport write complete.[0m
2020-07-27T15:26:36.887Z DEBUG modem << [00:02:10.114,349] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: << result 0x00000000[0m
2020-07-27T15:26:36.891Z DEBUG modem << [00:02:10.114,379] [0m<dbg> app_sensors.vd_app_sensors_thread: Accel sent to cloud[0m
2020-07-27T15:26:41.575Z DEBUG modem << [00:02:15.114,532] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:41.637Z DEBUG modem << [00:02:15.114,562] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 174 bytes[0m
2020-07-27T15:26:41.639Z DEBUG modem << [00:02:15.114,562] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000ae[0m
2020-07-27T15:26:41.641Z DEBUG modem << [00:02:15.114,593] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x200210e4): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:41.642Z DEBUG modem << [00:02:15.114,624] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:41.644Z DEBUG modem << [00:02:15.114,624] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:8ea9 cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:41.711Z DEBUG modem << [00:02:15.114,624] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): << msg type:0x32 length:0x000000c2[0m
2020-07-27T15:26:41.775Z DEBUG modem << [00:02:15.114,654] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:41.777Z DEBUG modem << [00:02:15.114,654] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): Fixed header length = 03[0m
2020-07-27T15:26:41.779Z DEBUG modem << [00:02:15.114,654] [0m<dbg> net_mqtt_enc.pack_uint8: (0x200210e4): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:41.782Z DEBUG modem << [00:02:15.114,685] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:41.783Z DEBUG modem << [00:02:15.114,685] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:41.784Z DEBUG modem << [00:02:15.116,729] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport write complete.[0m
2020-07-27T15:26:41.786Z DEBUG modem << [00:02:15.116,760] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: << result 0x00000000[0m
2020-07-27T15:26:41.852Z DEBUG modem << [00:02:15.116,760] [0m<dbg> app_sensors.vd_app_sensors_thread: Accel sent to cloud[0m
2020-07-27T15:26:46.577Z DEBUG modem << [00:02:20.116,943] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:46.641Z DEBUG modem << [00:02:20.116,973] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 174 bytes[0m
2020-07-27T15:26:46.643Z DEBUG modem << [00:02:20.116,973] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000ae[0m
2020-07-27T15:26:46.645Z DEBUG modem << [00:02:20.117,004] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x200210e4): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:46.646Z DEBUG modem << [00:02:20.117,034] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:46.647Z DEBUG modem << [00:02:20.117,034] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0ef8 cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:46.716Z DEBUG modem << [00:02:20.117,034] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): << msg type:0x32 length:0x000000c2[0m
2020-07-27T15:26:46.783Z DEBUG modem << [00:02:20.117,065] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:46.785Z DEBUG modem << [00:02:20.117,065] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): Fixed header length = 03[0m
2020-07-27T15:26:46.787Z DEBUG modem << [00:02:20.117,065] [0m<dbg> net_mqtt_enc.pack_uint8: (0x200210e4): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:46.789Z DEBUG modem << [00:02:20.117,095] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:46.791Z DEBUG modem << [00:02:20.117,095] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:46.792Z DEBUG modem << [00:02:20.118,804] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport write complete.[0m
2020-07-27T15:26:46.793Z DEBUG modem << [00:02:20.118,804] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: << result 0x00000000[0m
2020-07-27T15:26:46.859Z DEBUG modem << [00:02:20.118,835] [0m<dbg> app_sensors.vd_app_sensors_thread: Accel sent to cloud[0m
2020-07-27T15:26:47.423Z DEBUG modem << +CSCON: 0
2020-07-27T15:26:48.588Z DEBUG modem << [00:02:21.126,312] [0m<dbg> nrf9160_gps.gps_thread: GPS has time window to operate[0m
2020-07-27T15:26:48.655Z DEBUG modem << [00:02:21.126,342] [0m<dbg> app_gps.vd_app_gps_handler: GPS_EVT_OPERATION_UNBLOCKED[0m
2020-07-27T15:26:48.657Z DEBUG modem << [00:02:21.126,342] [0m<dbg> nrf9160_gps.print_satellite_stats: Tracking: 0 Using: 0 Unhealthy: 0[0m
2020-07-27T15:26:48.659Z DEBUG modem << [00:02:21.126,373] [0m<dbg> nrf9160_gps.print_satellite_stats: Seconds since last fix 13[0m
2020-07-27T15:26:48.661Z DEBUG modem << [00:02:22.162,109] [0m<dbg> nrf9160_gps.gps_thread: PVT: Position fix[0m
2020-07-27T15:26:48.662Z DEBUG modem << [00:02:22.162,567] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:48.664Z DEBUG modem << [00:02:22.162,567] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 208 bytes[0m
2020-07-27T15:26:48.665Z DEBUG modem << [00:02:22.162,567] [0m<dbg> net_mqtt.mqtt_publish: (0x20020e30): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000d0[0m
2020-07-27T15:26:48.733Z DEBUG modem << [00:02:22.162,597] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x20020e30): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:48.798Z DEBUG modem << [00:02:22.162,628] [0m<dbg> net_mqtt_enc.pack_uint16: (0x20020e30): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:48.800Z DEBUG modem << [00:02:22.162,628] [0m<dbg> net_mqtt_enc.pack_uint16: (0x20020e30): >> val:14ce cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:48.802Z DEBUG modem << [00:02:22.162,628] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x20020e30): << msg type:0x32 length:0x000000e4[0m
2020-07-27T15:26:48.805Z DEBUG modem << [00:02:22.162,658] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x20020e30): >> length:0x000000e4 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:48.807Z DEBUG modem << [00:02:22.162,658] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x20020e30): Fixed header length = 03[0m
2020-07-27T15:26:48.808Z DEBUG modem << [00:02:22.162,689] [0m<dbg> net_mqtt_enc.pack_uint8: (0x20020e30): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:48.873Z DEBUG modem << [00:02:22.162,689] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x20020e30): >> length:0x000000e4 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:48.876Z DEBUG modem << [00:02:22.162,689] [0m<dbg> net_mqtt.client_write_msg: (0x20020e30): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:56.888Z DEBUG modem >> AT
2020-07-27T15:26:57.888Z ERROR Error: 'AT

Callstack when the bug occurs is below:

I have tried increasing the system heap to no avail. 

Parents Reply
  • Hi Hakon,

    I have worked around the issue by changing the threads in my application. Instead of multiple threads calling mqtt_publish(), the messages are buffered in the application and then a single thread sends out all buffered messages. No other functionality was changed, so this adds evidence to there being a reentrancy issue in the MQTT stack.

    I will still capture a modem trace when I can with the issue if that will help root-cause it. 

Children
Related