This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Potential memory leak when publishing to MQTT with multiple threads.

Hello,

I have created an application that collects data from sensors and GPS in two separate threads and publishes via MQTT. The two thread priorities are set to 10 and 11. After a few minutes of running and properly sending data to AWS, the application freezes inside a mqtt_publish call, and seemingly deadlocks the system. By debugging I can see that execution is stuck in mem_alloc.constprop.5(). I have seen this same issue at the following link from a month ago with no resolution: 

https://devzone.nordicsemi.com/f/nordic-q-a/62134/possible-memory-leak-when-using-mqtt-in-discontinuous-fashion

I am wondering if there is a solution from Nordic regarding this, or if there is a known reentrancy issue with the MQTT stack. I don't believe this is a reentrancy issue or race condition on my end, but have attached my relevant code in case. Two threads call this function periodically:

s32_t s32_app_cloud_dataPublish(u8_t *data, size_t len) 
{
  struct mqtt_publish_param param;

  param.message.topic.qos = MQTT_QOS_0_AT_MOST_ONCE;
  param.message.topic.topic.utf8 = CONFIG_MQTT_PUB_TOPIC;
  param.message.topic.topic.size = strlen(CONFIG_MQTT_PUB_TOPIC);
  param.message.payload.data = data;
  param.message.payload.len = len;
  param.message_id = sys_rand32_get(); 
  param.dup_flag = 0;
  param.retain_flag = 0;

  LOG_DBG("Publishing");
  LOG_DBG("to topic: %s length: %u bytes", CONFIG_MQTT_PUB_TOPIC, len);

  return mqtt_publish(&mqtt_s_client, &param);
}

mqtt_publish() is the NCS API, and so it then locks the mutex on the client.  Below is LTE link monitor output, showing the last debug message printed before the terminal becomes unresponsive:

2020-07-27T15:26:36.438Z DEBUG modem << [00:02:09.344,604] [0m<dbg> nrf9160_gps.gps_thread: Waiting for time window to operate[0m
2020-07-27T15:26:36.501Z DEBUG modem << [00:02:09.344,635] [0m<dbg> app_gps.vd_app_gps_handler: GPS_EVT_OPERATION_BLOCKED[0m
2020-07-27T15:26:36.506Z DEBUG modem << [00:02:09.976,715] [0m<dbg> net_mqtt.mqtt_input: (0x20020f90): state:0x00000006[0m
2020-07-27T15:26:36.508Z DEBUG modem << [00:02:09.976,745] [0m<dbg> net_mqtt_dec.unpack_uint8: (0x20020f90): >> cur:0x20022917, end:0x20022919[0m
2020-07-27T15:26:36.509Z DEBUG modem << [00:02:09.976,745] [0m<dbg> net_mqtt_dec.unpack_uint8: (0x20020f90): << val:40[0m
2020-07-27T15:26:36.511Z DEBUG modem << [00:02:09.976,745] [0m<dbg> net_mqtt_dec.packet_length_decode: (0x20020f90): length:0x00000002[0m
2020-07-27T15:26:36.512Z DEBUG modem << [00:02:09.976,806] [0m<dbg> net_mqtt_rx.mqtt_handle_packet: (0x20020f90): [CID 0x20021050]: Received MQTT_PKT_TYPE_PUBACK![0m
2020-07-27T15:26:36.513Z DEBUG modem << [00:02:09.976,806] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): >> cur:0x20022919, end:0x2002291b[0m
2020-07-27T15:26:36.587Z DEBUG modem << [00:02:09.976,837] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): << val:8d8e[0m
2020-07-27T15:26:36.591Z DEBUG modem << [00:02:09.976,837] [0m<dbg> app_cloud.vd_app_cloud_evtHandler: [vd_app_cloud_evtHandler:586] PUBACK packet id: 36238[0m
2020-07-27T15:26:36.594Z DEBUG modem << [00:02:10.090,270] [0m<dbg> net_mqtt.mqtt_input: (0x20020f90): state:0x00000006[0m
2020-07-27T15:26:36.596Z DEBUG modem << [1;31m--- 2 messages dropped ---
2020-07-27T15:26:36.597Z DEBUG modem << [0m[00:02:10.090,332] [0m<dbg> net_mqtt_dec.packet_length_decode: (0x20020f90): length:0x00000002[0m
2020-07-27T15:26:36.662Z DEBUG modem << [00:02:10.090,362] [0m<dbg> net_mqtt_rx.mqtt_handle_packet: (0x20020f90): [CID 0x20021050]: Received MQTT_PKT_TYPE_PUBACK![0m
2020-07-27T15:26:36.727Z DEBUG modem << [00:02:10.090,393] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): >> cur:0x20022919, end:0x2002291b[0m
2020-07-27T15:26:36.729Z DEBUG modem << [00:02:10.090,393] [0m<dbg> net_mqtt_dec.unpack_uint16: (0x20020f90): << val:20be[0m
2020-07-27T15:26:36.731Z DEBUG modem << [00:02:10.090,393] [0m<dbg> app_cloud.vd_app_cloud_evtHandler: [vd_app_cloud_evtHandler:586] PUBACK packet id: 8382[0m
2020-07-27T15:26:36.733Z DEBUG modem << [00:02:10.112,487] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:36.736Z DEBUG modem << [00:02:10.112,487] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 174 bytes[0m
2020-07-27T15:26:36.737Z DEBUG modem << [00:02:10.112,518] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000ae[0m
2020-07-27T15:26:36.739Z DEBUG modem << [00:02:10.112,548] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x200210e4): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:36.804Z DEBUG modem << [00:02:10.112,548] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:36.806Z DEBUG modem << [00:02:10.112,579] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0e66 cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:36.809Z DEBUG modem << [00:02:10.112,579] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): << msg type:0x32 length:0x000000c2[0m
2020-07-27T15:26:36.813Z DEBUG modem << [00:02:10.112,579] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:36.815Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): Fixed header length = 03[0m
2020-07-27T15:26:36.816Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt_enc.pack_uint8: (0x200210e4): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:36.817Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:36.883Z DEBUG modem << [00:02:10.112,609] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:36.885Z DEBUG modem << [00:02:10.114,318] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport write complete.[0m
2020-07-27T15:26:36.887Z DEBUG modem << [00:02:10.114,349] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: << result 0x00000000[0m
2020-07-27T15:26:36.891Z DEBUG modem << [00:02:10.114,379] [0m<dbg> app_sensors.vd_app_sensors_thread: Accel sent to cloud[0m
2020-07-27T15:26:41.575Z DEBUG modem << [00:02:15.114,532] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:41.637Z DEBUG modem << [00:02:15.114,562] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 174 bytes[0m
2020-07-27T15:26:41.639Z DEBUG modem << [00:02:15.114,562] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000ae[0m
2020-07-27T15:26:41.641Z DEBUG modem << [00:02:15.114,593] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x200210e4): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:41.642Z DEBUG modem << [00:02:15.114,624] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:41.644Z DEBUG modem << [00:02:15.114,624] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:8ea9 cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:41.711Z DEBUG modem << [00:02:15.114,624] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): << msg type:0x32 length:0x000000c2[0m
2020-07-27T15:26:41.775Z DEBUG modem << [00:02:15.114,654] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:41.777Z DEBUG modem << [00:02:15.114,654] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): Fixed header length = 03[0m
2020-07-27T15:26:41.779Z DEBUG modem << [00:02:15.114,654] [0m<dbg> net_mqtt_enc.pack_uint8: (0x200210e4): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:41.782Z DEBUG modem << [00:02:15.114,685] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:41.783Z DEBUG modem << [00:02:15.114,685] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:41.784Z DEBUG modem << [00:02:15.116,729] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport write complete.[0m
2020-07-27T15:26:41.786Z DEBUG modem << [00:02:15.116,760] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: << result 0x00000000[0m
2020-07-27T15:26:41.852Z DEBUG modem << [00:02:15.116,760] [0m<dbg> app_sensors.vd_app_sensors_thread: Accel sent to cloud[0m
2020-07-27T15:26:46.577Z DEBUG modem << [00:02:20.116,943] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:46.641Z DEBUG modem << [00:02:20.116,973] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 174 bytes[0m
2020-07-27T15:26:46.643Z DEBUG modem << [00:02:20.116,973] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000ae[0m
2020-07-27T15:26:46.645Z DEBUG modem << [00:02:20.117,004] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x200210e4): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:46.646Z DEBUG modem << [00:02:20.117,034] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:46.647Z DEBUG modem << [00:02:20.117,034] [0m<dbg> net_mqtt_enc.pack_uint16: (0x200210e4): >> val:0ef8 cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:46.716Z DEBUG modem << [00:02:20.117,034] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): << msg type:0x32 length:0x000000c2[0m
2020-07-27T15:26:46.783Z DEBUG modem << [00:02:20.117,065] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:46.785Z DEBUG modem << [00:02:20.117,065] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x200210e4): Fixed header length = 03[0m
2020-07-27T15:26:46.787Z DEBUG modem << [00:02:20.117,065] [0m<dbg> net_mqtt_enc.pack_uint8: (0x200210e4): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:46.789Z DEBUG modem << [00:02:20.117,095] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x200210e4): >> length:0x000000c2 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:46.791Z DEBUG modem << [00:02:20.117,095] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:46.792Z DEBUG modem << [00:02:20.118,804] [0m<dbg> net_mqtt.client_write_msg: (0x200210e4): [0x20021050]: Transport write complete.[0m
2020-07-27T15:26:46.793Z DEBUG modem << [00:02:20.118,804] [0m<dbg> net_mqtt.mqtt_publish: (0x200210e4): [CID 0x20021050]:[State 0x06]: << result 0x00000000[0m
2020-07-27T15:26:46.859Z DEBUG modem << [00:02:20.118,835] [0m<dbg> app_sensors.vd_app_sensors_thread: Accel sent to cloud[0m
2020-07-27T15:26:47.423Z DEBUG modem << +CSCON: 0
2020-07-27T15:26:48.588Z DEBUG modem << [00:02:21.126,312] [0m<dbg> nrf9160_gps.gps_thread: GPS has time window to operate[0m
2020-07-27T15:26:48.655Z DEBUG modem << [00:02:21.126,342] [0m<dbg> app_gps.vd_app_gps_handler: GPS_EVT_OPERATION_UNBLOCKED[0m
2020-07-27T15:26:48.657Z DEBUG modem << [00:02:21.126,342] [0m<dbg> nrf9160_gps.print_satellite_stats: Tracking: 0 Using: 0 Unhealthy: 0[0m
2020-07-27T15:26:48.659Z DEBUG modem << [00:02:21.126,373] [0m<dbg> nrf9160_gps.print_satellite_stats: Seconds since last fix 13[0m
2020-07-27T15:26:48.661Z DEBUG modem << [00:02:22.162,109] [0m<dbg> nrf9160_gps.gps_thread: PVT: Position fix[0m
2020-07-27T15:26:48.662Z DEBUG modem << [00:02:22.162,567] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: Publishing[0m
2020-07-27T15:26:48.664Z DEBUG modem << [00:02:22.162,567] [0m<dbg> app_cloud.s32_app_cloud_dataPublish: to topic: my/publish/topic length: 208 bytes[0m
2020-07-27T15:26:48.665Z DEBUG modem << [00:02:22.162,567] [0m<dbg> net_mqtt.mqtt_publish: (0x20020e30): [CID 0x20021050]:[State 0x06]: >> Topic size 0x00000010, Data size 0x000000d0[0m
2020-07-27T15:26:48.733Z DEBUG modem << [00:02:22.162,597] [0m<dbg> net_mqtt_enc.pack_utf8_str: (0x20020e30): >> str_size:00000012 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:48.798Z DEBUG modem << [00:02:22.162,628] [0m<dbg> net_mqtt_enc.pack_uint16: (0x20020e30): >> val:0010 cur:0x20022a48, end:0x20022b6f[0m
2020-07-27T15:26:48.800Z DEBUG modem << [00:02:22.162,628] [0m<dbg> net_mqtt_enc.pack_uint16: (0x20020e30): >> val:14ce cur:0x20022a5a, end:0x20022b6f[0m
2020-07-27T15:26:48.802Z DEBUG modem << [00:02:22.162,628] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x20020e30): << msg type:0x32 length:0x000000e4[0m
2020-07-27T15:26:48.805Z DEBUG modem << [00:02:22.162,658] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x20020e30): >> length:0x000000e4 cur:0x00000000, end:0x00000000[0m
2020-07-27T15:26:48.807Z DEBUG modem << [00:02:22.162,658] [0m<dbg> net_mqtt_enc.mqtt_encode_fixed_header: (0x20020e30): Fixed header length = 03[0m
2020-07-27T15:26:48.808Z DEBUG modem << [00:02:22.162,689] [0m<dbg> net_mqtt_enc.pack_uint8: (0x20020e30): >> val:32 cur:0x20022a45, end:0x20022b6f[0m
2020-07-27T15:26:48.873Z DEBUG modem << [00:02:22.162,689] [0m<dbg> net_mqtt_enc.packet_length_encode: (0x20020e30): >> length:0x000000e4 cur:0x20022a46, end:0x20022b6f[0m
2020-07-27T15:26:48.876Z DEBUG modem << [00:02:22.162,689] [0m<dbg> net_mqtt.client_write_msg: (0x20020e30): [0x20021050]: Transport writing message.[0m
2020-07-27T15:26:56.888Z DEBUG modem >> AT
2020-07-27T15:26:57.888Z ERROR Error: 'AT

Callstack when the bug occurs is below:

I have tried increasing the system heap to no avail. 

Parents
  • Hello,

     

    I am wondering if there is a solution from Nordic regarding this

     There is no fix to this yet as far as I know, but it is being worked on. I will report your case to R&D so they can have a look at it. There will likely be some delay in the progress due to vacations, but most people are coming back now so it shouldn't be that bad.

Reply
  • Hello,

     

    I am wondering if there is a solution from Nordic regarding this

     There is no fix to this yet as far as I know, but it is being worked on. I will report your case to R&D so they can have a look at it. There will likely be some delay in the progress due to vacations, but most people are coming back now so it shouldn't be that bad.

Children
No Data
Related