MQTT publish POLLERR error

Hi,

I'm using nrf9160 MQTT at commands to send real-time data to the cloud. I run into the issue where the MQTT publish may fail (frequency varies around 1 out of 500 publishes). One thread suggests the MQTT fails if the message is too long, however, I don't think this is my case. Any suggestions on how to debug this issue?

Thanks!

Ye-Sheng

The debug message from the modem

This is the working one:

00> [00:18:17.373,504] <inf> slm_at_host: Enter datamode
00> [00:18:17.550,323] <inf> slm_at_host: time limit reached
00> [00:18:17.550,384] <inf> slm_at_host: Raw send 1479
00> [00:18:17.550,415] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: >> Topic size 0x00000015, Data size 0x000005c7
00> [00:18:17.550,476] <dbg> net_mqtt_enc: pack_utf8_str: (0x2001b2b8): >> str_size:00000017 cur:0x2001f163, end:0x2001f35e
00> [00:18:17.550,506] <dbg> net_mqtt_enc: pack_uint16: (0x2001b2b8): >> val:0015 cur:0x2001f163, end:0x2001f35e
00> [00:18:17.550,506] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): << msg type:0x30 length:0x000005de
00> [00:18:17.550,537] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005de cur:(nil), end:(nil)
00> [00:18:17.550,537] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): Fixed header length = 03
00> [00:18:17.550,537] <dbg> net_mqtt_enc: pack_uint8: (0x2001b2b8): >> val:30 cur:0x2001f160, end:0x2001f35e
00> [00:18:17.550,567] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005de cur:0x2001f161, end:0x2001f35e
00> [00:18:17.550,567] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport writing message.
00> [00:18:17.554,443] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport write complete.
00> [00:18:17.554,473] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: << result 0x00000000
00> [00:18:17.554,504] <inf> slm_mqtt: datamode send: 0
00> [00:18:18.542,633] <inf> slm_at_host: time limit reached
00> [00:18:18.542,724] <inf> slm_at_host: Raw send 1478
00> [00:18:18.542,724] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: >> Topic size 0x00000015, Data size 0x000005c6
00> [00:18:18.542,785] <dbg> net_mqtt_enc: pack_utf8_str: (0x2001b2b8): >> str_size:00000017 cur:0x2001f163, end:0x2001f35e
00> [00:18:18.542,816] <dbg> net_mqtt_enc: pack_uint16: (0x2001b2b8): >> val:0015 cur:0x2001f163, end:0x2001f35e
00> [00:18:18.542,816] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): << msg type:0x30 length:0x000005dd
00> [00:18:18.542,846] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005dd cur:(nil), end:(nil)
00> [00:18:18.542,846] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): Fixed header length = 03
00> [00:18:18.542,846] <dbg> net_mqtt_enc: pack_uint8: (0x2001b2b8): >> val:30 cur:0x2001f160, end:0x2001f35e
00> [00:18:18.542,877] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005dd cur:0x2001f161, end:0x2001f35e
00> [00:18:18.542,877] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport writing message.
00> [00:18:18.546,844] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport write complete.
00> [00:18:18.546,875] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: << result 0x00000000
00> [00:18:18.546,875] <inf> slm_mqtt: datamode send: 0
00> [00:18:18.546,905] <inf> slm_at_host: datamode off pending
00> [00:18:18.557,098] <inf> slm_at_host: Exit datamode

This is the failed one.
00> [00:18:19.397,705] <inf> slm_at_host: Enter datamode
00> [00:18:19.573,089] <inf> slm_at_host: time limit reached
00> [00:18:19.573,150] <inf> slm_at_host: Raw send 1466
00> [00:18:19.573,150] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: >> Topic size 0x00000015, Data size 0x000005ba
00> [00:18:19.573,272] <dbg> net_mqtt_enc: pack_utf8_str: (0x2001b2b8): >> str_size:00000017 cur:0x2001f163, end:0x2001f35e
00> [00:18:19.573,303] <dbg> net_mqtt_enc: pack_uint16: (0x2001b2b8): >> val:0015 cur:0x2001f163, end:0x2001f35e
00> [00:18:19.573,303] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): << msg type:0x30 length:0x000005d1
00> [00:18:19.573,333] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005d1 cur:(nil), end:(nil)
00> [00:18:19.573,333] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): Fixed header length = 03
00> [00:18:19.573,333] <dbg> net_mqtt_enc: pack_uint8: (0x2001b2b8): >> val:30 cur:0x2001f160, end:0x2001f35e
00> [00:18:19.573,364] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005d1 cur:0x2001f161, end:0x2001f35e
00> [00:18:19.573,364] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport writing message.
00> [00:18:19.577,117] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport write complete.
00> [00:18:19.577,148] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: << result 0x00000000
00> [00:18:19.577,148] <inf> slm_mqtt: datamode send: 0
00> [00:18:20.137,756] <err> slm_mqtt: POLLERR
00> [00:18:20.137,786] <inf> net_mqtt_sock_tls: Closing socket 0
00> [00:18:20.139,099] <inf> slm_mqtt: MQTT thread terminated
00> [00:18:20.176,391] <inf> slm_at_host: time limit reached
00> [00:18:20.176,452] <inf> slm_at_host: Raw send 11
00> [00:18:20.176,483] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x00]: >> Topic size 0x00000015, Data size 0x0000000b
00> [00:18:20.176,544] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x00]: << result 0xffffff80
00> [00:18:20.176,574] <inf> slm_mqtt: datamode send: -128
00> [00:18:20.176,574] <wrn> slm_at_host: Raw send failed, 11 dropped
00> [00:18:20.176,605] <inf> slm_at_host: datamode off pending
00> [00:18:20.186,767] <inf> slm_at_host: Exit datamode

Parents
  • Hi Ye-Sheng,

    Thanks for sharing the modem trace. I will try to find someone to review it.

    Another finding is that the MQTT codes in SLM are using the absolute maximum value so I don’t know if it could trigger some undesired behavior in some edge cases. Our development team has got notified. I would suggest trying with a shorter keep-alive interval to see if that changes anything. So in slm_at_mqtt.c changing err = poll(&fds, 1, mqtt_keepalive_time_left(&client)); to e.g. err = poll(&fds, 1, mqtt_keepalive_time_left(&client) / 2);this is just a wild guess worth trying.

    I am still curious about whether this issue comes from the connection/server side, looking forward to seeing some test results from your side.

    Best regards,

    Charlie

  • Hi Charlie,

    Thanks for the suggestion.

    I couldn't tell the difference with the change, it still fails occasionally. We have other devices sending data to the server via WiFi, and we don't see this issue.

    another piece of information is the MQTT return code was -1, 113 (ECONNABORTED). Any other things I could try?

    Thanks!

Reply Children
No Data
Related