MQTT publish POLLERR error

Hi,

I'm using nrf9160 MQTT at commands to send real-time data to the cloud. I run into the issue where the MQTT publish may fail (frequency varies around 1 out of 500 publishes). One thread suggests the MQTT fails if the message is too long, however, I don't think this is my case. Any suggestions on how to debug this issue?

Thanks!

Ye-Sheng

The debug message from the modem

This is the working one:

00> [00:18:17.373,504] <inf> slm_at_host: Enter datamode
00> [00:18:17.550,323] <inf> slm_at_host: time limit reached
00> [00:18:17.550,384] <inf> slm_at_host: Raw send 1479
00> [00:18:17.550,415] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: >> Topic size 0x00000015, Data size 0x000005c7
00> [00:18:17.550,476] <dbg> net_mqtt_enc: pack_utf8_str: (0x2001b2b8): >> str_size:00000017 cur:0x2001f163, end:0x2001f35e
00> [00:18:17.550,506] <dbg> net_mqtt_enc: pack_uint16: (0x2001b2b8): >> val:0015 cur:0x2001f163, end:0x2001f35e
00> [00:18:17.550,506] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): << msg type:0x30 length:0x000005de
00> [00:18:17.550,537] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005de cur:(nil), end:(nil)
00> [00:18:17.550,537] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): Fixed header length = 03
00> [00:18:17.550,537] <dbg> net_mqtt_enc: pack_uint8: (0x2001b2b8): >> val:30 cur:0x2001f160, end:0x2001f35e
00> [00:18:17.550,567] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005de cur:0x2001f161, end:0x2001f35e
00> [00:18:17.550,567] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport writing message.
00> [00:18:17.554,443] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport write complete.
00> [00:18:17.554,473] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: << result 0x00000000
00> [00:18:17.554,504] <inf> slm_mqtt: datamode send: 0
00> [00:18:18.542,633] <inf> slm_at_host: time limit reached
00> [00:18:18.542,724] <inf> slm_at_host: Raw send 1478
00> [00:18:18.542,724] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: >> Topic size 0x00000015, Data size 0x000005c6
00> [00:18:18.542,785] <dbg> net_mqtt_enc: pack_utf8_str: (0x2001b2b8): >> str_size:00000017 cur:0x2001f163, end:0x2001f35e
00> [00:18:18.542,816] <dbg> net_mqtt_enc: pack_uint16: (0x2001b2b8): >> val:0015 cur:0x2001f163, end:0x2001f35e
00> [00:18:18.542,816] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): << msg type:0x30 length:0x000005dd
00> [00:18:18.542,846] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005dd cur:(nil), end:(nil)
00> [00:18:18.542,846] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): Fixed header length = 03
00> [00:18:18.542,846] <dbg> net_mqtt_enc: pack_uint8: (0x2001b2b8): >> val:30 cur:0x2001f160, end:0x2001f35e
00> [00:18:18.542,877] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005dd cur:0x2001f161, end:0x2001f35e
00> [00:18:18.542,877] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport writing message.
00> [00:18:18.546,844] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport write complete.
00> [00:18:18.546,875] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: << result 0x00000000
00> [00:18:18.546,875] <inf> slm_mqtt: datamode send: 0
00> [00:18:18.546,905] <inf> slm_at_host: datamode off pending
00> [00:18:18.557,098] <inf> slm_at_host: Exit datamode

This is the failed one.
00> [00:18:19.397,705] <inf> slm_at_host: Enter datamode
00> [00:18:19.573,089] <inf> slm_at_host: time limit reached
00> [00:18:19.573,150] <inf> slm_at_host: Raw send 1466
00> [00:18:19.573,150] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: >> Topic size 0x00000015, Data size 0x000005ba
00> [00:18:19.573,272] <dbg> net_mqtt_enc: pack_utf8_str: (0x2001b2b8): >> str_size:00000017 cur:0x2001f163, end:0x2001f35e
00> [00:18:19.573,303] <dbg> net_mqtt_enc: pack_uint16: (0x2001b2b8): >> val:0015 cur:0x2001f163, end:0x2001f35e
00> [00:18:19.573,303] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): << msg type:0x30 length:0x000005d1
00> [00:18:19.573,333] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005d1 cur:(nil), end:(nil)
00> [00:18:19.573,333] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x2001b2b8): Fixed header length = 03
00> [00:18:19.573,333] <dbg> net_mqtt_enc: pack_uint8: (0x2001b2b8): >> val:30 cur:0x2001f160, end:0x2001f35e
00> [00:18:19.573,364] <dbg> net_mqtt_enc: packet_length_encode: (0x2001b2b8): >> length:0x000005d1 cur:0x2001f161, end:0x2001f35e
00> [00:18:19.573,364] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport writing message.
00> [00:18:19.577,117] <dbg> net_mqtt: client_write_msg: (0x2001b2b8): [0x2001b5cc]: Transport write complete.
00> [00:18:19.577,148] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x06]: << result 0x00000000
00> [00:18:19.577,148] <inf> slm_mqtt: datamode send: 0
00> [00:18:20.137,756] <err> slm_mqtt: POLLERR
00> [00:18:20.137,786] <inf> net_mqtt_sock_tls: Closing socket 0
00> [00:18:20.139,099] <inf> slm_mqtt: MQTT thread terminated
00> [00:18:20.176,391] <inf> slm_at_host: time limit reached
00> [00:18:20.176,452] <inf> slm_at_host: Raw send 11
00> [00:18:20.176,483] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x00]: >> Topic size 0x00000015, Data size 0x0000000b
00> [00:18:20.176,544] <dbg> net_mqtt: mqtt_publish: (0x2001b2b8): [CID 0x2001b5cc]:[State 0x00]: << result 0xffffff80
00> [00:18:20.176,574] <inf> slm_mqtt: datamode send: -128
00> [00:18:20.176,574] <wrn> slm_at_host: Raw send failed, 11 dropped
00> [00:18:20.176,605] <inf> slm_at_host: datamode off pending
00> [00:18:20.186,767] <inf> slm_at_host: Exit datamode

Parents
  • Hi Ye-Sheng,

    Thanks for the feedback. We have reviewed your modem trace.

    Unfortunately, the trace only shows a working TLS connection, without any breaks, from the error case log, it is clear that the socket is closed, so this would be visible in the modem trace file if it happened during that time.

    In some implementations, POLLERR usually refers to TCP reset. I am not sure if that is case with our modem, but since the error comes in a situation that we have sent the MQTT package and are idle, I would suspect that the server closes the connection with TCP reset.

    There are multiple reasons why this could take place and I would recommend checking the logs in the server side.

    However, after the failure, trace shows a curious length for data send:

    00> [00:18:20.176,452] <inf> slm_at_host: Raw send 11

    If that is not intended, the reason for failure could be that the whole MQTT package was not in the data_buf when the timing was triggered and the remote end is terminating the connection due to receiving an incomplete Pub-message. Is there a possibility that the application/script that takes care of the data sending would send the data in parts?

    Another possibility would be an issue in UART, or SLM UART implementation. SLM UART implementation was (very) recently refactored, so you can give the SLM main branch a go, and see if that would help.

    You can also set CONFIG_NRF_MODEM_LOG=y to get libmodem’s logs to get more clues why poll() returns POLLERR.

    Best regards,

    Charlie

Reply
  • Hi Ye-Sheng,

    Thanks for the feedback. We have reviewed your modem trace.

    Unfortunately, the trace only shows a working TLS connection, without any breaks, from the error case log, it is clear that the socket is closed, so this would be visible in the modem trace file if it happened during that time.

    In some implementations, POLLERR usually refers to TCP reset. I am not sure if that is case with our modem, but since the error comes in a situation that we have sent the MQTT package and are idle, I would suspect that the server closes the connection with TCP reset.

    There are multiple reasons why this could take place and I would recommend checking the logs in the server side.

    However, after the failure, trace shows a curious length for data send:

    00> [00:18:20.176,452] <inf> slm_at_host: Raw send 11

    If that is not intended, the reason for failure could be that the whole MQTT package was not in the data_buf when the timing was triggered and the remote end is terminating the connection due to receiving an incomplete Pub-message. Is there a possibility that the application/script that takes care of the data sending would send the data in parts?

    Another possibility would be an issue in UART, or SLM UART implementation. SLM UART implementation was (very) recently refactored, so you can give the SLM main branch a go, and see if that would help.

    You can also set CONFIG_NRF_MODEM_LOG=y to get libmodem’s logs to get more clues why poll() returns POLLERR.

    Best regards,

    Charlie

Children
  • Hi Charlie,

    Thanks for the suggestion, I'll dig into the log on the server side. In the meantime, I happened to capture one failure with modem log enable.

    00> [00:02:43.936,004] <inf> nrf_modem: Close reason RPC_IP_ERR_NET_DOWN

    I'm not sure how to interpret this message. Does it suggest that the LTE was disconnected by the tower? Also, you're right, the 11 byte was not intended. Our server shouldn't reject any incomplete message. But I'll take a look.

    Thanks,

    Ye-Sheng

    --

    00> [00:02:43.348,815] <inf> slm_at_host: Enter datamode
    00> [00:02:43.521,392] <inf> slm_at_host: time limit reached
    00> [00:02:43.521,453] <inf> slm_at_host: Raw send 1435
    00> [00:02:43.521,484] <dbg> net_mqtt: mqtt_publish: (0x20014b38): [CID 0x20014e38]:[State 0x06]: >> Topic size 0x00000015, Data size 0x0000059b
    00> [00:02:43.521,575] <dbg> net_mqtt_enc: pack_utf8_str: (0x20014b38): >> str_size:00000017 cur:0x20019920, end:0x20019b1b
    00> [00:02:43.521,606] <dbg> net_mqtt_enc: pack_uint16: (0x20014b38): >> val:0015 cur:0x20019920, end:0x20019b1b
    00> [00:02:43.521,606] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x20014b38): << msg type:0x30 length:0x000005b2
    00> [00:02:43.521,636] <dbg> net_mqtt_enc: packet_length_encode: (0x20014b38): >> length:0x000005b2 cur:(nil), end:(nil)
    00> [00:02:43.521,636] <dbg> net_mqtt_enc: mqtt_encode_fixed_header: (0x20014b38): Fixed header length = 03
    00> [00:02:43.521,667] <dbg> net_mqtt_enc: pack_uint8: (0x20014b38): >> val:30 cur:0x2001991d, end:0x20019b1b
    00> [00:02:43.521,667] <dbg> net_mqtt_enc: packet_length_encode: (0x20014b38): >> length:0x000005b2 cur:0x2001991e, end:0x20019b1b
    00> [00:02:43.521,697] <dbg> net_mqtt: client_write_msg: (0x20014b38): [0x20014e38]: Transport writing message.
    00> [00:02:43.521,759] <inf> nrf_modem: send() fd 0x0, len 26, blocking
    00> [00:02:43.522,430] <inf> nrf_modem: RPC_IP_SEND_RES fd 0x0, result RPC_IP_ERR_OK
    00> [00:02:43.522,552] <inf> nrf_modem: send() fd 0x0, 26 bytes sent
    00> [00:02:43.522,583] <inf> nrf_modem: send() fd 0x0, len 1435, blocking
    00> [00:02:43.525,177] <inf> nrf_modem: RPC_IP_SEND_RES fd 0x0, result RPC_IP_ERR_OK
    00> [00:02:43.525,329] <inf> nrf_modem: send() fd 0x0, 1435 bytes sent
    00> [00:02:43.525,360] <dbg> net_mqtt: client_write_msg: (0x20014b38): [0x20014e38]: Transport write complete.
    00> [00:02:43.525,390] <dbg> net_mqtt: mqtt_publish: (0x20014b38): [CID 0x20014e38]:[State 0x06]: << result 0x00000000
    00> [00:02:43.525,390] <inf> slm_mqtt: datamode send: 0
    00> [00:02:43.935,974] <inf> nrf_modem: RPC_IP_CLOSED_NTF fd 0x0
    00> [00:02:43.936,004] <inf> nrf_modem: Close reason RPC_IP_ERR_NET_DOWN
    00> [00:02:43.936,096] <err> slm_mqtt: POLLERR
    00> [00:02:43.936,096] <inf> net_mqtt_sock_tls: Closing socket 0
    00> [00:02:43.936,126] <inf> nrf_modem: close() fd 0x0
    00> [00:02:43.937,194] <inf> nrf_modem: RPC_IP_CLOSE_RES fd 0x0, result RPC_IP_ERR_OK
    00> [00:02:43.937,377] <inf> slm_mqtt: MQTT thread terminated
    00> [00:02:43.974,700] <inf> slm_at_host: time limit reached
    00> [00:02:43.974,761] <inf> slm_at_host: Raw send 11
    00> [00:02:43.974,792] <dbg> net_mqtt: mqtt_publish: (0x20014b38): [CID 0x20014e38]:[State 0x00]: >> Topic size 0x00000015, Data size 0x0000000b
    00> [00:02:43.974,884] <dbg> net_mqtt: mqtt_publish: (0x20014b38): [CID 0x20014e38]:[State 0x00]: << result 0xffffff80
    00> [00:02:43.974,914] <inf> slm_mqtt: datamode send: -128
    00> [00:02:43.974,945] <wrn> slm_at_host: Raw send failed, 11 dropped
    00> [00:02:43.975,585] <inf> slm_at_host: Exit datamode

Related