MQTT TLS connect blocking

Hi

I am currently working on a custom nrf9160 board and I have a random bug that appears to be in this function mqtt_client_connect_tls.

For 3min, the board consumption is 4mA and then the mqtt returns -116 error (timeout).

I have found in the devzone that is a known bug that should have been corrected in modem version 1.3.0.

 Blocking socket calls in MQTT library 

I am sure that only one thread is accessing the socket.

I have also tested increase the config like said. It doesn't change the problem.

CONFIG_NRF_MODEM_LIB_HEAP_SIZE=4096
CONFIG_NRF_MODEM_LIB_SENDMSG_BUF_SIZE=2048

09:46:17.528) (COM2) (>>) [16:42:30.076,232] <inf> MQTT_STANDARD_MODULE: TLS enabled<CR><LF>
(09:46:17.528) (COM2) (>>) [16:42:30.076,690] <dbg> net_mqtt_sock_tls: (): Created socket 0<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.424,224] <dbg> lte_lc: +CSCON notification<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.424,346] <dbg> lte_lc_helpers: Dispatching events:<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.424,377] <dbg> lte_lc_helpers:  - handler=0x0002ED49<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.424,377] <inf> module_modem: MODEM notification: RRC_UPDATE<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.424,377] <inf> module_modem: RRC mode: Idle<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.424,407] <dbg> lte_lc_helpers: Done<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.818,756] <dbg> lte_lc: +CEREG notification: +CEREG: 5,"C3D5","00136CCF",9,,,"00001010","11000001"<CR><LF>
(09:47:04.943) (COM2) (>>) <CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.818,969] <dbg> lte_lc_helpers: Network registration status: 5<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,000] <dbg> lte_lc_helpers: LTE mode: 9<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,061] <dbg> lte_lc_helpers: Dispatching events:<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,061] <dbg> lte_lc_helpers:  - handler=0x0002ED49<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,061] <inf> module_modem: MODEM notification: CELL_UPDATE<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,091] <inf> module_modem: LTE cell changed: Cell ID: 1273039, Tracking area: 50133<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,091] <dbg> lte_lc_helpers: Done<CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,824] <dbg> lte_lc_helpers: TAU: 1152000 sec, active time: 20 sec<LF><CR><LF>
(09:47:04.943) (COM2) (>>) [16:43:17.819,824] <dbg> lte_lc: TAU: 1152000 sec, active time: 20 sec<LF><CR><LF>
(09:47:24.895) (COM2) (>>) [16:43:37.425,964] <dbg> lte_lc: %XMODEMSLEEP notification<CR><LF>
(09:47:24.895) (COM2) (>>) [16:43:37.426,116] <dbg> lte_lc_helpers: Dispatching events:<CR><LF>
(09:47:24.895) (COM2) (>>) [16:43:37.426,116] <dbg> lte_lc_helpers:  - handler=0x0002ED49<CR><LF>
(09:47:24.895) (COM2) (>>) [16:43:37.426,116] <inf> module_modem: MODEM notification: MODEM_SLEEP_ENTER<CR><LF>
(09:47:24.895) (COM2) (>>) [16:43:37.426,147] <dbg> lte_lc_helpers: Done<CR><LF>
(09:49:03.930) (COM2) (>>) [16:45:16.989,501] <err> MQTT_STANDARD_MODULE: mqtt_connect, error: -116<CR><LF>
(09:49:04.060) (COM2) (>>) [16:45:16.989,532] <wrn> MQTT_STANDARD_MODULE: connection_poll_start failed, error: -116<CR><LF>
(09:49:04.060) (COM2) (>>) [16:45:16.989,532] <inf> MODULE_MQTT: standard_cloud_connect -116<CR><LF>
(09:49:04.060) (COM2) (>>) [16:45:16.989,562] <err> MODULE_MQTT: mqtt_connect -116<CR><LF>

I attach the log that I can have.

Do you know if there is a correction available ?

Thank you in advance

Antoine

(sdk 1.9.1, modem FW 1.3.2)

Parents Reply
  • Using TCP over NB-IoT is not recommended, you should rather use UDP if possible.

    "There are three TLS connections in the log. 1st and 3rd TLS connection is successfully established. The 2nd TLS connection fails to timeout.

    The same problem is visible in all three TLS connections. The NB-IoT connection is so slow that server starts to send TCP re-transmissions during the TLS handshake.

    The Server Hello message contains TLS handshake protocol messages: Server Hello, Certificate, Server Key Exchange, Certificate Request, and Server Hello Done. It's length is 3583 bytes and it is split to six TCP segments. In all TCP connections the whole content of Server Hello is re-transmitted in TCP level. In the failing connection the TCP re-transmission is not completed in time and timeout ends the TLS handshake.

    I am afraid that the only option here is to increase the TCP re-transmission interval in  server side. Constant re-transmissions are lowering the throughput and preventing the TLS handshake to complete in time."

Children
Related