
nRF9160 making mqtt_client_tls_connect() non-blocking

Hi,

With cellular connectivity, the provider's DNS, and backend cloud services all in the path, connecting over MQTT is invariably subject to transient failures, and the internal blocking timeout of mqtt_client_tls_connect is inordinately long, which can cause significant loss of battery life. We're perfectly fine modifying mqtt_client_tls_connect in mqtt_transport_socket_tls.c to make it a non-blocking socket connect() with our own timeout value, but I'm observing the following:

  • When calling fcntl(client->transport.tls.sock, F_SETFL, O_NONBLOCK); the flags parameter gets unpacked in nrf91_socket_offload_fcntl, but flags = va_arg(args, int) returns 0 instead of 0x4000 (O_NONBLOCK), even though I can see 0x4000 being passed down before it becomes a va_arg.

  • To eliminate va_arg as the cause, we moved the direct call to nrf_fcntl with NRF_O_NONBLOCK into mqtt_client_tls_connect and verified that the flag is set properly by calling nrf_fcntl(.. NRF_F_GETFL..), which returns NRF_O_NONBLOCK. However, the subsequent socket connect(...) still behaves as blocking and doesn't return EINPROGRESS.

Full code below, any ideas?

Thanks!

// Set to non blocking
int flags = nrf_fcntl(client->transport.tls.sock, NRF_F_GETFL, 0);
ret = nrf_fcntl(client->transport.tls.sock, NRF_F_SETFL, flags | NRF_O_NONBLOCK);
int rflags = nrf_fcntl(client->transport.tls.sock, NRF_F_GETFL, 0);

printk(">>>>>>>> nrf_fcntl verified value set to %d\n", rflags);

ret = connect(client->transport.tls.sock, client->broker, peer_addr_size);
if (ret == EINPROGRESS) {
  printk(">>>>>>>> connect EINPROGRESS\n");

  fd_set wait_set;
  struct timeval tv;

  // make file descriptor set with socket
  FD_ZERO(&wait_set);
  FD_SET(client->transport.tls.sock, &wait_set);

  // wait for socket to be writable; return after given timeout
  tv.tv_sec = 30; // 30 seconds
  tv.tv_usec = 0;
  ret = select(client->transport.tls.sock + 1, NULL, &wait_set, NULL, &tv);
} else if (ret < 0) {
  goto error;
}

ret = nrf_fcntl(client->transport.tls.sock, NRF_F_SETFL, flags); // restore to blocking socket calls

     

  • Hi,

    Can you provide some more information on the matter? After asking R&D, it would be beneficial if you could share your reason for making connect() non-blocking. As of now, we have not tested or measured the current draw when the connect() call is blocking versus non-blocking. The current draw should not be affected either way, since a thread that calls connect() should be put to sleep and then woken regardless of whether the connect() call succeeds or fails.

    Have you measured high current draw while connect() is blocking?

    Regards,
    Jonathan

  • Yes, the biggest issue with waiting for a blocking connect is that it can take literally tens of minutes or more to return with a failure. This is unacceptable when the rest of the firmware is waiting to continue, for example to persist the failed cloud upload to storage, or when the wait clashes with processing the next sensor sample wakeup cycle. The entire device is rendered unusable while the connect is blocking. While waiting, the CPU is not in a PSM sleep state, and the battery usage of the device will rapidly increase.

    If the socket connect doesn't return within 30s or so, it's pretty much guaranteed to fail anyway, so I think it's a major flaw in the system if the connect can't be made non-blocking and the timeout in the modem firmware can't be significantly shortened either.

    Thanks!

  • Hi,

    Feedback from R&D:

    Mixing nrf_* and non-prefixed socket calls doesn't work, because the internal socket IDs end up being different, even though the same fd is used on the application level.
     - In this case what's happened is that it was not the MQTT socket that was set as non-blocking, but another socket in the system.
     - Removing the nrf_ prefix should fix the issue. Note that fcntl.h must be included instead of nrf_socket.h for that to work.

    When checking for EINPROGRESS, they compare against the return value from connect(). Return value from connect can only be 0 on success, or -1 on failure. If it's -1, errno must be checked for the actual error code. So the comparison should be along the lines of if ((ret == -1) && (errno == EINPROGRESS))

    With the points above addressed, the code seems to work as expected: with the MQTT socket set to non-blocking, connect() returns -1 and errno is EINPROGRESS.

    There are still some areas that need to be addressed:

    • Check return value of select() and if it returned due to timeout or event
    • If select() returned because of event, check if connection was established
      • Normally that would be done using getpeername(), but that's not implemented in Zephyr/modem lib as far as I can see
      • I think socket option SO_ERROR can be read out and checked. It should report an error if connection failed, but I have not tested this. Note that it will not report an error if select() returned due to timeout, so that needs to be checked first.

    • If select() timed out, the modem is still trying to connect to the server, so that needs to be terminated. close() should do it. 


    Regards,
    Jonathan

