Socket send function hangs

Hi,

I have a custom board with an nRF9160, and I'm using it to send messages to an AWS server via MQTT. I managed to reproduce a bug where the TLS socket send function hangs (similar to https://devzone.nordicsemi.com/f/nordic-q-a/62249/hang-in-sendmsg-on-the-nrf9160). I was using v1.3 of NCS.

Steps to reproduce:

  • Connect to the LTE network.
  • Connect to AWS via MQTT.
  • When a connection is successfully established, send a disconnect message.
  • Once disconnected, reconnect, and keep repeating this loop.
  • On the 21st connection/disconnection cycle, execution gets stuck in mqtt_client_tls_write, in the call to
    send(client->transport.tls.sock, data + offset, datalen - offset, 0);

I switched to the latest master branch of NCS and changed the send call to send(client->transport.tls.sock, data + offset, datalen - offset, MSG_DONTWAIT). Now, when I follow the steps to reproduce this bug, send no longer hangs but returns -1 with errno set to EAGAIN.
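For readers hitting the same EAGAIN, here is a minimal sketch of one way to bound a non-blocking send: retry after waiting for the socket to become writable with poll(), and give up after a timeout instead of hanging. It is written against plain POSIX sockets; the helper name send_all_with_timeout and the timeout_ms parameter are hypothetical and are not part of NCS or the Zephyr MQTT library. It assumes the same POSIX-style calls are available in your build, and it does not address the underlying modem issue; it only turns a hang into an error the application can act on.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not an NCS/Zephyr API): send the whole buffer
 * without ever blocking indefinitely.
 * Returns 0 on success or a negative errno value on failure;
 * -ETIMEDOUT means the socket did not become writable within
 * timeout_ms. The timeout applies to each wait, not to the whole call. */
int send_all_with_timeout(int sock, const void *buf, size_t len,
                          int timeout_ms)
{
    const unsigned char *data = buf;
    size_t offset = 0;

    assert(buf != NULL);

    while (offset < len) {
        ssize_t ret = send(sock, data + offset, len - offset, MSG_DONTWAIT);

        if (ret >= 0) {
            offset += (size_t)ret;      /* partial sends are normal */
            continue;
        }
        if (errno == EINTR) {
            continue;                   /* interrupted, just retry */
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -errno;              /* real error, report it */
        }

        /* Send buffer is full: wait until writable, but only so long. */
        struct pollfd pfd = { .fd = sock, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc == 0) {
            return -ETIMEDOUT;
        }
        if (rc < 0 && errno != EINTR) {
            return -errno;
        }
        /* Otherwise loop: if poll flagged an error condition, the next
         * send() call reports the actual errno. */
    }

    return 0;
}
```

On -ETIMEDOUT the caller would typically close the socket and start a fresh connection attempt rather than reuse the stuck one.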

How do I handle the socket timeout gracefully? After this timeout, if I try to reconnect to AWS, I get:

aws_iot: getaddrinfo, error -10
aws_iot: client_broker_init, error: -10

and I have to restart the device to get it working again.

I also noticed an error case that doesn't seem to be handled in aws_iot.c. When the socket send function returns -1, client_connect in mqtt.c calls client_disconnect, which in turn calls disconnect_event_notify. This notifies the aws_iot client with

evt.type = MQTT_EVT_CONNACK;
evt.result = -ECONNREFUSED;

which is not handled properly in aws_iot.c. Handling this properly still results in the same error I mentioned above, though.
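To illustrate the missing check, here is a small sketch of how an event handler could tell a failed CONNACK from a successful one by looking at the result field before treating the client as connected. The demo_evt types and the demo_connack_failed function are stand-ins invented for this example so that it compiles off-target; they only mirror the two fields quoted above and are not the real Zephyr MQTT or aws_iot definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only. They mimic the two fields
 * quoted above (type and result) and are NOT the real MQTT structs. */
enum demo_evt_type {
    DEMO_EVT_CONNACK,
    DEMO_EVT_DISCONNECT,
};

struct demo_evt {
    enum demo_evt_type type;
    int result;             /* 0 on success, negative errno on failure */
};

/* Hypothetical check: true when a CONNACK event carries an error,
 * for example result == -ECONNREFUSED after a failed send. */
bool demo_connack_failed(const struct demo_evt *evt)
{
    assert(evt != NULL);

    return evt->type == DEMO_EVT_CONNACK && evt->result != 0;
}
```

A handler built on this idea would skip its "connected" path when the check returns true and go to its cleanup and reconnect logic instead.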
Any help will be appreciated. Thanks in advance!
Nikil
  • Hi,

    We recently found an issue that corresponds to your description when connect/disconnect looping with a TLS connection:

    On the 21st connection/disconnection cycle the function gets stuck

    We are currently testing a fix for this, which will be included in a newer modem firmware version (v1.2.2). For now, with the currently available modem firmware (v1.2.0 or older), we unfortunately do not have a good workaround (a reset should help, but that is not a good fix).

    I must apologize for the inconvenience this has caused.

     

    Kind regards,

    Håkon
