This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

NRF modem 1.1.1 firmware, 1.2 SDK. Re-using sockets causes hang in nrf_connect.

I'm porting a working app from SDK 1.1 to 1.2, and have noted that when I make an mqtt connection close it, and re-open (to a different mqtt server) that the call to nrf_connect hangs indefinitely.

The socket number is the same, which makes me think that when the first connection closes, it leaves something in a bad state.

The two separate mqtt connections work fine in isolation from each other, just not when the connections are made sequentially. (open, close, open)

Also, I've noted that when I modify the code to make both connections one after the other, forcing the use of another "new" socket for the second connection, things work fine.

This worked fine with modem firmware 1.1.1 and SDK 1.1, so it seems like it might be a new SDK 1.2 bug.

Parents
  • Hi.

    One of the changes between NCS v1.1.0 and NCS v1.2.0 is that the MQTT library now closes the socket in mqtt_disconnect, instead of after another call to mqtt_live.

    How do you "re-use" the socket, and how do you "force" it to use another new socket?

    Also, if you could capture a modem trace, that would help me track what happens to the sockets on the modem side.

    You can learn how to capture a modem trace in this guide: https://devzone.nordicsemi.com/nordic/nordic-blog/b/blog/posts/how-to-get-modem-trace-using-trace-collector-in-nrf-connect

    Best regards,

    Didrik

  • Didrik,

    I've attached a trace file. I had some trouble getting a trace in our target hardware and wound up using the DK to do it.

    What I mean by "re-using" a socket is that the same socket number would get used for both connections. You can note this by watching the net_mqtt log output.

    I've adapted the aws_fota stuff from the aws_fota sample. To demo this problem, I simply ran aws_fota_check twice. The first one works, the second hangs early on, when it trys to call nrf_connect. Our normal application has us connecting to AWS for a fota check at startup, then disconnecting and connecting to Azure via mqtt.

    To demo this I'm simply doing what the aws_fota example does, except that after a timeout, I'm closing the mqtt connection.

    You can see a trimmed log below, showing the issue.



    AWS fota check starting
    
    
    
    
    [00:00:04.098,114] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: MQTT AWS FOTA check starting, version: v1.0.14
    
    
    [0m
    
    
    
    
    [00:00:04.208,312] 
    
    
    [0m<dbg> aws_fota_wrapper.broker_init: IPv4 Address 35.153.241.198
    
    
    [0m
    
    
    
    
    [00:00:04.216,278] 
    
    
    [0m<dbg> aws_fota_wrapper.client_init: client_id: krmp352656100158107
    
    
    [0m
    
    
    
    
    [00:00:04.235,260] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Created socket 2
    
    
    [0m
    
    
    
    
    %CESQ: 47,2,12,1
    
    
    
    
    rsrp value = 0
    
    
    
    
    %CESQ: 52,2,12,1
    
    
    
    
    rsrp value = 47
    
    
    
    
    [00:00:08.480,560] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Connect completed
    
    

    ... lines elided - everything in the elided lines works as expected....

    [0m<dbg> aws_fota.on_publish_evt: Received topic: $aws/things/krmp352656100158107/jobs/$next/get/accepted
    
    
    [0m
    
    
    [00:00:10.070,922] 
    
    
    [0m<inf> aws_fota: Checking for an available job
    
    
    [0m
    
    
    
    
    [00:00:10.077,667] 
    
    
    [0m<dbg> aws_fota.get_job_execution: Got only one field: {"clientToken":"","timestamp":1582816345}
    
    
    [0m
    
    
    
    
    [00:00:10.088,470] 
    
    
    [0m<inf> aws_fota: No queued jobs for this device
    
    
    [0m
    
    
    
    
    [00:00:38.681,243] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: Disconnecting MQTT client...
    
    
    [0m
    
    
    
    
    [00:00:38.689,727] 
    
    
    [0m<dbg> net_mqtt.client_write: (0x20025940): [0x200240a0]: Transport writing 2 bytes.
    
    
    [0m
    
    
    
    
    [00:00:38.708,587] 
    
    
    [0m<dbg> net_mqtt.client_write: (0x20025940): [0x200240a0]: Transport write complete.
    
    
    [0m
    
    
    
    
    [00:00:38.718,383] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_disconnect: (0x20025940): Closing socket 2
    
    
    [0m
    
    
    
    
    [00:00:38.729,370] 
    
    
    [1;31m<err> aws_fota_wrapper: [mqtt_evt_handler:183] MQTT client disconnected 0


    ... fota check run for a second time. Same code path, not that it hangs after creating the socket.....

    [0m
    
    
    
    
    AWS fota check finished
    
    
    
    
    AWS fota check 2 starting
    
    
    
    
    [00:00:38.743,011] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: MQTT AWS FOTA check starting, version: v1.0.14
    
    
    [0m
    
    
    [00:00:38.753,326] 
    
    
    [0m<dbg> aws_fota_wrapper.broker_init: IPv4 Address 35.153.241.198
    
    
    [0m
    
    
    
    
    [00:00:38.761,474] 
    
    
    [0m<dbg> aws_fota_wrapper.client_init: client_id: krmp352656100158107
    
    
    [0m
    
    
    
    
    [00:00:38.778,503] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Created socket 2
    
    
    [0m

    I've attached a modem trace of this scenario as well.

     trace-2020-02-27T15-06-54.982Z.bin

Reply
  • Didrik,

    I've attached a trace file. I had some trouble getting a trace in our target hardware and wound up using the DK to do it.

    What I mean by "re-using" a socket is that the same socket number would get used for both connections. You can note this by watching the net_mqtt log output.

    I've adapted the aws_fota stuff from the aws_fota sample. To demo this problem, I simply ran aws_fota_check twice. The first one works, the second hangs early on, when it trys to call nrf_connect. Our normal application has us connecting to AWS for a fota check at startup, then disconnecting and connecting to Azure via mqtt.

    To demo this I'm simply doing what the aws_fota example does, except that after a timeout, I'm closing the mqtt connection.

    You can see a trimmed log below, showing the issue.



    AWS fota check starting
    
    
    
    
    [00:00:04.098,114] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: MQTT AWS FOTA check starting, version: v1.0.14
    
    
    [0m
    
    
    
    
    [00:00:04.208,312] 
    
    
    [0m<dbg> aws_fota_wrapper.broker_init: IPv4 Address 35.153.241.198
    
    
    [0m
    
    
    
    
    [00:00:04.216,278] 
    
    
    [0m<dbg> aws_fota_wrapper.client_init: client_id: krmp352656100158107
    
    
    [0m
    
    
    
    
    [00:00:04.235,260] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Created socket 2
    
    
    [0m
    
    
    
    
    %CESQ: 47,2,12,1
    
    
    
    
    rsrp value = 0
    
    
    
    
    %CESQ: 52,2,12,1
    
    
    
    
    rsrp value = 47
    
    
    
    
    [00:00:08.480,560] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Connect completed
    
    

    ... lines elided - everything in the elided lines works as expected....

    [0m<dbg> aws_fota.on_publish_evt: Received topic: $aws/things/krmp352656100158107/jobs/$next/get/accepted
    
    
    [0m
    
    
    [00:00:10.070,922] 
    
    
    [0m<inf> aws_fota: Checking for an available job
    
    
    [0m
    
    
    
    
    [00:00:10.077,667] 
    
    
    [0m<dbg> aws_fota.get_job_execution: Got only one field: {"clientToken":"","timestamp":1582816345}
    
    
    [0m
    
    
    
    
    [00:00:10.088,470] 
    
    
    [0m<inf> aws_fota: No queued jobs for this device
    
    
    [0m
    
    
    
    
    [00:00:38.681,243] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: Disconnecting MQTT client...
    
    
    [0m
    
    
    
    
    [00:00:38.689,727] 
    
    
    [0m<dbg> net_mqtt.client_write: (0x20025940): [0x200240a0]: Transport writing 2 bytes.
    
    
    [0m
    
    
    
    
    [00:00:38.708,587] 
    
    
    [0m<dbg> net_mqtt.client_write: (0x20025940): [0x200240a0]: Transport write complete.
    
    
    [0m
    
    
    
    
    [00:00:38.718,383] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_disconnect: (0x20025940): Closing socket 2
    
    
    [0m
    
    
    
    
    [00:00:38.729,370] 
    
    
    [1;31m<err> aws_fota_wrapper: [mqtt_evt_handler:183] MQTT client disconnected 0


    ... fota check run for a second time. Same code path, not that it hangs after creating the socket.....

    [0m
    
    
    
    
    AWS fota check finished
    
    
    
    
    AWS fota check 2 starting
    
    
    
    
    [00:00:38.743,011] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: MQTT AWS FOTA check starting, version: v1.0.14
    
    
    [0m
    
    
    [00:00:38.753,326] 
    
    
    [0m<dbg> aws_fota_wrapper.broker_init: IPv4 Address 35.153.241.198
    
    
    [0m
    
    
    
    
    [00:00:38.761,474] 
    
    
    [0m<dbg> aws_fota_wrapper.client_init: client_id: krmp352656100158107
    
    
    [0m
    
    
    
    
    [00:00:38.778,503] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Created socket 2
    
    
    [0m

    I've attached a modem trace of this scenario as well.

     trace-2020-02-27T15-06-54.982Z.bin

Children
Related