This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

NRF modem 1.1.1 firmware, 1.2 SDK. Re-using sockets causes hang in nrf_connect.

I'm porting a working app from SDK 1.1 to 1.2, and have noted that when I make an mqtt connection close it, and re-open (to a different mqtt server) that the call to nrf_connect hangs indefinitely.

The socket number is the same, which makes me think that when the first connection closes, it leaves something in a bad state.

The two separate mqtt connections work fine in isolation from each other, just not when the connections are made sequentially. (open, close, open)

Also, I've noted that when I modify the code to make both connections one after the other, forcing the use of another "new" socket for the second connection, things work fine.

This worked fine with modem firmware 1.1.1 and SDK 1.1, so it seems like it might be a new SDK 1.2 bug.

Parents Reply Children
  • Didrik,

    I've attached a trace file. I had some trouble getting a trace in our target hardware and wound up using the DK to do it.

    What I mean by "re-using" a socket is that the same socket number would get used for both connections. You can note this by watching the net_mqtt log output.

    I've adapted the aws_fota stuff from the aws_fota sample. To demo this problem, I simply ran aws_fota_check twice. The first one works, the second hangs early on, when it trys to call nrf_connect. Our normal application has us connecting to AWS for a fota check at startup, then disconnecting and connecting to Azure via mqtt.

    To demo this I'm simply doing what the aws_fota example does, except that after a timeout, I'm closing the mqtt connection.

    You can see a trimmed log below, showing the issue.



    AWS fota check starting
    
    
    
    
    [00:00:04.098,114] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: MQTT AWS FOTA check starting, version: v1.0.14
    
    
    [0m
    
    
    
    
    [00:00:04.208,312] 
    
    
    [0m<dbg> aws_fota_wrapper.broker_init: IPv4 Address 35.153.241.198
    
    
    [0m
    
    
    
    
    [00:00:04.216,278] 
    
    
    [0m<dbg> aws_fota_wrapper.client_init: client_id: krmp352656100158107
    
    
    [0m
    
    
    
    
    [00:00:04.235,260] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Created socket 2
    
    
    [0m
    
    
    
    
    %CESQ: 47,2,12,1
    
    
    
    
    rsrp value = 0
    
    
    
    
    %CESQ: 52,2,12,1
    
    
    
    
    rsrp value = 47
    
    
    
    
    [00:00:08.480,560] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Connect completed
    
    

    ... lines elided - everything in the elided lines works as expected....

    [0m<dbg> aws_fota.on_publish_evt: Received topic: $aws/things/krmp352656100158107/jobs/$next/get/accepted
    
    
    [0m
    
    
    [00:00:10.070,922] 
    
    
    [0m<inf> aws_fota: Checking for an available job
    
    
    [0m
    
    
    
    
    [00:00:10.077,667] 
    
    
    [0m<dbg> aws_fota.get_job_execution: Got only one field: {"clientToken":"","timestamp":1582816345}
    
    
    [0m
    
    
    
    
    [00:00:10.088,470] 
    
    
    [0m<inf> aws_fota: No queued jobs for this device
    
    
    [0m
    
    
    
    
    [00:00:38.681,243] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: Disconnecting MQTT client...
    
    
    [0m
    
    
    
    
    [00:00:38.689,727] 
    
    
    [0m<dbg> net_mqtt.client_write: (0x20025940): [0x200240a0]: Transport writing 2 bytes.
    
    
    [0m
    
    
    
    
    [00:00:38.708,587] 
    
    
    [0m<dbg> net_mqtt.client_write: (0x20025940): [0x200240a0]: Transport write complete.
    
    
    [0m
    
    
    
    
    [00:00:38.718,383] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_disconnect: (0x20025940): Closing socket 2
    
    
    [0m
    
    
    
    
    [00:00:38.729,370] 
    
    
    [1;31m<err> aws_fota_wrapper: [mqtt_evt_handler:183] MQTT client disconnected 0


    ... fota check run for a second time. Same code path, not that it hangs after creating the socket.....

    [0m
    
    
    
    
    AWS fota check finished
    
    
    
    
    AWS fota check 2 starting
    
    
    
    
    [00:00:38.743,011] 
    
    
    [0m<dbg> aws_fota_wrapper.aws_fota_check: MQTT AWS FOTA check starting, version: v1.0.14
    
    
    [0m
    
    
    [00:00:38.753,326] 
    
    
    [0m<dbg> aws_fota_wrapper.broker_init: IPv4 Address 35.153.241.198
    
    
    [0m
    
    
    
    
    [00:00:38.761,474] 
    
    
    [0m<dbg> aws_fota_wrapper.client_init: client_id: krmp352656100158107
    
    
    [0m
    
    
    
    
    [00:00:38.778,503] 
    
    
    [0m<dbg> net_mqtt_sock_tls.mqtt_client_tls_connect: (0x20025940): Created socket 2
    
    
    [0m

    I've attached a modem trace of this scenario as well.

     trace-2020-02-27T15-06-54.982Z.bin

  • Your modem trace was empty (or rather, full of zeroes).

    Is your application using SPI1, TWI1 or UART1 to something? If so, that could interfere with the modem trace.

    Did you add CONFIG_BSD_LIBRARY_TRACE_ENABLED=y to your prj.conf (and re-open the project if you are using SES)?

    Did you send AT%XSYSTEMMODE=1,2, then saved it to flash by sending AT+CFUN=0?

    Could you try to capture a new trace?

    I'll also try to re-create the issue on my end.

  • Didrik,

    I followed the guide carefully, including all the steps you just mentioned. I'll re-check and see if I can figure out what might have gone wrong.

    If you try to reproduce, simply make an mqtt connection, close it, and then make it again, that should demonstrate the issue.

  • Didrik and Johan,

    I think both of you were wondering how I worked around this problem.

    Here is the procedure.

    Open MQTT Connection 1 (with TLS)

    Open MQTT Connection 2 (with TLS)

    Service connection 1 (in this case AWS FOTA)

    Close Connection 1.

    Service connection 2.

    The time require to service connection 1 is short enough that connection 2 does not time out, it's well within the mqtt poll interval.

    Didrik,

    I'll try the modem log again.

    Josh

Related