WebSocket connection times out too frequently

I am evaluating the nRF7002's WiFi capabilities, starting with the nRF52840DK + nRF7002EK and a project that connects to a socket and sends data. I've primarily used the Websocket Client example (NCS v2.9.1, zephyr/samples/net/sockets/websocket_client), though I mixed in some code from the WiFi station sample to make sure I connect to my local WiFi first (NCS v2.9.1, nrf/samples/wifi/sta). The server is a simple Python script I set up that reports any data sent to it.

Functionally my code is working: it is able to connect to WiFi, connect to the socket, and send some data. To send data, I have it repeatedly call websocket_connect(); the call returns with an error, but my server still gets the data (which is acceptable at this stage of the evaluation).

However, once my code reaches the connect() call in connect_socket(), it occasionally (roughly 50% of the time) times out after the default 3 sec. I modified it to call connect() repeatedly until it succeeds, and noticed it may take 2-3 tries to connect. This means it can take almost 10 sec just to connect to a socket.
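
A bounded retry of this kind could be sketched as follows. This is an illustration rather than the actual project code: connect_with_retry is a hypothetical helper, and it assumes plain POSIX socket names are available (on Zephyr they may need the POSIX names option or the zsock_ prefix). The 3 sec limit appears to correspond to CONFIG_NET_SOCKETS_CONNECT_TIMEOUT, which is an assumption worth checking against your SDK version.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper, not part of the websocket_client sample.
 * Tries to open a TCP connection up to max_attempts times and returns
 * the connected socket, or -1 if every attempt failed.
 */
int connect_with_retry(const struct sockaddr *addr, socklen_t addrlen,
                       int max_attempts)
{
    assert(addr != NULL && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        /* Use a fresh socket per attempt: the state of a socket after a
         * failed connect() is unspecified. */
        int sock = socket(addr->sa_family, SOCK_STREAM, IPPROTO_TCP);

        if (sock < 0) {
            perror("socket");
            return -1;
        }

        if (connect(sock, addr, addrlen) == 0) {
            printf("connected on attempt %d\n", attempt);
            return sock;
        }

        perror("connect");
        close(sock);
    }

    return -1;
}
```

Bounding the number of attempts keeps the worst case predictable: with a 3 sec timeout per attempt and max_attempts set to 3, the call gives up after roughly 9 sec instead of looping forever.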

Could you please help look into why this operation is taking so long? We've created custom boards with nRF for years, and recently also with nRF7002. While it was developed through Linux drivers and such rather than with NCS/VS-Code/etc., we determined it can connect to WiFi, connect to socket, and send a little data all comfortably in <10sec. So I'm expecting similar performance.

More info: I am using a Windows 11 machine, VS Code, & nRF Connect Extension. It was already mentioned earlier but we're also using NCS v2.9.1, nRF52840DK, and nRF7002EK.

--------------------------------------------------------

Note that I also tried one of Nordic's official networking samples - the HTTPS client one (NCS v2.9.1, nrf/samples/net/https_client). But I'm not able to proceed that far because it's unable to connect to my local WiFi.

  • Made a build config
    • Board target: nrf52840dk/nrf52840
    • Config files: prj.conf, boards/native_sim.conf
      • Had to add native_sim.conf or else I would've had build errors
    • CMake arguments: -DSHIELD="nrf7002ek"
  • Tried the build from above, but command line wouldn't respond to any user input, so I couldn't type in the "wifi_cred" command
  • Then tried modifying prj.conf to have the static WiFi credentials, but it still wouldn't proceed past "Connecting to the network"
  • I tested your code, along with the websocketd server, and was able to reproduce a similar log. However, the original problem remains: the operation as a whole takes too long. We have just narrowed the source of the delay down to the nRF waiting for DHCP (which your latest log also shows).

    Why is this happening? As mentioned before, and as your sales rep is aware, we had previous projects using the nRF7002. We tested these with the same router and same Linux host as this current project, yet they did not experience this issue. These previous projects had the nRF code developed through Linux drivers, not through NCS/Zephyr. So perhaps this is something to do with the Zephyr drivers?

  • Hi,

     

    You're looking at the overall timing process, from boot until socket connected, correct?

    "While it was developed through Linux drivers and such rather than with NCS/VS-Code/etc., we determined it can connect to WiFi, connect to socket, and send a little data all comfortably in <10sec. So I'm expecting similar performance."

    A Linux device has more processing power for the WPA supplicant part than an nRF5 device has, and there will be differences between a Linux-based networking implementation and the Zephyr networking implementation.

     

    The crypto processing on the CPU takes around 3-4 seconds for WPA2, as an example. You can see the difference by setting up test APs with WPA2, WPA3, and "open" security.

    The tests below are from a complete cold boot (i.e. power off, wait, power on), which is also why the logs are missing the first boot-sequence lines (e.g. "booting nRF Connect SDK" etc.).

    WPA2 (nRF52840DK + nRF7002EK):

    [00:00:01.631,896] <inf> sta: Static IP address (overridable): 192.168.1.99/255.255.255.0 -> 192.168.1.1
    [00:00:03.296,081] <inf> wifi_mgmt_ext: Connection requested
    [00:00:03.296,142] <inf> sta: Connection requested
    [00:00:08.471,496] <inf> sta: Connected
    [00:00:08.697,570] <inf> sta: Waiting for DHCP to be bound
    [00:00:13.485,076] <inf> net_dhcpv4: Received: 192.168.32.163
    [00:00:13.485,290] <inf> net_config: IPv4 address: 192.168.32.163
    [00:00:13.485,321] <inf> net_config: Lease time: 36000 seconds
    [00:00:13.485,351] <inf> net_config: Subnet: 255.255.255.0
    [00:00:13.485,412] <inf> net_config: Router: 192.168.32.1
    [00:00:13.485,809] <inf> sta: Try to open as client
    [00:00:14.093,872] <inf> sta: Connected
    [00:00:14.108,062] <inf> wifi_supplicant: Network interface 1 (0x20001140) down
    [00:00:14.108,306] <inf> sta: Interface down

    Open security:

    [00:00:01.630,187] <inf> sta: Static IP address (overridable): 192.168.1.99/255.255.255.0 -> 192.168.1.1
    [00:00:01.642,333] <inf> wifi_mgmt_ext: Connection requested
    [00:00:01.642,364] <inf> sta: Connection requested
    [00:00:06.731,109] <inf> sta: Connected
    [00:00:06.743,713] <inf> sta: Waiting for DHCP to be bound
    [00:00:06.748,962] <inf> net_dhcpv4: Received: 192.168.32.163
    [00:00:06.749,145] <inf> net_config: IPv4 address: 192.168.32.163
    [00:00:06.749,176] <inf> net_config: Lease time: 36000 seconds
    [00:00:06.749,206] <inf> net_config: Subnet: 255.255.255.0
    [00:00:06.749,267] <inf> net_config: Router: 192.168.32.1
    [00:00:06.749,694] <inf> sta: Try to open as client
    [00:00:07.369,934] <inf> sta: Connected
    [00:00:07.383,911] <inf> wifi_supplicant: Network interface 1 (0x20001140) down
    [00:00:07.384,155] <inf> sta: Interface down
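
The duration of each phase in the two logs above can be read off by subtracting timestamps. A minimal C sketch for doing that, assuming the log timestamp layout "[hh:mm:ss.mmm,uuu]" seen in these logs; log_ts_to_us and log_elapsed_us are hypothetical helper names, not part of any SDK:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: convert a log timestamp of the form
 * "[hh:mm:ss.mmm,uuu]" at the start of a line into microseconds since boot.
 * Returns -1 if the line does not start with that layout.
 */
int64_t log_ts_to_us(const char *line)
{
    unsigned int h, m, s, ms, us;

    assert(line != NULL);

    if (sscanf(line, " [%u:%u:%u.%u,%u]", &h, &m, &s, &ms, &us) != 5) {
        return -1;
    }

    return (((int64_t)h * 60 + m) * 60 + s) * 1000000LL
           + (int64_t)ms * 1000 + us;
}

/* Elapsed microseconds between two log lines, or -1 on a parse error. */
int64_t log_elapsed_us(const char *from, const char *to)
{
    int64_t a = log_ts_to_us(from);
    int64_t b = log_ts_to_us(to);

    return (a < 0 || b < 0) ? -1 : b - a;
}
```

Applied to the logs above, "sta: Connection requested" to "sta: Connected" comes to about 5.18 s with WPA2 and about 5.09 s with open security, while "Waiting for DHCP to be bound" to "net_dhcpv4: Received" comes to about 4.79 s with WPA2 and about 5 ms with open security.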
    

     

    Looking into the implementation, there will be timing variation in both of these test cases, as DHCP is UDP-based and the Zephyr DHCP client randomizes when it sends the request on first boot.

    There is a random delay of between 1 and 10 seconds in the initial DHCP client sequence, as per RFC 2131, section 4.4.1, which you can limit with CONFIG_NET_DHCPV4_INITIAL_DELAY_MAX=2 (a value of '2' is the minimum at this time).
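
To illustrate how the upper bound affects the wait, here is a rough C sketch of a randomized start-up delay. It is not the Zephyr implementation: dhcp_initial_delay_s and INITIAL_DELAY_MIN_S are hypothetical names, rand() stands in for whatever random source the stack uses, and drawing whole seconds is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Lower bound of the start-up delay described in RFC 2131, section 4.4.1. */
#define INITIAL_DELAY_MIN_S 1

/*
 * Hypothetical illustration, not the Zephyr code: pick a start-up delay in
 * whole seconds from the range [INITIAL_DELAY_MIN_S, max_s].
 */
unsigned int dhcp_initial_delay_s(unsigned int max_s)
{
    assert(max_s >= INITIAL_DELAY_MIN_S);

    unsigned int span = max_s - INITIAL_DELAY_MIN_S + 1;

    /* The modulo keeps the offset inside [0, span - 1]. */
    return INITIAL_DELAY_MIN_S + (unsigned int)rand() % span;
}
```

Under these assumptions, an upper bound of 10 allows anything from 1 to 10 seconds before the first DHCP message goes out, while an upper bound of 2 limits the wait to 1 or 2 seconds.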

     

    Here are the configurations that I added:

    CONFIG_NRF_WIFI_PS_EXIT_EVERY_TIM=y
    CONFIG_NET_SHELL=n
    CONFIG_SHELL=n
    CONFIG_NET_DHCPV4_INITIAL_DELAY_MAX=2

     

    PS: If you want to test with no delay on the initial DHCP request, you can change "true" to "false" in this function call:

    https://github.com/nrfconnect/sdk-zephyr/blob/v3.7.99-ncs3/subsys/net/lib/dhcpv4/dhcpv4.c#L1816

    RFC 2131, section 4.4.1 (https://www.rfc-editor.org/rfc/rfc2131.html#section-4.4) uses the wording "should", indicating that this initial delay is optional. I will take this up internally with our networking team.

      

    Kind regards,

    Håkon
