nrf5340/nrf7002 Connecting to wifi : ends up in blocked state consuming 60mA current?

nrf5340+nrf7002 running Zephyr with NCS v2.6.x. 

When trying to connect to a wifi AP with incorrect credentials, I have several issues:

1/ the result of the (failed) connect request always returns 10s later, no matter the value of 'timeout' in the wifi_connect_req_params structure passed to the net_mgmt request.

- shouldn't it use the timeout value?

2/ the result, returned as event NET_EVENT_WIFI_CONNECT_RESULT in a net_mgmt callback handler, doesn't give me a 'result' I can use?

How do I find out that the connection attempt failed? I tried checking the 'state' of the connection by doing a NET_REQUEST_WIFI_IFACE_STATUS, but it shows the state as being 'WIFI_STATE_ASSOCIATED', which the wifi demo code uses to indicate the wifi is connected!

(I can clearly see that the attempt has stopped after 10s because the power consumption drops back to 5-6mA, instead of around 60mA)

3/ if I do not explicitly disconnect after the failed attempt, the stack continually retries the connection every 30s, and appears to have a memory leak of about 120 bytes at each attempt... leading eventually to a hang because it exhausts the heap...

To avoid this, I added a 'connect timer' in my code, which does an explicit NET_REQUEST_WIFI_DISCONNECT after X seconds. This stops the automatic retry, and lets my code retry the connect at its own pace....
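A minimal sketch of the connect-timer workaround described above, written in plain C so it can be tried off-target. All names (conn_supervisor, supervisor_tick, supervisor_on_result, the ACT_* values) are hypothetical and not part of Zephyr or NCS; the caller is assumed to translate the returned action into the real NET_REQUEST_WIFI_CONNECT / NET_REQUEST_WIFI_DISCONNECT net_mgmt requests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-side states; not part of the Zephyr Wi-Fi API. */
enum conn_state { CONN_IDLE, CONN_CONNECTING, CONN_BACKOFF, CONN_CONNECTED };

/* What the caller should ask the Wi-Fi stack to do next. */
enum conn_action { ACT_NONE, ACT_REQUEST_CONNECT, ACT_REQUEST_DISCONNECT };

struct conn_supervisor {
    enum conn_state state;
    uint32_t deadline_s;      /* uptime at which the current state expires */
    uint32_t connect_limit_s; /* the "X seconds" allowed per attempt */
    uint32_t retry_period_s;  /* pause between attempts, e.g. 180 s */
};

static void supervisor_init(struct conn_supervisor *s,
                            uint32_t connect_limit_s, uint32_t retry_period_s)
{
    s->state = CONN_IDLE;
    s->deadline_s = 0;
    s->connect_limit_s = connect_limit_s;
    s->retry_period_s = retry_period_s;
}

/* Call periodically with the current uptime in seconds. */
static enum conn_action supervisor_tick(struct conn_supervisor *s, uint32_t now_s)
{
    switch (s->state) {
    case CONN_IDLE:
        s->state = CONN_CONNECTING;
        s->deadline_s = now_s + s->connect_limit_s;
        return ACT_REQUEST_CONNECT;
    case CONN_CONNECTING:
        if (now_s >= s->deadline_s) {
            /* Attempt took too long: disconnect so the stack stops retrying. */
            s->state = CONN_BACKOFF;
            s->deadline_s = now_s + s->retry_period_s;
            return ACT_REQUEST_DISCONNECT;
        }
        return ACT_NONE;
    case CONN_BACKOFF:
        if (now_s >= s->deadline_s) {
            s->state = CONN_CONNECTING;
            s->deadline_s = now_s + s->connect_limit_s;
            return ACT_REQUEST_CONNECT;
        }
        return ACT_NONE;
    case CONN_CONNECTED:
    default:
        return ACT_NONE;
    }
}

/* Call from the connect-result event handler. */
static enum conn_action supervisor_on_result(struct conn_supervisor *s,
                                             bool success, uint32_t now_s)
{
    if (s->state != CONN_CONNECTING) {
        return ACT_NONE; /* stale result for an attempt already abandoned */
    }
    if (success) {
        s->state = CONN_CONNECTED;
        return ACT_NONE;
    }
    s->state = CONN_BACKOFF;
    s->deadline_s = now_s + s->retry_period_s;
    return ACT_REQUEST_DISCONNECT;
}
```

Keeping the pacing in one small state machine means only one connect or disconnect request is outstanding at a time, and a late result for an abandoned attempt is ignored rather than acted on.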

However:

4/ After a few attempts (<10, one every 3 minutes), the wifi stack seems to get in a bad state, and the power consumption sticks at 60mA average! (as though it was still trying to connect to the AP)

I get this log:

[00:57:24.123,870] <err> wifi_nrf: nrf_wifi_wpa_supp_scan_abort: Timedout waiting for scan abort response, ret = -11

Requesting a disconnect operation is accepted by the wifi stack but has no apparent effect on the power consumption. And it doesn't then accept a connect attempt with valid credentials after getting in this state. Could this be because I force a disconnect when it's still trying to connect?

Any pointers on how to get this to be more robust?

thanks

  • Hi,

    BrianW said:
    The heap size is fine, the problem is that each re-try that fails loses about 120-140 bytes! So it doesn't matter how big the heap is to start with, eventually it ends up in the bad state. What I need is a fix for the memory leak! (presumably in the WPA supplicant code...)

    My deepest apologies for this issue. We have a fix in place for v2.7-branch and v2.8, but the backport of it to v2.6-branch is yet to be merged.

    BrianW said:
    I'm not sure I signed up as a beta tester for your wifi products.... I'll take a look...

    This was not our intention, and I am sorry for the inconvenience this has caused.

    BrianW said:

    This is not a sample, this is my real application :-) Looking at the sample wifi_sta, the struct wifi_connect_req_params timeout field is set to 

    CONFIG_STA_CONN_TIMEOUT_SEC * MSEC_PER_SEC;
    which implies it should be the timeout for the connection attempt? But the stack always seems to take 10s...
    However, I see in this sample that the 'info' field of struct net_mgmt_event_callback *cb should contain the connection attempt status (although there is both 'status' and 'conn_status', which seems redundant? The sample just treats 'status != 0' as 'failure' and doesn't look at conn_status.)
    The possible values for 'conn_status' seem more useful: I will try using these to detect the failed connection attempt!
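A hedged sketch of decoding the connect result, written off-target. The demo_* types below are stand-ins modelled on struct wifi_status and enum wifi_conn_status from <zephyr/net/wifi_mgmt.h>; in the Zephyr versions I am aware of, 'status' and 'conn_status' are members of the same union, which would explain the apparent redundancy, but the layout and enum values should be confirmed in the header shipped with your NCS version. The helper names conn_failed and conn_result_text are hypothetical.

```c
#include <stddef.h>

/* Stand-in for enum wifi_conn_status (assumed ordering; verify in your SDK). */
enum demo_conn_status {
    DEMO_CONN_SUCCESS = 0,
    DEMO_CONN_FAIL,
    DEMO_CONN_WRONG_PASSWORD,
    DEMO_CONN_TIMEOUT,
    DEMO_CONN_AP_NOT_FOUND,
};

/* Stand-in for struct wifi_status: both fields overlay the same storage. */
struct demo_wifi_status {
    union {
        int status;
        enum demo_conn_status conn_status;
    };
};

/* Same test the sample applies: any non-zero status is a failure.
 * A missing payload is treated as a failure as well. */
static int conn_failed(const struct demo_wifi_status *st)
{
    return st == NULL || st->status != 0;
}

/* Map the detailed code to text; unknown or negative values fall
 * through to the generic "failed". */
static const char *conn_result_text(const struct demo_wifi_status *st)
{
    if (st == NULL) {
        return "no status payload";
    }
    switch (st->conn_status) {
    case DEMO_CONN_SUCCESS:
        return "connected";
    case DEMO_CONN_WRONG_PASSWORD:
        return "wrong password";
    case DEMO_CONN_TIMEOUT:
        return "timed out";
    case DEMO_CONN_AP_NOT_FOUND:
        return "AP not found";
    case DEMO_CONN_FAIL:
    default:
        return "failed";
    }
}
```

In a real NET_EVENT_WIFI_CONNECT_RESULT handler the pointer would come from cb->info, cast to const struct wifi_status *, as the wifi/sta sample does.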

    Sorry, should have been a bit clearer in my question, and given a bit more background to it. The reason why I am asking is that there are subsystems that override the values, such as setting CONFIG_WIFI_CREDENTIALS_STATIC_SSID="MySSID" and CONFIG_WIFI_CREDENTIALS_STATIC_PASSWORD="MyPassword".

    My question is how do you set up your credentials? Is it using secure storage, directly via the net_mgmt APIs, or statically in Kconfig?

    As an example, if you use the NET_REQUEST_WIFI_CONNECT_STORED type, as wifi/sta does:

    https://github.com/nrfconnect/sdk-nrf/blob/v2.6.2/samples/wifi/sta/src/main.c#L225

    You will load the default values given here:

    https://github.com/nrfconnect/sdk-nrf/blob/v2.6.2/subsys/net/lib/wifi_mgmt_ext/wifi_mgmt_ext.c#L74

    This .timeout value is then given to the WPA supplicant (hostap):

    https://github.com/nrfconnect/sdk-nrf/blob/v2.6.2/modules/hostap/src/supp_api.c#L542

    and checked against a timer here:

    https://github.com/nrfconnect/sdk-nrf/blob/v2.6.2/modules/hostap/src/supp_api.c#L125-L135

    BrianW said:

    I'll take a look at the info you reference; although this seems very complex to just get robust long-term wifi operation on the device... Why doesn't the stack deal with this internally or when I disconnect it?!

    This is in general the problem with the samples, they don't help with creating a 'real' application which has to deal with long running operation with connect/disconnect/errors that occur in real-life...

    If a scenario occurs where the Wi-Fi interface for some reason ends up in a locked-up state, the process of taking down the interface and bringing it up again is handled in the background.

    However, it will require the application to manually re-connect, which is why this is reported back as a "wi-fi ready" event. It signals to the application that the wi-fi IF has been taken down and re-initialized, so the application needs to re-connect over wi-fi and set up any socket-based communication as required.
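As a rough illustration of the application-side bookkeeping this implies, here is a plain-C sketch. The thread does not name the callback that delivers the "wi-fi ready" indication, so everything below (struct app_net and the app_* functions) is hypothetical glue that the real callback and main loop would call into.

```c
#include <stdbool.h>

/* Hypothetical application context; none of these names come from the SDK. */
struct app_net {
    bool wifi_ready;       /* interface is up and usable */
    bool need_connect;     /* application must issue a new connect request */
    bool sockets_valid;    /* sockets opened on the current link are usable */
    unsigned reinit_count; /* how many times the interface became ready */
};

/* Call from whatever callback reports the "wi-fi ready" state. */
static void app_on_wifi_ready(struct app_net *n, bool ready)
{
    if (ready && !n->wifi_ready) {
        /* Interface was (re)initialized: old sockets are gone and the
         * application has to connect again. */
        n->reinit_count++;
        n->sockets_valid = false;
        n->need_connect = true;
    } else if (!ready) {
        n->sockets_valid = false;
        n->need_connect = false;
    }
    n->wifi_ready = ready;
}

/* Main loop: returns true exactly once per pending reconnect. */
static bool app_take_connect_request(struct app_net *n)
{
    if (n->wifi_ready && n->need_connect) {
        n->need_connect = false;
        return true;
    }
    return false;
}

/* Call after the application has reopened its sockets on the new link. */
static void app_on_sockets_opened(struct app_net *n)
{
    n->sockets_valid = n->wifi_ready;
}
```

The point of the separate sockets_valid flag is that a background re-initialization invalidates socket state even if the application never saw an explicit disconnect.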

    Kind regards,

    Håkon

  • Question : would migrating my project from 2.6.x to 2.8 latest help the wifi stability?

    Any advice on such a migration?

  • Hi,

    v2.8.0 has the fix that was mentioned here:

    Håkon Alseth said:

    My apologies for this inconvenience. We have an open PR to fix this in the v2.6.x-branch:

    https://github.com/nrfconnect/sdk-nrf/pull/18790

    So in that aspect, yes; you should see stability improvement when running v2.8.0 as compared to v2.6.2.

    BrianW said:
    Any advice on such a migration?

    We have a migration guide in our documentation on how to go from multi-build to sysbuild:

    https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/releases_and_maturity/migration/migration_sysbuild.html

    Let me know if you run into any issues, and I will try to help out.

    Kind regards,

    Håkon

  • Håkon Alseth said:
    Let me know if you run into any issues, and I will try to help out.

    So many issues.... I will open a new ticket about the migration as it has completely broken my project...
