nRF5340/nRF7002 connecting to wifi: ends up in a blocked state consuming 60mA?

nrf5340+nrf7002 running Zephyr with NCS v2.6.x. 

When trying to connect to a wifi AP with incorrect credentials, I have several issues:

1/ the result of the (failed) connect request always returns 10s later, no matter the value of 'timeout' in the wifi_connect_req_params structure passed to the net_mgmt request.

- shouldn't it use the timeout value?

2/ the result, returned as event NET_EVENT_WIFI_CONNECT_RESULT in a net_mgmt callback handler, doesn't give me a 'result' I can use?

How do I find out that the connection attempt failed? I tried checking the 'state' of the connection by doing a NET_REQUEST_WIFI_IFACE_STATUS, but it shows the state as being 'WIFI_STATE_ASSOCIATED', which the wifi demo code uses to indicate the wifi is connected!

(I can clearly see that the attempt has stopped after 10s because the power consumption drops back to 5-6mA, instead of around 60mA)

3/ if I do not explicitly disconnect after the failed attempt, the stack continually retries the connection every 30s, and appears to have a memory leak of about 120 bytes at each attempt... leading eventually to a hang because it exhausts the heap...

To avoid this, I added a 'connect timer' in my code, which does an explicit NET_REQUEST_WIFI_DISCONNECT after X seconds. This stops the automatic retry, and lets my code retry the connect at its own pace....

However:

4/ After a few attempts (<10, one every 3 minutes), the wifi stack seems to get in a bad state, and the power consumption sticks at 60mA average! (as though it was still trying to connect to the AP)

I get this log:

[00:57:24.123,870] <err> wifi_nrf: nrf_wifi_wpa_supp_scan_abort: Timedout waiting for scan abort response, ret = -11

Requesting a disconnect operation is accepted by the wifi stack but has no apparent effect on the power consumption. And it doesn't then accept a connect attempt with valid credentials after getting in this state. Could this be because I force a disconnect when it's still trying to connect?

Any pointers on how to get this to be more robust?

thanks

  • When it runs out of memory, the logs:

    [14:23:55.279,052] <err> wifi_nrf: umac_cmd_alloc: Failed to allocate UMAC cmd
    [14:23:55.286,895] <err> wifi_nrf: umac_cmd_cfg: umac_cmd_alloc failed
    [14:23:55.294,067] <err> wifi_nrf: nrf_wifi_wpa_set_supp_port: nrf_wifi_fmac_chg_sta failed
    [14:23:56.303,741] <err> wifi_nrf: nrf_wifi_fmac_scan: Unable to allocate memory
    [14:23:56.311,737] <err> wifi_nrf: nrf_wifi_wpa_supp_scan2: Scan trigger failed

    And the memory:

    [14:24:03.500,335] <inf> app: System Stats: Heap now : free 1580, used 67112, max used 68040.

    Just after boot, free is around 13000, and is stable when the wifi is connected.
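
    As a rough cross-check of these figures, the sketch below estimates how many failed attempts fit into the free heap, assuming a constant leak per attempt. The helper names are hypothetical (not part of any SDK), and the inputs are the approximate numbers quoted in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: number of failed connect attempts before the free
 * heap is used up, assuming every attempt leaks the same number of bytes. */
static uint32_t attempts_until_exhaustion(uint32_t free_bytes,
                                          uint32_t leak_per_attempt)
{
    if (leak_per_attempt == 0) {
        return UINT32_MAX; /* no leak: the heap is never exhausted */
    }
    return free_bytes / leak_per_attempt;
}

/* Hypothetical helper: seconds until exhaustion at a fixed retry interval. */
static uint64_t seconds_until_exhaustion(uint32_t free_bytes,
                                         uint32_t leak_per_attempt,
                                         uint32_t retry_interval_s)
{
    return (uint64_t)attempts_until_exhaustion(free_bytes, leak_per_attempt) *
           retry_interval_s;
}
```

    With about 13000 bytes free after boot and 120 bytes lost per attempt, this gives 108 attempts, or about 54 minutes at one retry every 30s. Allocations can start failing earlier than that because of fragmentation, so treat it as an upper bound.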

  • Hi,

     

    Your log indicates that the device is going into a bad state, where it cannot send umac commands to the nRF7002.

    BrianW said:

    [14:23:55.279,052] <err> wifi_nrf: umac_cmd_alloc: Failed to allocate UMAC cmd
    [14:23:55.286,895] <err> wifi_nrf: umac_cmd_cfg: umac_cmd_alloc failed
    [14:23:55.294,067] <err> wifi_nrf: nrf_wifi_wpa_set_supp_port: nrf_wifi_fmac_chg_sta failed
    [14:23:56.303,741] <err> wifi_nrf: nrf_wifi_fmac_scan: Unable to allocate memory
    [14:23:56.311,737] <err> wifi_nrf: nrf_wifi_wpa_supp_scan2: Scan trigger failed

    And the memory:

    [14:24:03.500,335] <inf> app: System Stats: Heap now : free 1580, used 67112, max used 68040.

    Just after boot, free is around 13000, and is stable when the wifi is connected.

    This is a typical issue if the heap is not large enough. Can you try to increase the heap via CONFIG_HEAP_MEM_POOL_SIZE to see if this helps?

    When trying to connect to a wifi AP with incorrect credentials, I have several issues:

    1/ the result of the (failed) connect request always returns 10s later, no matter the value of 'timeout' in the wifi_connect_req_params structure passed to the net_mgmt request.

    - shouldn't it use the timeout value?

    Are you using the static configuration? If yes, then it is configured via CONFIG_WIFI_MGMT_EXT_CONNECTION_TIMEOUT.

    2/ the result, returned as event NET_EVENT_WIFI_CONNECT_RESULT in a net_mgmt callback handler, doesn't give me a 'result' I can use?

    How do I find out that the connection attempt failed? I tried checking the 'state' of the connection by doing a NET_REQUEST_WIFI_IFACE_STATUS, but it shows the state as being 'WIFI_STATE_ASSOCIATED', which the wifi demo code uses to indicate the wifi is connected!

    (I can clearly see that the attempt has stopped after 10s because the power consumption drops back to 5-6mA, instead of around 60mA)

    In this state, you will be in the 4-way handshake state until a timer has elapsed. Based on your description, it does sound like the sample in question does not handle this scenario gracefully. Which example are you using here?

    3/ if I do not explicitly disconnect after the failed attempt, the stack continually retries the connection every 30s, and appears to have a memory leak of about 120 bytes at each attempt... leading eventually to a hang because it exhausts the heap...

    To avoid this, I added a 'connect timer' in my code, which does an explicit NET_REQUEST_WIFI_DISCONNECT after X seconds. This stops the automatic retry, and lets my code retry the connect at its own pace....

    My apologies for this inconvenience. We have an open PR to fix this in the v2.6.x-branch:

    https://github.com/nrfconnect/sdk-nrf/pull/18790

     

    Please note that this is still in review, and is not yet merged, but feel free to test it to see if this helps plug the memleak.

    4/ After a few attempts (<10, one every 3 minutes), the wifi stack seems to get in a bad state, and the power consumption sticks at 60mA average! (as though it was still trying to connect to the AP)

    I get this log:

    [00:57:24.123,870] <err> wifi_nrf: nrf_wifi_wpa_supp_scan_abort: Timedout waiting for scan abort response, ret = -11

    Requesting a disconnect operation is accepted by the wifi stack but has no apparent effect on the power consumption. And it doesn't then accept a connect attempt with valid credentials after getting in this state. Could this be because I force a disconnect when it's still trying to connect?

    Any pointers on how to get this to be more robust?

    You can use the Wifi ready library (which uses the underlying RPU recovery functionality) to detect if such a problem has occurred.

    Here's a guide on how to enable it:

    A reset of the nRF7002 requires the application to reconnect to the network and to handle any previously open connections accordingly.

     

    Kind regards,

    Håkon

  • Thanks for the responses!

    This is a typical issue if the heap is not large enough. Can you try to increase the heap via CONFIG_HEAP_MEM_POOL_SIZE to see if this helps?

    The heap size is fine; the problem is that each retry that fails loses about 120-140 bytes! So it doesn't matter how big the heap is to start with, eventually it ends up in the bad state. What I need is a fix for the memory leak! (presumably in the WPA supplicant code...)

    shouldn't it use the timeout value?

    Are you using the static configuration? If yes, then it is configured via CONFIG_WIFI_MGMT_EXT_CONNECTION_TIMEOUT.

    This is not a sample, this is my real application :-) Looking at the sample wifi_sta, the struct wifi_connect_req_params timeout field is set to

    CONFIG_STA_CONN_TIMEOUT_SEC * MSEC_PER_SEC;
    which implies it should be the timeout for the connection attempt? But the stack always seems to take 10s...
    However, I see in this sample that the struct net_mgmt_event_callback *cb 'info' field should contain the connection attempt status (although there is both 'status' and 'conn_status', which seems redundant? The sample just uses 'status != 0' as being 'failure' and doesn't look at conn_status.)
    The possible values for 'conn_status' seem more useful: I will try using these to detect the failed connection attempt!
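
    A minimal sketch of acting on 'conn_status' follows. The enum and struct are local stand-ins that mirror what I understand Zephyr's wifi_mgmt.h to declare (with 'status' and 'conn_status' sharing storage in a union, which would explain why they look redundant); check them against the header in your SDK version. The app_action type and on_connect_result() are hypothetical application code.

```c
/* Stand-in for Zephyr's enum wifi_conn_status (assumed layout; in a real
 * build include <zephyr/net/wifi_mgmt.h> instead of redefining it). */
enum wifi_conn_status {
    WIFI_STATUS_CONN_SUCCESS = 0,
    WIFI_STATUS_CONN_FAIL,
    WIFI_STATUS_CONN_WRONG_PASSWORD,
    WIFI_STATUS_CONN_TIMEOUT,
    WIFI_STATUS_CONN_AP_NOT_FOUND,
};

/* Stand-in for struct wifi_status: both fields share the same storage. */
struct wifi_status {
    union {
        int status;
        enum wifi_conn_status conn_status;
    };
};

/* Hypothetical application policy, not part of the SDK. */
enum app_action {
    APP_CONNECTED,            /* link is up */
    APP_RETRY_LATER,          /* transient failure: try again later */
    APP_NEED_NEW_CREDENTIALS, /* retrying the same password is pointless */
};

/* Map the connect result to what the application should do next. */
static enum app_action on_connect_result(const struct wifi_status *st)
{
    switch (st->conn_status) {
    case WIFI_STATUS_CONN_SUCCESS:
        return APP_CONNECTED;
    case WIFI_STATUS_CONN_WRONG_PASSWORD:
        return APP_NEED_NEW_CREDENTIALS;
    case WIFI_STATUS_CONN_TIMEOUT:
    case WIFI_STATUS_CONN_AP_NOT_FOUND:
    case WIFI_STATUS_CONN_FAIL:
    default:
        return APP_RETRY_LATER;
    }
}
```

    In the NET_EVENT_WIFI_CONNECT_RESULT handler, the pointer passed to on_connect_result() would be cb->info cast to const struct wifi_status *, as the sample does for its 'status' check.
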
    For the continual retries:

    My apologies for this inconvenience. We have an open PR to fix this in the v2.6.x-branch:

    https://github.com/nrfconnect/sdk-nrf/pull/18790

     

    Please note that this is still in review, and is not yet merged, but feel free to test it to see if this helps plug the memleak.

    I'm not sure I signed up as a beta tester for your wifi products.... I'll take a look...
    As for the badness when I disconnect before it's given up:
    You can use the Wifi ready library (which uses the underlying RPU recovery functionality) to detect if such a problem has occurred.

    I'll take a look at the info you reference; although this seems very complex to just get robust long-term wifi operation on the device... Why doesn't the stack deal with this internally or when I disconnect it?!

    This is in general the problem with the samples: they don't help with creating a 'real' application that has to deal with long-running operation and the connects, disconnects and errors that occur in real life...
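
    The application-side 'connect timer' described earlier in the thread can be sketched as a small state machine that owns the retry pace. Everything here is hypothetical application code (none of the names exist in the SDK), and it only models the timing logic; it does not work around the scan-abort timeout discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical supervisor states and the request the app should issue. */
enum sup_state { SUP_IDLE, SUP_CONNECTING, SUP_CONNECTED, SUP_BACKOFF };
enum sup_action { SUP_NONE, SUP_DO_CONNECT, SUP_DO_DISCONNECT };

struct supervisor {
    enum sup_state state;
    int64_t deadline_ms;        /* when the current state expires */
    int64_t connect_timeout_ms; /* how long to wait for a result */
    int64_t retry_interval_ms;  /* pause between attempts */
};

static void sup_init(struct supervisor *s, int64_t connect_timeout_ms,
                     int64_t retry_interval_ms)
{
    s->state = SUP_IDLE;
    s->deadline_ms = 0;
    s->connect_timeout_ms = connect_timeout_ms;
    s->retry_interval_ms = retry_interval_ms;
}

/* Call periodically with the current uptime in milliseconds. */
static enum sup_action sup_tick(struct supervisor *s, int64_t now_ms)
{
    switch (s->state) {
    case SUP_IDLE:
        s->state = SUP_CONNECTING;
        s->deadline_ms = now_ms + s->connect_timeout_ms;
        return SUP_DO_CONNECT;
    case SUP_CONNECTING:
        if (now_ms >= s->deadline_ms) {
            /* No result in time: disconnect to stop the stack's own retries. */
            s->state = SUP_BACKOFF;
            s->deadline_ms = now_ms + s->retry_interval_ms;
            return SUP_DO_DISCONNECT;
        }
        return SUP_NONE;
    case SUP_BACKOFF:
        if (now_ms >= s->deadline_ms) {
            s->state = SUP_CONNECTING;
            s->deadline_ms = now_ms + s->connect_timeout_ms;
            return SUP_DO_CONNECT;
        }
        return SUP_NONE;
    case SUP_CONNECTED:
    default:
        return SUP_NONE;
    }
}

/* Feed the connect result event into the supervisor. */
static enum sup_action sup_on_result(struct supervisor *s, bool success,
                                     int64_t now_ms)
{
    if (s->state != SUP_CONNECTING) {
        return SUP_NONE; /* stale or unexpected event: ignore it */
    }
    if (success) {
        s->state = SUP_CONNECTED;
        return SUP_NONE;
    }
    s->state = SUP_BACKOFF;
    s->deadline_ms = now_ms + s->retry_interval_ms;
    return SUP_DO_DISCONNECT;
}
```

    In a Zephyr application, SUP_DO_CONNECT and SUP_DO_DISCONNECT would map to the NET_REQUEST_WIFI_CONNECT and NET_REQUEST_WIFI_DISCONNECT requests, with now_ms taken from k_uptime_get(). A retry interval of 180000 ms matches the one attempt every 3 minutes mentioned in the original post.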
