nRF5340/nRF7002 Wi-Fi stack: how to recover when connecting to the AP fails?

I am using an nRF5340 with an nRF7002 on a custom board and trying to get stable Wi-Fi operation, including in situations where AP availability varies, as this is a mobile device.

Having experienced problems with the wifi driver ending up 'stuck', I updated from NCS 2.6 to 2.8 (what a pain that was). It is more stable now, but it still sometimes fails to find my local AP (even without movement!). I get logs like this (even though I do not use the wifi credentials system):
[00:03:23.501,251] <err> wpa_supp: Line 0: invalid key_mgmt 'SAE'
and more relevantly this:
[00:00:29.706,390] <err> wifi_nrf: nrf_wifi_wpa_supp_scan_abort: Timedout waiting for scan abort response, ret = -11
and
[00:01:17.710,052] <err> wpa_supp: wpa_drv_zep_get_scan_results2: Timed out waiting for scan results

Sometimes it still ends up connecting despite the wpa_supp errors, but the wifi_nrf scan abort timeout seems to be the bad one.

After requesting a connect I set a timeout (12 s). When it expires, I request a disconnect:
int status = net_mgmt(NET_REQUEST_WIFI_DISCONNECT, ctx->iface, NULL, 0);

After 4 attempts that end like this, I attempt to recover by setting the interface down then up again.

// Reset wifi by putting interface down then up
static bool _wifi_reset(struct _netwifi_ctx* ctx) {
  // take the interface down first; it is brought back up below
  int ret = 0;
  ret = net_if_down(ctx->iface);
  if (ret==0 || ret==-EALREADY) {
    log_info("netwifi:iface is down!");
    // Wait a little bit
    k_msleep(100);
    ret = net_if_up(ctx->iface);
    if (ret==0 || ret==-EALREADY) {
      log_info("netwifi: iface is up!");
      return true;
    }
    log_warn("netwifi: iface failed to become up (%d)",ret);
  } else {
    log_warn("netwifi: iface failed to become down (%d)",ret);
  }
  return false;
}
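
For reference, the retry bookkeeping described above (a connect timeout leads to a disconnect, and the 4th consecutive timeout leads to an interface reset) can be sketched as a small state machine in plain C. This is only an illustration under assumptions: the names `retry_state`, `retry_on_timeout` and `retry_on_connected` are hypothetical and not taken from the application or from NCS, and the sketch makes no Zephyr calls. The caller would map `RETRY_DISCONNECT` to the `NET_REQUEST_WIFI_DISCONNECT` request and `RETRY_RESET_IFACE` to `_wifi_reset()`.

```c
#include <assert.h>
#include <stddef.h>

/* Number of consecutive connect timeouts before a reset (4 in this post). */
#define MAX_CONNECT_TIMEOUTS 4

/* What the caller should do after a connect timeout (hypothetical names). */
enum retry_action {
    RETRY_DISCONNECT,   /* request a disconnect, then try to connect again */
    RETRY_RESET_IFACE   /* too many timeouts: take the interface down/up */
};

struct retry_state {
    int timeouts;       /* consecutive connect timeouts seen so far */
};

/* Call when the connect timer expires without a connection. */
static enum retry_action retry_on_timeout(struct retry_state *s)
{
    assert(s != NULL);
    s->timeouts++;
    if (s->timeouts >= MAX_CONNECT_TIMEOUTS) {
        s->timeouts = 0;            /* start counting afresh after a reset */
        return RETRY_RESET_IFACE;
    }
    return RETRY_DISCONNECT;
}

/* Call when a connection succeeds, so old failures do not accumulate. */
static void retry_on_connected(struct retry_state *s)
{
    assert(s != NULL);
    s->timeouts = 0;
}
```

Keeping the counter separate from the network calls makes it easy to check that a successful connection clears the failure count, so that a reset is only triggered by consecutive timeouts.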

The reset (interface down, then up) systematically results in a bus fault:

[00:09:23.517,608] <wrn> app: netwifi: connect check timer pops, connect() retry ongoing...
[00:09:35.517,639] <wrn> app: netwifi: connect timeout, too many (4), trying wifi reset
[00:09:46.209,045] <err> wifi_nrf: nrf_wifi_fmac_chg_vif_state: RPU is unresponsive for 10 sec
[00:09:46.218,444] <err> wifi_nrf: nrf_wifi_if_stop_zep: nrf_wifi_fmac_chg_vif_state failed
[00:09:46.229,095] <inf> app: netwifi:iface is down!
[00:09:46.337,493] <inf> wifi_nrf_bus: SPIM spi@a000: freq = 24 MHz
[00:09:46.344,268] <inf> wifi_nrf_bus: SPIM spi@a000: latency = 1
[00:09:46.529,388] <err> wpa_supp: zephyr_get_handle_by_ifname: Unable to get wpa_s handle for wlan0
[00:09:46.539,520] <err> wpa_supp: Interface wlan0 not found
[00:09:46.544,555] <inf> app: netwifi: iface is up!
[00:09:58.530,639] <err> wpa_supp: wpa_drv_zep_scan_timeout: Scan timeout - try to abort it
[00:09:58.539,733] <err> os: ***** BUS FAULT *****
[00:09:58.545,257] <err> os: Precise data bus error
[00:09:58.551,055] <err> os: BFAR Address: 0x11f3ef53
[00:09:58.557,037] <err> os: r0/a1: 0x20059a80 r1/a2: 0x00000000 r2/a3: 0x00000000
[00:09:58.565,826] <err> os: r3/a4: 0x11f3ef47 r12/ip: 0x00000000 r14/lr: 0x00037af9
[00:09:58.574,615] <err> os: xpsr: 0x61000000
[00:09:58.579,895] <err> os: Faulting instruction address (r15/pc): 0x00038588
[00:09:58.587,890] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:09:58.595,886] <err> os: Current thread: 0x200061a8 (unknown)
[00:09:58.602,722] <err> os: Halting system

The debugger says this code in NCS modules/lib/hostap/src/drivers/driver_zephyr.c is at fault, when it tries to call dev_ops->scan_abort:

static int wpa_drv_zep_abort_scan(void *priv,
   u64 scan_cookie)
{
  struct zep_drv_if_ctx *if_ctx = NULL;
  const struct zep_wpa_supp_dev_ops *dev_ops;
  int ret = -1;

  if_ctx = priv;

  dev_ops = get_dev_ops(if_ctx->dev_ctx);
  if (!dev_ops->scan_abort) {
    wpa_printf(MSG_ERROR,
      "%s: No op registered for scan_abort",
      __func__);
    goto out;
  }

  ret = dev_ops->scan_abort(if_ctx->dev_priv);
out:
  return ret;
}

dev_ops points to a structure where all the pointers are NULL, but I think even that pointer is bad (0x11f3ef47 is neither flash nor RAM?)....
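
As a sanity check on the addresses in the fault dump, here is a minimal sketch that classifies an address against the nRF5340 application core memory map. The ranges are assumptions to be verified against the product specification: 1 MB of internal flash at 0x00000000, 512 KB of SRAM at 0x20000000, and the QSPI XIP window at 0x10000000 to 0x1FFFFFFF. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mem_region { REGION_FLASH, REGION_XIP, REGION_SRAM, REGION_OTHER };

/* Classify an address; all ranges are assumed values (see the text above). */
static enum mem_region classify_addr(uint32_t addr)
{
    if (addr < 0x00100000u) {
        return REGION_FLASH;    /* internal flash, 1 MB (assumed) */
    }
    if (addr >= 0x10000000u && addr < 0x20000000u) {
        return REGION_XIP;      /* QSPI XIP window (assumed) */
    }
    if (addr >= 0x20000000u && addr < 0x20080000u) {
        return REGION_SRAM;     /* SRAM, 512 KB (assumed) */
    }
    return REGION_OTHER;
}

/* A pointer to a struct of function pointers should be 4-byte aligned. */
static bool is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0u;
}
```

Under these assumptions, r3 (0x11f3ef47) is neither internal flash nor SRAM and is not word aligned, while r0 (0x20059a80) is in SRAM and the faulting pc (0x00038588) is in flash. BFAR (0x11f3ef53) is exactly r3 + 12, which is consistent with a load of a member 12 bytes into whatever r3 points at; whether that member is scan_abort is not confirmed by the dump alone.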

Presumably wpa_supp is trying to abort the scan from the previous connection attempt, but hasn't dealt with the if-down/if-up restart correctly, so is holding on to a device context that is no longer valid...

I note the 'RPU is unresponsive' log... I have CONFIG_NRF_WIFI_RPU_RECOVERY=y as per a prior ticket about the wifi instability on NCS 2.6.x, which is why I updated to NCS 2.8.0....

Q: what could be causing the scan results errors that are causing it to be stuck?

Q: How to correctly stop/restart the wifi interface to recover from it being 'stuck'?

  • Hi Brian,

    BrianW said:
    What is the correct way in 2.9 to set the log level for these modules? (necessary to get the image to fit in the flash)

    You can check configuration dependencies and conflicts on the Kconfig search page.

    I think the better approach is to use the nRF Kconfig GUI for configuration, as it helps avoid conflicts and naturally follows dependencies.

    After finishing the setup, you can save the modified debug configuration as an overlay file and reuse it for future builds.

    You can check the final configuration used during the build in the build folder .config file.

    BrianW said:
    [By the way, I built wifi_sta for NCS 2.8 (for another ticket): this most basic wifi sample, which literally just connects to a wifi access point and nothing else, uses 550 kB out of the 1 MB available. This is not great...]

    I assume you expected a small memory footprint from such a "simple" sample, but that is not the case. STA mode is not that simple: running the full Wi-Fi stack and driver to support it requires around 500 KB. You can find more details on the pages linked below.

    Considering the application protocol running on top of IP, you may need an additional 100-200 KB for "simple" samples such as basic UDP, TCP, HTTPS, MQTT, or CoAP transmission. Scan mode consumes much less memory, around 150 KB.

    Unfortunately, this is just how the Wi-Fi stack works. Keep in mind that Wi-Fi typically runs on PCs or mobile devices with significantly more memory.

    Memory requirements for Wi-Fi applications in Station mode

    Memory requirements for Wi-Fi applications in Scan mode

    Best regards,

    Charlie

  • For the Kconfig errors, I have managed to fix those: 2.9 does not like it if you disable all net logs (CONFIG_NET_LOG=n) while also having log level settings. I have to disable all the logs anyway, as otherwise the image is too big with 2.9 (increase of around 5 kB just due to this).

    The fatal build error was due to downloading NCS 2.9 from GitHub directly as a zip, as the build now depends on finding the .git index file. Fixed by reinstalling 2.9 using west.

    I will have to move the wifi firmware to a separate XIP flash partition, as otherwise the 2.9 build is too large for mcuboot to accept for DFU. I will open a separate ticket for this.

  • Hi Brian,

    Good to hear that you’ve identified the cause of the issue and made progress.

    I also noticed that you opened and answered the question about Wi-Fi firmware in external flash yourself. This feedback will be shared with our documentation team for further improvement.

    Best regards,
    Charlie
