NRF7002 using one virtual interface breaks a different one

I am trying to use the second virtual interface of the NRF7002 with a custom Linux driver, in order to add support for STA+AP or STA+STA mode. However, I am observing strange behaviour even when I use only one virtual interface at a time.

For example, I start STA mode on the default interface and then connect to a WPA2-protected AP. This works fine. A tshark capture shows that the packets are sent correctly:

    1 0.000000000 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 113 Key (Message 1 of 4)
    2 0.208624917 NordicSemico_00:48:25 → <AP MAC ADDR> EAPOL 135 Key (Message 2 of 4)
    3 0.223133843 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 169 Key (Message 3 of 4)
    4 0.421459593 NordicSemico_00:48:25 → <AP MAC ADDR> EAPOL 113 Key (Message 4 of 4)
    5 0.473706855 LannerElectr_c2:9f:58 → Broadcast    ARP 60 Who has 10.10.17.19? Tell 10.10.17.254
    6 0.474066484 LannerElectr_c2:9f:58 → Broadcast    ARP 60 Who has 10.10.16.113? Tell 10.10.17.254

Then I disconnect from the AP and try the same thing on the second interface. This time I am not able to connect to the AP.
The first attempt gets through 3 of the 4 messages of the 4-way handshake, but subsequent attempts never get past message 1.

Capture of the first attempt:

    1 0.000000000 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 113 Key (Message 1 of 4)
    2 0.205216091 NordicSemico_00:48:25 → <AP MAC ADDR> EAPOL 135 Key (Message 2 of 4)
    3 0.266586286 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 169 Key (Message 3 of 4)
    4 0.493344800 HuaweiDevice_89:c7:64 → Broadcast ARP 56 Who has 10.10.16.236? Tell 10.10.16.17
    5 0.493580469 HuaweiDevice_89:c7:64 → Broadcast ARP 56 Who has 10.10.17.163? Tell 10.10.16.17

The capture of the second attempt consists only of retries of message 1 of the 4-way handshake:

    1 0.000000000 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 113 Key (Message 1 of 4)
    2 1.022842327 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 113 Key (Message 1 of 4)
    3 2.032899320 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 113 Key (Message 1 of 4)
    4 3.046177594 <AP MAC ADDR> → NordicSemico_00:48:25 EAPOL 113 Key (Message 1 of 4)

If I do the same but start with the second interface instead of the default one (#0), the roles are reversed: the second interface is able to connect, but the first one is not. So I assume that I have created the second interface correctly.

I am looking for ways to fix or debug this.

Additional info

My custom driver uses an upgraded nrfxlib commit. However, the issue was also observed with the default example Linux driver.

  • Apparently, the original example driver has a flaw:

    netdev_tx_t nrf_wifi_netdev_start_xmit(struct sk_buff *skb,
                           struct net_device *netdev)
    {
        // Removed for brevity...
        fmac_dev_ctx = rpu_ctx_lnx->rpu_ctx;
        def_dev_ctx = wifi_dev_priv(fmac_dev_ctx);
        host_stats = &def_dev_ctx->host_stats;
    
        // Removed for brevity...
    
        if ((vif_ctx_lnx->num_tx_pkt - host_stats->total_tx_pkts) >=
            CONFIG_NRF700X_MAX_TX_PENDING_QLEN) {
            if (!netif_queue_stopped(netdev)) {
                netif_stop_queue(netdev);
            }
            schedule_work(&vif_ctx_lnx->ws_queue_monitor);
        }
    
        // Removed for brevity...
    
        vif_ctx_lnx->num_tx_pkt++;
        schedule_work(&vif_ctx_lnx->ws_data_tx);
    
    out:
        return ret;
    }


    With the second interface, it seems that packets are split into parts, so total_tx_pkts becomes larger than num_tx_pkt, the count kept by the Linux driver. Because both counters are unsigned, the subtraction underflows and wraps around to a huge value, so the check is always true and the queue stays stopped forever.

    As a workaround, I will use this difference instead: `host_stats->total_tx_pkts - host_stats->total_tx_done_pkts`
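
To make the underflow concrete, here is a minimal standalone sketch of the two checks. It is not the driver code: the function names and the limit `MAX_TX_PENDING_QLEN` (a stand-in for `CONFIG_NRF700X_MAX_TX_PENDING_QLEN`, with an arbitrary value) are hypothetical, and the counters are assumed to be plain `unsigned int`. The first check mixes a per-interface counter with a device-wide one, as in the excerpt above; the second uses two counters taken from the same `host_stats`, as in the workaround.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_NRF700X_MAX_TX_PENDING_QLEN.
 * The real value comes from the driver's build configuration. */
#define MAX_TX_PENDING_QLEN 18u

/* Check as written in the excerpt: a per-interface submit counter minus
 * a device-wide counter. If the device-wide counter is ahead, the
 * unsigned subtraction wraps around to a very large value. */
static bool queue_full_mixed(unsigned int vif_num_tx_pkt,
                             unsigned int dev_total_tx_pkts)
{
    return (vif_num_tx_pkt - dev_total_tx_pkts) >= MAX_TX_PENDING_QLEN;
}

/* Workaround check: both counters come from the same statistics block.
 * Assumption: total_tx_done_pkts never runs ahead of total_tx_pkts, so
 * the difference is the number of packets still pending. */
static bool queue_full_pending(unsigned int total_tx_pkts,
                               unsigned int total_tx_done_pkts)
{
    return (total_tx_pkts - total_tx_done_pkts) >= MAX_TX_PENDING_QLEN;
}
```

With these definitions, `queue_full_mixed(3, 5)` reports a full queue although nothing is pending on that interface, because `3u - 5u` wraps around. `queue_full_pending(5, 3)` sees two pending packets and leaves the queue running. Whether the second form is safe in the real driver depends on how `total_tx_done_pkts` is updated, which I have not verified.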
