nRF7002dk nrf5340 bus fault after shutting down interface

Hello,

I am working on a low-power sensor logging project using the nRF7002dk. So far, I have modified the main loop in the wifi/sta sample. In place of:

k_sleep(K_FOREVER);

I have added:

status = net_mgmt(NET_REQUEST_WIFI_DISCONNECT, iface, NULL, 0);
status = net_if_down(iface); 
k_sleep(K_SECONDS(2)); // will be longer in production, shortened for testing
status = net_if_up(iface);
k_sleep(K_SECONDS(2)); // allow the interface time to come up

I also have code verifying the statuses are returned as 0, but haven't posted that here to simplify my post.

I find that after a few connections, a kernel panic occurs when bringing the interface back up:

[00:04:56.560,913] <inf> sta: State: SCANNING
[00:04:56.861,053] <inf> sta: ==================
[00:04:56.861,083] <inf> sta: State: SCANNING
[00:04:57.150,177] <err> os: ***** BUS FAULT *****
[00:04:57.150,177] <err> os: Precise data bus error
[00:04:57.150,207] <err> os: BFAR Address: 0x40000b08
[00:04:57.150,207] <err> os: r0/a1: 0x20000200 r1/a2: 0x20000580 r2/a3: 0x20000289
[00:04:57.150,207] <err> os: r3/a4: 0x20000588 r12/ip: 0x2003cca0 r14/lr: 0x200006a8
[00:04:57.150,238] <err> os: xpsr: 0x01000000
[00:04:57.150,238] <err> os: Faulting instruction address (r15/pc): 0x0003e97a
[00:04:57.150,268] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:04:57.150,299] <err> os: Current thread: 0x20003470 (unknown)

The kernel panics stop if I add a 1 second delay between the net_mgmt and net_if_up calls. 

It appears that either I cannot shut down the interface immediately after disconnecting, or something is not cleaned up and causes problems when the interface is brought back up. In production code, I would obviously like to replace the delay with a check that the disconnect has completed, in case it takes longer than 1 second. How can I determine when it is safe to shut down the interface?
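One possible approach, sketched here under assumptions: instead of sleeping for a fixed time, subscribe to the Wi-Fi management event NET_EVENT_WIFI_DISCONNECT_RESULT and call net_if_down() only after that event arrives, with a timeout as a fallback. A return value of 0 from net_mgmt(NET_REQUEST_WIFI_DISCONNECT, ...) most likely means only that the request was accepted, not that the disconnect has finished. The helper names disconnect_wait_init() and wifi_disconnect_and_down() are hypothetical. The handler uses the uint32_t event type from NCS 2.x-era Zephyr; newer Zephyr releases use uint64_t. This is not a confirmed fix, because the driver may still have background work pending after the event is raised.

```c
#include <errno.h>
#include <zephyr/kernel.h>
#include <zephyr/net/net_if.h>
#include <zephyr/net/net_mgmt.h>
#include <zephyr/net/wifi_mgmt.h>

/* Given by the event handler once the disconnect has completed. */
K_SEM_DEFINE(disconnect_done, 0, 1);

static struct net_mgmt_event_callback disconnect_cb;

/* uint32_t event type as in NCS 2.x-era Zephyr; newer Zephyr uses uint64_t. */
static void disconnect_event_handler(struct net_mgmt_event_callback *cb,
				     uint32_t mgmt_event, struct net_if *iface)
{
	ARG_UNUSED(cb);
	ARG_UNUSED(iface);

	if (mgmt_event == NET_EVENT_WIFI_DISCONNECT_RESULT) {
		k_sem_give(&disconnect_done);
	}
}

/* Hypothetical helper: register the callback once, e.g. early in main(). */
void disconnect_wait_init(void)
{
	net_mgmt_init_event_callback(&disconnect_cb, disconnect_event_handler,
				     NET_EVENT_WIFI_DISCONNECT_RESULT);
	net_mgmt_add_event_callback(&disconnect_cb);
}

/* Hypothetical helper: request a disconnect, wait (up to timeout) for the
 * driver to report completion, and only then bring the interface down. */
int wifi_disconnect_and_down(struct net_if *iface, k_timeout_t timeout)
{
	int status;

	/* Discard any result left over from an earlier disconnect. */
	k_sem_reset(&disconnect_done);

	status = net_mgmt(NET_REQUEST_WIFI_DISCONNECT, iface, NULL, 0);
	if (status) {
		return status; /* request rejected; interface left up */
	}

	if (k_sem_take(&disconnect_done, timeout) != 0) {
		return -ETIMEDOUT; /* no result in time; interface left up */
	}

	return net_if_down(iface);
}
```

Usage: call disconnect_wait_init() once at startup, then replace the net_mgmt() and net_if_down() lines in the loop with wifi_disconnect_and_down(iface, K_SECONDS(5)). If no disconnect result arrives in time, the helper returns -ETIMEDOUT and leaves the interface up. This can happen, for example, if the station was not connected. The caller then decides whether to retry or reset.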

  • Hi,

    bcornell said:
    I have already provided my code.

    Yes, you are right. We will look into it.

    bcornell said:
    The only differences between that and what I am running are that my code obviously has an SSID and password, and in http_get.c I've specified the address of an internal test server which posts a PHP endpoint which just returns the posted data.

    Thank you for this information.

    Best regards,
    Dejan

  • Hi,

    Could you try the possible fix below?

    ---
     overlay-debug.conf | 14 ++++++++++++++
     prj.conf           |  4 ++++
     src/main.c         |  7 +++++++
     3 files changed, 25 insertions(+)
     create mode 100644 overlay-debug.conf
    
    diff --git a/overlay-debug.conf b/overlay-debug.conf
    new file mode 100644
    index 0000000..dd83de0
    --- /dev/null
    +++ b/overlay-debug.conf
    @@ -0,0 +1,14 @@
    +CONFIG_SHELL=y
    +CONFIG_SHELL_BACKEND_SERIAL=y
    +CONFIG_SHELL_STACK_SIZE=4096
    +CONFIG_NET_SHELL=y
    +CONFIG_SHELL_GETOPT=y
    +CONFIG_SHELL_CMDS_RESIZE=n
    +CONFIG_NRF700X_UTIL=y
    +CONFIG_NET_L2_WIFI_SHELL=y
    +CONFIG_NET_STATISTICS=y
    +CONFIG_NET_STATISTICS_WIFI=y
    +CONFIG_NET_STATISTICS_USER_API=y
    +CONFIG_SYS_HEAP_RUNTIME_STATS=y
    +# Enable for debugging connection issues
    +# CONFIG_WPA_SUPP_LOG_LEVEL_DBG=y
    diff --git a/prj.conf b/prj.conf
    index 9b91578..32695f2 100644
    --- a/prj.conf
    +++ b/prj.conf
    @@ -107,3 +107,7 @@ CONFIG_NET_HTTP_LOG_LEVEL_DBG=y
     CONFIG_THREAD_NAME=y
     
     CONFIG_RESET_ON_FATAL_ERROR=n
    +
    +
    +# debugging
    +CONFIG_SHELL_STACK_SIZE=4096
    diff --git a/src/main.c b/src/main.c
    index d9bcfe7..a3a76aa 100644
    --- a/src/main.c
    +++ b/src/main.c
    @@ -416,6 +416,12 @@ int wifi_poweron(struct net_if *iface)
     
     //~ #define BT_LE_ADV_CONN_DEF BT_LE_ADV_PARAM(BT_LE_ADV_OPT_CONNECTABLE, 0x0640, 0x0680, NULL)
     
    +void dump_rpu_stats(void)
    +{
    +	shell_execute_cmd(shell_backend_uart_get_ptr(), "wifi_util tx_stats 0");
    +	shell_execute_cmd(shell_backend_uart_get_ptr(), "wifi_util rpu_stats all");
    +}
    +
     int main(void)
     {
     	//volatile unsigned int *myPointer = (volatile unsigned int *) 0x5002B500;
    @@ -505,6 +511,7 @@ int main(void)
     			while (!have_ip) {
     				attempt++;		
     				printf("%d\n",attempt);
    +				dump_rpu_stats();
     				if (attempt==200) {
     					printf("No ip.\n");
     					err=-99;
    -- 
    

    Please make sure that you build with overlay-debug.conf by passing the extra argument -DOVERLAY_CONFIG=overlay-debug.conf. If the issue is still present, please provide full logs and ELF files.

    Best regards,
    Dejan

  • Hello,

    I have tested the changes you provided, and they made no difference to the behavior. They all appear to add more debug output.

    I tested with both access points and have uploaded my ELF files and logs here: dl.defelsko.com/.../nordic_logs_nov1.zip

  • Hi,

    Thank you for testing and for providing the required files.
    We will look into it. I will get back to you with new information as soon as possible.

    Best regards,
    Dejan

  • Hi,

    Could you provide a sniffer trace of the Wi-Fi 4 communication?

    In the Wi-Fi 4 case, what is the criterion for OK/FAIL? Are OK and FAIL status codes from httpbin.org?

    Best regards,
    Dejan

    In the Wi-Fi 4 case, what is the criterion for OK/FAIL? Are OK and FAIL status codes from httpbin.org?

    We are not using httpbin.org. As I mentioned previously:

    I've specified the address of an internal test server which hosts a PHP endpoint that simply returns the posted data.

    I added httpbin.org to the code provided so that you could see how the HTTP POST code functions. You will likely need a PHP endpoint on your own server to test this, because httpbin.org appears to have DoS protections that prevent this test from running continuously.

    I do not know what the criteria for OK/FAIL are. That is why I posted in the DevZone asking for help. I assume they come from somewhere in your driver code, or are they messages passed through from the modem?

    Has any progress been made on the initial Wi-Fi 6 issue, where an MPU fault occurs when you disconnect and bring the interface down while waiting to obtain an IP address? Based on the logs, the problem appears to be that once I give up waiting for an IP address and try to disconnect and bring the interface down, the MPU fault occurs. I assume that is related to background processing not being cleaned up?
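For the Wi-Fi 6 case described above, the "give up waiting for an IP address" path can be made event-driven in a similar way. The sketch waits on NET_EVENT_IPV4_ADDR_ADD with a timeout. If the timeout expires, it disconnects and waits for NET_EVENT_WIFI_DISCONNECT_RESULT before calling net_if_down(). This is a sketch under assumptions, not a verified workaround. The names ip_wait_init() and wait_for_ip_or_shutdown() are hypothetical, and the uint32_t handler signature matches NCS 2.x-era Zephyr. Whether this ordering avoids the MPU fault is untested. In particular, a connection attempt that is still in progress may not produce a disconnect result at all.

```c
#include <errno.h>
#include <zephyr/kernel.h>
#include <zephyr/net/net_if.h>
#include <zephyr/net/net_event.h>
#include <zephyr/net/net_mgmt.h>
#include <zephyr/net/wifi_mgmt.h>

K_SEM_DEFINE(ip_acquired, 0, 1);
K_SEM_DEFINE(wifi_disconnected, 0, 1);

/* IPv4 (L3) and Wi-Fi (L2) events are registered on separate callbacks. */
static struct net_mgmt_event_callback ipv4_cb;
static struct net_mgmt_event_callback wifi_cb;

/* uint32_t event type as in NCS 2.x-era Zephyr; newer Zephyr uses uint64_t. */
static void ipv4_event_handler(struct net_mgmt_event_callback *cb,
			       uint32_t mgmt_event, struct net_if *iface)
{
	ARG_UNUSED(cb);
	ARG_UNUSED(iface);

	if (mgmt_event == NET_EVENT_IPV4_ADDR_ADD) {
		k_sem_give(&ip_acquired);
	}
}

static void wifi_event_handler(struct net_mgmt_event_callback *cb,
			       uint32_t mgmt_event, struct net_if *iface)
{
	ARG_UNUSED(cb);
	ARG_UNUSED(iface);

	if (mgmt_event == NET_EVENT_WIFI_DISCONNECT_RESULT) {
		k_sem_give(&wifi_disconnected);
	}
}

/* Hypothetical helper: register both callbacks once at startup. */
void ip_wait_init(void)
{
	net_mgmt_init_event_callback(&ipv4_cb, ipv4_event_handler,
				     NET_EVENT_IPV4_ADDR_ADD);
	net_mgmt_add_event_callback(&ipv4_cb);

	net_mgmt_init_event_callback(&wifi_cb, wifi_event_handler,
				     NET_EVENT_WIFI_DISCONNECT_RESULT);
	net_mgmt_add_event_callback(&wifi_cb);
}

/* Hypothetical helper. Call k_sem_reset(&ip_acquired) before issuing the
 * connect request, then call this. Returns 0 once an address is assigned.
 * On timeout, tears down in order (disconnect, wait for the result,
 * interface down) and returns -ETIMEDOUT, or another negative error. */
int wait_for_ip_or_shutdown(struct net_if *iface, k_timeout_t ip_timeout,
			    k_timeout_t disconnect_timeout)
{
	int status;

	if (k_sem_take(&ip_acquired, ip_timeout) == 0) {
		return 0;
	}

	k_sem_reset(&wifi_disconnected);
	status = net_mgmt(NET_REQUEST_WIFI_DISCONNECT, iface, NULL, 0);
	if (status) {
		return status;
	}

	if (k_sem_take(&wifi_disconnected, disconnect_timeout) != 0) {
		/* Disconnect never completed: leave the interface up. */
		return -ETIMEDOUT;
	}

	status = net_if_down(iface);
	return status ? status : -ETIMEDOUT;
}
```

The design choice here is to never call net_if_down() until the Wi-Fi layer has reported the disconnect. If that report never arrives, the interface is left up rather than torn down underneath pending work.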
