Error when combining two Nordic samples: the Wi-Fi Provisioning sample and the Memfault sample

Hello Nordic,

I hope this message finds you well. I'm looking for some guidance on a problem I'm having with my nRF7002DK board. I'm relatively new to Zephyr, and I suspect that is where I'm getting stuck.

I've been merging two Nordic samples, the Wi-Fi Provisioning sample and the Memfault sample. Each works fine on its own, but I run into problems when I combine them into a single project. Specifically, I'm replacing the static Wi-Fi credentials with Bluetooth-based provisioning, and with that change the provisioning service fails to connect to the Wi-Fi network.

You can find my project here: Provisioning-memfault on GitHub.
I would appreciate it if you could try to run it and share any insights, advice, or pointers.


What I found is that the SSID and passkey of the Wi-Fi network are retrieved correctly.
I traced the failure to this file: \ncs\v2.5.99-dev1\nrf\subsys\bluetooth\services\wifi_prov\wifi_prov_handler.c.

The error happens inside static void prov_set_config_handler(Request *req, Response *rsp), in the code below. In our case we end up in error path number 4, the generic "other error" branch taken when net_mgmt() returns a non-zero value other than -EINVAL.
I don't understand why the network management layer fails to connect once Memfault is integrated, and I can't pinpoint what has changed in the configuration.

	rc = net_mgmt(NET_REQUEST_WIFI_CONNECT, iface, &cnx_params,
		      sizeof(struct wifi_connect_req_params));
	/* Error path 3: invalid argument. */
	if (rc == -EINVAL) {
		rsp->has_status = true;
		rsp->status = Status_INVALID_ARGUMENT;
		LOG_ERR("Error path 3: invalid argument (rc=%d)", rc);
		return;
	}
	/* Error path 4: any other non-zero return code. */
	if (rc != 0) {
		rsp->has_status = true;
		rsp->status = Status_INTERNAL_ERROR;
		LOG_ERR("Error path 4: net_mgmt connect failed (rc=%d)", rc);
		return;
	}
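
The two error branches above can be modelled on a host machine to see which return codes reach path 4. This is a hypothetical sketch: classify_connect_rc(), log_connect_result() and the prov_status enum are illustrative stand-ins for the Status_* handling in wifi_prov_handler.c, not SDK API. It shows that path 4 is a catch-all, so any non-zero code other than -EINVAL (for instance -ENOMEM, if an allocation below net_mgmt() were to fail) ends up there, which is why logging the raw rc is useful.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical mirror of the Status_* values used in wifi_prov_handler.c. */
enum prov_status {
	PROV_STATUS_SUCCESS,
	PROV_STATUS_INVALID_ARGUMENT, /* error path 3 */
	PROV_STATUS_INTERNAL_ERROR,   /* error path 4 */
};

/* Map the return code of net_mgmt(NET_REQUEST_WIFI_CONNECT, ...) to a
 * status, checking in the same order as the original handler. */
static enum prov_status classify_connect_rc(int rc)
{
	if (rc == -EINVAL) {
		return PROV_STATUS_INVALID_ARGUMENT;
	}
	if (rc != 0) {
		/* Catch-all: every other non-zero code lands here. */
		return PROV_STATUS_INTERNAL_ERROR;
	}
	return PROV_STATUS_SUCCESS;
}

/* Print the raw code so the log says why a given path was taken. */
static void log_connect_result(int rc)
{
	switch (classify_connect_rc(rc)) {
	case PROV_STATUS_SUCCESS:
		printf("connect request accepted\n");
		break;
	case PROV_STATUS_INVALID_ARGUMENT:
		printf("path 3: invalid argument (rc=%d)\n", rc);
		break;
	case PROV_STATUS_INTERNAL_ERROR:
		printf("path 4: internal error (rc=%d)\n", rc);
		break;
	}
}
```

On the host, log_connect_result(-ENOMEM) prints a "path 4" line together with the raw code, which is the information a bare "4" marker does not carry.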

  • Good morning Vincent,

    I hope you are doing well.

    There is no update yet, but I know the team has started looking into the issue.

    Kind regards,

    Håkon

  • Hi Vincent,

    Thank you for your patience in this matter.

    R&D has helped me debug your application, and the issue appears to be related to heap management.

    There are two types of heap in Zephyr:

    * Zephyr system heap (CONFIG_HEAP_MEM_POOL_SIZE)

    * libc heap (used by malloc())

    The latter is used by parts of hostap (wpa_supplicant). By default, the libc malloc() arena uses all freely available RAM, so if your application already uses close to 100% of the available RAM, there is no space left for malloc() calls.
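
A hypothetical host-side model of the two arena regimes described above may make the "all freely available RAM" point concrete. default_arena_size() and fixed_arena_fits() are illustrative names, and the byte counts are abstract parameters rather than measurements from the nRF7002DK: by default the libc arena receives whatever RAM is left over, so an image that already fills RAM leaves malloc() almost nothing, whereas a fixed arena is reserved up front and either fits or the link fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the default behaviour: no fixed arena size, so
 * the libc heap gets all RAM left over after static data, stacks and the
 * Zephyr system heap. Returns 0 if nothing is left. */
static size_t default_arena_size(size_t total_ram, size_t used_ram)
{
	return (used_ram < total_ram) ? total_ram - used_ram : 0;
}

/* Hypothetical model of a fixed arena (CONFIG_COMMON_LIBC_MALLOC_ARENA_SIZE):
 * the buffer is reserved statically, so the question moves from
 * "will malloc() fail at run time?" to "does the image still link?". */
static bool fixed_arena_fits(size_t total_ram, size_t used_ram,
			     size_t arena_size)
{
	/* Check used_ram first so the subtraction cannot underflow. */
	return used_ram <= total_ram && arena_size <= total_ram - used_ram;
}
```

The design point is that a fixed size turns a silent run-time shortage into a visible build-time error.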

    If you set:

    CONFIG_COMMON_LIBC_MALLOC=y

    CONFIG_COMMON_LIBC_MALLOC_ARENA_SIZE=32768

    then the libc heap gets a fixed size (32768 bytes) instead of all remaining RAM.

    You will very likely get linking problems afterwards, because the fixed-size arena is now reserved alongside the system heap, so you can safely reduce CONFIG_HEAP_MEM_POOL_SIZE to 150k.
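
The linking problems follow from simple addition: the system heap and the fixed libc arena must now fit in RAM together with everything else. The sketch below is a hypothetical tally; heap_reservations() and pool_reduction_needed() are illustrative helpers, and other_ram (the rest of the application's RAM usage) is an unknown parameter here. With the values from this reply, reading "150k" as 150000 bytes (an assumption), the two heaps reserve 182768 bytes.

```c
#include <stddef.h>

/* Hypothetical tally of the two heap reservations discussed above. */
static size_t heap_reservations(size_t heap_mem_pool_size,
				size_t libc_arena_size)
{
	return heap_mem_pool_size + libc_arena_size;
}

/* Bytes by which CONFIG_HEAP_MEM_POOL_SIZE would have to shrink so that
 * other_ram (everything except the two heaps) plus both heaps fit in
 * total_ram. Returns 0 when the current sizes already fit. */
static size_t pool_reduction_needed(size_t total_ram, size_t other_ram,
				    size_t heap_mem_pool_size,
				    size_t libc_arena_size)
{
	size_t needed = other_ram +
			heap_reservations(heap_mem_pool_size, libc_arena_size);

	return (needed > total_ram) ? needed - total_ram : 0;
}
```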

     

    Could you try this on your end and see whether it helps?

    Kind regards,

    Håkon

    Thank you, Håkon, for the update.

    We need some clarification on the exact steps your team took to make everything work, because on our side we are still running into a fault.

    1. Network Stack Control Methods Status:

      • NET_SHELL: Previously had difficulties enabling it.
      • NET_MGMT: No specific issues noted.
      • NET_CONNECTION_MANAGER: Tagged as experimental and potentially unreliable.

      Current Status: Is it now okay to enable all three at the same time?

    2. memfault.c::152-157 Update:

      • Previous Status: Commented out.
      • Current Status: Lines 152-157 have been uncommented. But what is the guidance from your side?

    3. Deactivation of Connection Manager:

      • Previous Inquiry: Potential deactivation due to its experimental tag.
      • Current Status: Is it enabled or disabled on your side?

    4. Recent Test Setup Adjustments:

      • Reduced CONFIG_HEAP_MEM_POOL_SIZE to 140000.
      • Enabled CONFIG_COMMON_LIBC_MALLOC=y.
      • Set CONFIG_COMMON_LIBC_MALLOC_ARENA_SIZE to 32768.
      • Set CONFIG_L2_WIFI_CONNECTIVITY_AUTO_DOWN=n.
      • Switched log mode to deferred.
      • Uncommented memfault.c::152-157.
      • Added CONFIG_NET_OFFLOAD=y as per diff file.
      • Set CONFIG_NET_SOCKETS_POLL_MAX=16 as per diff file.

    5. Questions and Clarifications:

      • NET_OFFLOAD Impact:
        • We need to understand why NET_OFFLOAD is necessary in our application.
        • Could you explain what it does and what its implications are? Which network stack is being used?

    Please let me know whether the information above is correct, or if you need any further details or clarifications.

    Also, here is the error we now get on our board when attempting Wi-Fi provisioning.
    Did your R&D team see this as well?

    We have updated the code repository to reflect our modifications.

    uart:~$ *** Booting nRF Connect SDK v2.5.99-dev1 ***
    OK
    [00:00:00.459,777] <inf> fs_nvs: 2 Sectors of 4096 bytes
    [00:00:00.459,838] <inf> fs_nvs: alloc wra: 0, fe8
    [00:00:00.459,838] <inf> fs_nvs: data wra: 0, 0
    [00:00:00.464,691] <inf> mflt: GNU Build ID: d2cbce1f87a24752a90f45cf66d323ead8ef20c5
    [00:00:00.464,965] <inf> mflt: Periodic background upload scheduled - duration=2625s period=3600s
    [00:00:00.465,209] <inf> net_config: Initializing network
    [00:00:00.465,240] <inf> net_config: Waiting interface 1 (0x20001bd0) to be up...
    [00:00:00.465,393] <inf> net_config: IPv4 address: 192.165.100.150
    [00:00:00.465,454] <inf> net_config: Running dhcpv4 client...
    [00:00:00.465,820] <dbg> wpa_supp: wpa_printf_impl: wpa_supplicant v2.11-devel
    [00:00:00.466,217] <inf> wpa_supp: Successfully initialized wpa_supplicant
    [00:00:00.466,766] <dbg> wpa_supp: wpa_printf_impl: Adding interface wlan0
    
    [00:00:00.466,857] <dbg> wpa_supp: wpa_printf_impl: Calling wpa_cli: interface_add, argc: 5
    
    [00:00:00.466,918] <dbg> wpa_supp: wpa_printf_impl: argv[0]: interface_add
    
    [00:00:00.466,949] <dbg> wpa_supp: wpa_printf_impl: argv[1]: wlan0
    
    [00:00:00.467,010] <dbg> wpa_supp: wpa_printf_impl: argv[2]: zephyr
    
    [00:00:00.467,041] <dbg> wpa_supp: wpa_printf_impl: argv[3]: zephyr
    
    [00:00:00.467,102] <dbg> wpa_supp: wpa_printf_impl: argv[4]: zephyr
    
    [00:00:00.467,437] <dbg> wpa_supp: wpa_printf_impl: RX global ctrl_iface - hexdump_ascii(len=50):
    [00:00:00.467,437] <dbg> wpa_supp: _wpa_hexdump_ascii: 
                                       49 4e 54 45 52 46 41 43  45 5f 41 44 44 20 77 6c |INTERFAC E_ADD wl
                                       61 6e 30 09 7a 65 70 68  79 72 09 7a 65 70 68 79 |an0.zeph yr.zephy
                                       72 09 7a 65 70 68 79 72  09 28 6e 75 6c 6c 29 09 |r.zephyr .(null).
                                       09 09                                            |..               
    [00:00:00.467,529] <dbg> wpa_supp: wpa_printf_impl: CTRL_IFACE GLOBAL INTERFACE_ADD 'wlan0      zephyr  zephyr  zephyr  (null)       '
    [00:00:00.467,651] <dbg> wpa_supp: wpa_printf_impl: Initializing interface 'wlan0' conf 'zephyr' driver 'zephyr' ctrl_interface 'zephyr' bridge 'N/A'
    [00:00:00.471,862] <dbg> wpa_supp: wpa_printf_impl: Add interface wlan0 to a new radio N/A
    [00:00:00.475,158] <dbg> wpa_supp: wpa_printf_impl: wpa_supp: Added 802.11b mode based on 802.11g information
    [00:00:00.475,372] <dbg> wpa_supp: wpa_printf_impl: l2_packet_init: iface wlan0 ifindex 1
    [00:00:00.475,555] <dbg> wpa_supp: wpa_printf_impl: wlan0: Own MAC address: f4:ce:36:00:1e:d8
    [00:00:00.475,708] <dbg> wpa_supp: wpa_printf_impl: wlan0: RSN: flushing PMKID list in the driver
    [00:00:00.475,799] <dbg> wpa_supp: wpa_printf_impl: wlan0: State: DISCONNECTED -> INACTIVE
    [00:00:00.476,196] <dbg> wpa_supp: wpa_printf_impl: MBO: Update non-preferred channels, non_pref_chan=N/A
    [00:00:00.476,379] <dbg> wpa_supp: wpa_printf_impl: wlan0: Added interface wlan0
    [00:00:00.476,501] <dbg> wpa_supp: wpa_printf_impl: wlan0: State: INACTIVE -> DISCONNECTED
    [00:00:02.480,468] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
    [00:00:02.480,529] <inf> bt_hci_core: HW Variant: nRF53x (0x0003)
    [00:00:02.480,560] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 161.54902 Build 901303921
    [00:00:02.482,818] <inf> bt_hci_core: Identity: CA:65:64:5F:59:DF (random)
    [00:00:02.482,849] <inf> bt_hci_core: HCI: version 5.4 (0x0d) revision 0x211a, manufacturer 0x0059
    [00:00:02.482,879] <inf> bt_hci_core: LMP: version 5.4 (0x0d) subver 0x211a
    [00:00:02.482,910] <inf> applayer_wifi_prov: Bluetooth initialized.
    
    [00:00:02.482,910] <inf> applayer_wifi_prov: Wi-Fi provisioning service starts successfully.
    
    [00:00:02.485,015] <inf> applayer_wifi_prov: BT Advertising successfully started.
    
    [00:00:02.485,198] <inf> applayer_wifi_prov: Exiting applayer_wifi_prov, the work queue is properly started
    
    [00:00:02.485,260] <inf> applayer_memfault: Memfault sample has started
    [00:00:02.485,321] <inf> applayer_memfault: Bringing network interface up and connecting to the network
    [00:00:02.485,351] <inf> applayer_memfault: nw_connected_sem not available
    uart:~$ BT Connected: 69:A2:A6:8E:32:39 [00:00:51.337,249] <wrn> bt_l2cap: Ignoring data for unknown channel ID 0x003a
    uart:~$ BT pairing completed: 69:A2:A6:8E:32:39 (random), bonded: 0
    BT Security changed: 69:A2:A6:8E:32:39 (random) level 2.
    [00:00:53.467,346] <inf> wifi_prov: Wi-Fi Provisioning service - data out: notifications enabled
    [00:00:53.587,493] <inf> wifi_prov: Wi-Fi Provisioning service - control point: indications enabled
    [00:00:53.707,489] <dbg> wifi_prov: write_prov_control_point: Control point rx: 
                                        08 01                                            |..               
    [00:00:53.707,519] <dbg> wifi_prov: wifi_prov_recv_req: Control point rx: 
                                        08 01                                            |..               
    [00:00:53.707,611] <inf> wifi_prov: Start parsing...
    [00:00:53.707,611] <inf> wifi_prov: GET_STATUS received...
    [00:00:57.322,326] <dbg> wifi_prov: wr: Control point rx: 
     [00:00:57.322,357] <dbg> wifi_prov: wifi_prov_recv_req: Control point rx: 
                                        08 02 52 02 08 00                                |..R...           
    [00:00:57.322,479] <inf> wifi_prov: Start parsing...
    [00:00:57.322,479] <inf> wifi_prov: Start_Scan received...
    [00:00:57.322,570] <dbg> wpa_supp: wpa_printf_impl: Calling wpa_cli: disconnect, argc: 1
    
    [00:00:57.322,601] <dbg> wpa_supp: wpa_printf_impl: argv[0]: disconnect
    
    [00:00:57.323,028] <dbg> wpa_supp: wpa_printf_impl: wlan0: Control interface command 'DISCONNECT'
    [00:00:57.323,181] <dbg> wpa_supp: wpa_printf_impl: wlan0: Cancelling scan request
    [00:00:57.323,669] <dbg> wpa_supp: wpa_printf_impl: wlan0: Request to deauthenticate - bssid=00:00:00:00:00:00 pending_bssid=00:00:00:00:00:00 reason=3 (DEAUTH_LEAVING) state=DISCONNECTED
    [00:00:57.323,822] <dbg> wpa_supp: wpa_printf_impl: wlan0: State: DISCONNECTED -> DISCONNECTED
    [00:00:57.335,327] <dbg> wpa_supp: wpa_printf_impl: QM: Clear all active DSCP policies
    [00:00:57.335,418] <inf> wpa_supp: wlan0: CTRL-EVENT-DSCP-POLICY clear_all
    [00:00:57.454,406] <err> os: ***** USAGE FAULT *****
    [00:00:57.462,951] <err> os:   Stack overflow (context area not valid)
    [00:00:57.473,052] <err> os: r0/a1:  0x200001e8  r1/a2:  0x200003a8  r2/a3:  0x000d1114
    [00:00:57.484,649] <err> os: r3/a4:  0x0000000a r12/ip:  0x01010101 r14/lr:  0x00065f0d
    [00:00:57.496,215] <err> os:  xpsr:  0x61000200
    [00:00:57.504,333] <err> os: r4/v1:  0x20035870  r5/v2:  0x000d1d63  r6/v3:  0x00000000
    [00:00:57.515,930] <err> os: r7/v4:  0x20035854  r8/v5:  0x2003586c  r9/v6:  0x00000001
    [00:00:57.527,496] <err> os: r10/v7: 0x00000004  r11/v8: 0x00000014    psp:  0x20035808
    [00:00:57.539,093] <err> os: EXC_RETURN: 0xfffffffd
    [00:00:57.547,546] <err> os: Faulting instruction address (r15/pc): 0x0006d0fa

  • Hi,

    My apologies: the problem is the value of CONFIG_COMMON_LIBC_MALLOC_ARENA_SIZE in this specific configuration, which needs to be higher. I have asked R&D what the minimum value should be, and requested that it be added to our documentation. I got your current code, with all overlay configs, running by setting CONFIG_COMMON_LIBC_MALLOC_ARENA_SIZE to 37000 bytes.

    My earlier suggestions, for instance CONFIG_NET_OFFLOAD=y and disabling NET_CONNECTION_MANAGER, were not correct. The root cause was an incorrect amount of RAM allocated to the libc heap. Changing the local configuration (i.e. reducing memory usage elsewhere) only helped because it let the libc heap use more RAM, as the default configuration gives it all available RAM.
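
The fix can be read as a threshold: the fixed arena has to cover the peak malloc() demand of this particular configuration. This is a hypothetical sketch; arena_covers() and arena_increase() are illustrative, peak_demand stands in for a measurement the thread does not provide, and 37000 is only the value reported to work here, not a documented minimum.

```c
#include <stdbool.h>
#include <stddef.h>

/* Arena sizes from this thread: the first suggestion, and the value
 * reported to get the full overlay configuration running. Neither is a
 * documented minimum. */
#define ARENA_SIZE_FIRST_TRY   32768u
#define ARENA_SIZE_REPORTED_OK 37000u

/* Hypothetical check: a fixed arena smaller than the application's peak
 * malloc() demand means some allocation will fail at run time. */
static bool arena_covers(size_t arena_size, size_t peak_demand)
{
	return arena_size >= peak_demand;
}

/* Extra bytes the working value reserves compared with the first try. */
static size_t arena_increase(void)
{
	return ARENA_SIZE_REPORTED_OK - ARENA_SIZE_FIRST_TRY;
}
```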

    Actif said:
    • Added CONFIG_NET_OFFLOAD=y as per diff file.

    As the original issue is rooted in memory allocation, you can revert this. I tried several settings based on other samples, and offloading the network stack in Zephyr is not required for your use case.

    Actif said:
    Current Status: Lines 152-157 have been uncommented. But what is the guidance from your side?

    Keep this as in the original, i.e. make the conn_mgr_* API calls as needed by the module.

    Kind regards,

    Håkon
