Error when merging two Nordic samples – the Wi-Fi Provisioning sample and the Memfault sample

Hello Nordic,

I hope this message finds you well. I wanted to reach out to you all for some guidance regarding a challenge I'm facing with my nRF7002DK board. I'm relatively new to Zephyr, and I think that's where I'm hitting a snag.

I've been working on merging two Nordic samples, the Wi-Fi Provisioning sample and the Memfault sample. Each works fine on its own, but I run into issues when I combine them into a single project. Specifically, I'm trying to replace the static Wi-Fi credentials with Bluetooth LE provisioning, but then the provisioning service fails to connect to the Wi-Fi network.

You can find my project here: Provisioning-memfault on GitHub (https://github.com/morinv-actif/Provisioning-memfault).
I would appreciate it if you could try to run it and provide any insights, advice, or pointers you can share.


What I found is that the SSID and the passkey of the Wi-Fi network are retrieved correctly.
I ended up in this file: \ncs\v2.5.99-dev1\nrf\subsys\bluetooth\services\wifi_prov\wifi_prov_handler.c.

The error happens inside this function: static void prov_set_config_handler(Request *req, Response *rsp).
The error occurs in the code below. In our case, we end up in error path 4 (the generic "other error" branch that returns Status_INTERNAL_ERROR).
I don't understand why the network management layer suddenly fails to connect when we integrate memfault.
I can't pinpoint what has changed in the configuration.

	/* Ask the network management layer to connect with the received credentials. */
	rc = net_mgmt(NET_REQUEST_WIFI_CONNECT, iface, &cnx_params,
		      sizeof(struct wifi_connect_req_params));
	/* Error path 3: invalid argument. */
	if (rc == -EINVAL) {
		rsp->has_status = true;
		rsp->status = Status_INVALID_ARGUMENT;
		LOG_ERR("Error path 3: rc = %d", rc);
		return;
	}
	/* Error path 4: any other error. */
	if (rc != 0) {
		rsp->has_status = true;
		rsp->status = Status_INTERNAL_ERROR;
		LOG_ERR("Error path 4: rc = %d", rc);
		return;
	}
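Path 4 lumps every non-zero return code other than -EINVAL into one status, so the raw rc is the only thing that tells the causes apart. A minimal standalone sketch of the same branch order, for reasoning about it outside the SDK; the enum and function names are hypothetical stand-ins for the nanopb-generated Status_* values, not the wifi_prov API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the nanopb-generated Status_* values. */
typedef enum {
	PROV_STATUS_SUCCESS,
	PROV_STATUS_INVALID_ARGUMENT, /* error path 3 */
	PROV_STATUS_INTERNAL_ERROR,   /* error path 4 */
} prov_status_t;

/* Same branch order as prov_set_config_handler(): -EINVAL first,
 * then any other non-zero return code, otherwise success. */
static prov_status_t map_connect_rc(int rc)
{
	if (rc == -EINVAL) {
		return PROV_STATUS_INVALID_ARGUMENT;
	}
	if (rc != 0) {
		return PROV_STATUS_INTERNAL_ERROR;
	}
	return PROV_STATUS_SUCCESS;
}

/* Human-readable name, handy when logging "rc = %d (%s)". */
static const char *prov_status_name(prov_status_t status)
{
	switch (status) {
	case PROV_STATUS_SUCCESS:
		return "SUCCESS";
	case PROV_STATUS_INVALID_ARGUMENT:
		return "INVALID_ARGUMENT";
	case PROV_STATUS_INTERNAL_ERROR:
		return "INTERNAL_ERROR";
	}
	assert(0 && "unknown prov_status_t value");
	return "UNKNOWN";
}
```

For example, both -ENOTSUP and -EIO land in path 4, so a log line that prints rc alongside prov_status_name(map_connect_rc(rc)) is what distinguishes them. On Zephyr, ENOTSUP is 134.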

  • Hi Vincent,

     

    Here's the diff of my working version of your provisioning sample:

    diff --git a/boards/nrf7002dk_nrf5340_cpuapp.conf b/boards/nrf7002dk_nrf5340_cpuapp.conf
    index 2f78d3f..042c9f2 100644
    --- a/boards/nrf7002dk_nrf5340_cpuapp.conf
    +++ b/boards/nrf7002dk_nrf5340_cpuapp.conf
    @@ -11,8 +11,6 @@ CONFIG_ENTROPY_GENERATOR=y
     
     # Activated with the debug overlay
     CONFIG_SHELL=n
    -CONFIG_CONSOLE=n
    -CONFIG_LOG=n
     
     # Heap and stacks
     CONFIG_INIT_STACKS=y
    @@ -34,7 +32,7 @@ CONFIG_NET_MGMT_EVENT_STACK_SIZE=4096
     CONFIG_NET_MGMT_EVENT=y
     CONFIG_NET_MGMT_EVENT_INFO=y
     CONFIG_WIFI_MGMT_EXT=y
    -CONFIG_NET_CONNECTION_MANAGER=y
    +CONFIG_NET_CONNECTION_MANAGER=n
     
     # Networking
     CONFIG_NETWORKING=y
    @@ -105,11 +103,11 @@ CONFIG_MBEDTLS_SSL_SERVER_NAME_INDICATION=y
     # The sample enables flash storage for coredumps, which depend on the parition manager.
     # By default, nRF7002DK builds are for a single image, so the partition manager must be enabled
     # also for single image builds.
    -CONFIG_PM_SINGLE_IMAGE=y
    +#CONFIG_PM_SINGLE_IMAGE=y
     
     # Zephyr NET Connection Manager Connectivity layer.
     CONFIG_L2_WIFI_CONNECTIVITY=y
     CONFIG_L2_WIFI_CONNECTIVITY_AUTO_CONNECT=n
    -# CONFIG_L2_WIFI_CONNECTIVITY_AUTO_DOWN=n
    +CONFIG_L2_WIFI_CONNECTIVITY_AUTO_DOWN=n
     
     CONFIG_NET_CONNECTION_MANAGER_MONITOR_STACK_SIZE=4096
    diff --git a/prj.conf b/prj.conf
    index 404a4c9..9c1f5f8 100644
    --- a/prj.conf
    +++ b/prj.conf
    @@ -5,4 +5,71 @@
     CONFIG_POSIX_MAX_FDS=9
     CONFIG_NET_CONFIG_AUTO_INIT=y
     
    -# Networking Management AP
    \ No newline at end of file
    +# Networking Management AP
    +CONFIG_MEMFAULT=y
    +CONFIG_NET_NATIVE=y
    +
    +# CONFIG_FPU=y
    +
    +CONFIG_MEMFAULT_ROOT_CERT_STORAGE_TLS_CREDENTIAL_STORAGE=y
    +CONFIG_MEMFAULT_NCS_PROVISION_CERTIFICATES=y
    +
    +CONFIG_MEMFAULT_NCS_PROJECT_KEY="aT8m6B52VflxR0iFqVNy7I2a6e5abMfh"
    +CONFIG_MEMFAULT_NCS_STACK_METRICS=y
    +CONFIG_MEMFAULT_HEAP_STATS=y
    +
    +
    +CONFIG_MEMFAULT_HTTP_ENABLE=y
    +CONFIG_MEMFAULT_HTTP_PERIODIC_UPLOAD=y
    +
    +CONFIG_MEMFAULT_LOGGING_ENABLE=y
    +
    +# Store coredump to flash
    +CONFIG_MEMFAULT_NCS_INTERNAL_FLASH_BACKED_COREDUMP=y
    +CONFIG_MEMFAULT_COREDUMP_COLLECT_BSS_REGIONS=y
    +
    +# Dependencies for flash storage
    +CONFIG_FLASH=y
    +CONFIG_FLASH_MAP=y
    +CONFIG_STREAM_FLASH=y
    +
    +# Increase the event storage size so that all metrics generated by the application
    +# are reliably sent to the memfault cloud.
    +CONFIG_MEMFAULT_EVENT_STORAGE_SIZE=2048
    +# ************ Driver & Subsystem  ************ #
    +CONFIG_BT=y
    +CONFIG_NANOPB=y
    +# ************ End of Driver & Subsystem ************ #
    +
    +# ************ Bluetooth  ************
    +CONFIG_BT_SMP=y
    +CONFIG_BT_PERIPHERAL=y
    +
    +CONFIG_BT_BUF_ACL_RX_SIZE=151
    +CONFIG_BT_L2CAP_TX_MTU=147
    +CONFIG_BT_BUF_ACL_TX_SIZE=151
    +
    +CONFIG_BT_RX_STACK_SIZE=4096
    +CONFIG_BT_BONDABLE=n
    +CONFIG_BT_DEVICE_NAME_DYNAMIC=y
    +
    +CONFIG_BT_WIFI_PROV=y
    +
    +# Setting BT supervision timeout to 75units (750ms) to avoid timeout of BT connection when radio is granted to WiFi during scan.
    +CONFIG_BT_PERIPHERAL_PREF_TIMEOUT=75
    +# ************ End of Bluetooth ************
    +
    +# Networking
    +
    +CONFIG_NET_SOCKETS_POSIX_NAMES=y
    +CONFIG_NET_CONFIG_SETTINGS=y
    +CONFIG_NET_CONFIG_MY_IPV4_ADDR="192.165.100.150"
    +CONFIG_NET_CONFIG_PEER_IPV4_ADDR="192.165.100.1"
    +
    +# Similar to shell sample, add this option to ensure the event can get served.
    +CONFIG_NET_MGMT_EVENT_QUEUE_TIMEOUT=5000
    +CONFIG_NET_CONFIG_INIT_TIMEOUT=0 
    +
    +CONFIG_NET_OFFLOAD=y
    +CONFIG_NET_SOCKETS_SOCKOPT_TLS=y
    +CONFIG_NET_SOCKETS_POLL_MAX=16
    diff --git a/src/memfault/memfault.c b/src/memfault/memfault.c
    index 9daf4f1..59a9ac5 100644
    --- a/src/memfault/memfault.c
    +++ b/src/memfault/memfault.c
    @@ -150,11 +150,6 @@ void start_memfault(void)
          */
         LOG_INF("Bringing network interface up and connecting to the network");
     
    -    if (conn_mgr_all_if_up(true))
    -    {
    -        __ASSERT(false, "conn_mgr_all_if_up, error");
    -        return;
    -    }
     
         /* Performing in an infinite loop to be resilient against
          * re-connect bursts directly after boot, e.g. when connected
    

     

    Sorry for putting everything into the prj.conf file.
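    A note on the units behind the CONFIG_BT_PERIPHERAL_PREF_TIMEOUT comment in the diff: Bluetooth LE counts the supervision timeout in 10 ms steps (75 → 750 ms) and the connection interval in 1.25 ms steps, and the Core Specification requires the timeout to be larger than (1 + peripheral latency) × interval × 2. A small sketch of that check; the interval and latency values are whatever the central actually negotiates, which this thread does not show, so the ones used below are only examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bluetooth LE units: supervision timeout in 10 ms steps,
 * connection interval in 1.25 ms (1250 us) steps. */
#define SUP_TIMEOUT_UNIT_US 10000u
#define CONN_INTERVAL_UNIT_US 1250u

/* Convert a supervision timeout from 10 ms units to milliseconds. */
static uint32_t sup_timeout_ms(uint16_t timeout_units)
{
	return (uint32_t)timeout_units * SUP_TIMEOUT_UNIT_US / 1000u;
}

/* True if the timeout is within 100 ms..32 s and larger than
 * (1 + latency) * interval * 2. */
static bool sup_timeout_valid(uint16_t timeout_units, uint16_t interval_units,
			      uint16_t latency)
{
	/* Valid connection intervals are 7.5 ms..4 s (6..3200 units). */
	assert(interval_units >= 6 && interval_units <= 3200);

	if (timeout_units < 10 || timeout_units > 3200) {
		return false;
	}

	/* 64-bit arithmetic so large latency values cannot overflow. */
	uint64_t timeout_us = (uint64_t)timeout_units * SUP_TIMEOUT_UNIT_US;
	uint64_t min_us = (uint64_t)(1u + latency) * interval_units *
			  CONN_INTERVAL_UNIT_US * 2u;

	return timeout_us > min_us;
}
```

    For example, with a 30 ms interval (24 units) and no latency, the minimum is 60 ms, so 750 ms is valid; with a 500 ms interval (400 units) the minimum becomes 1000 ms and 750 ms would be rejected.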

    Here's the serial output:

    *** Booting nRF Connect SDK v2.5.0 ***
    [00:00:00.451,110] <inf> mflt: GNU Build ID: 7784cb23af9dcf4f83aaea269ff677dbc1d5bde5
    [00:00:00.451,324] <inf> mflt: Periodic background upload scheduled - duration=296s period=3600s
    OK
    [00:00:02.464,965] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
    [00:00:02.464,996] <inf> bt_hci_core: HW Variant: nRF53x (0x0003)
    [00:00:02.465,026] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 197.47763 Build 2370639017
    [00:00:02.467,102] <inf> bt_hci_core: Identity: CA:C4:23:90:3B:06 (random)
    [00:00:02.467,132] <inf> bt_hci_core: HCI: version 5.4 (0x0d) revision 0x2102, manufacturer 0x0059
    [00:00:02.467,163] <inf> bt_hci_core: LMP: version 5.4 (0x0d) subver 0x2102
    [00:00:02.467,163] <inf> applayer_wifi_prov: Bluetooth initialized.
    
    [00:00:02.467,193] <inf> applayer_wifi_prov: Wi-Fi provisioning service starts successfully.
    
    [00:00:02.469,238] <inf> applayer_wifi_prov: BT Advertising successfully started.
    
    [00:00:02.470,306] <inf> applayer_wifi_prov: Exiting applayer_wifi_prov, the work queue is properly started
    
    [00:00:02.470,336] <inf> applayer_memfault: Memfault sample has started
    [00:00:02.470,397] <inf> applayer_memfault: Bringing network interface up and connecting to the network
    [00:00:02.470,428] <inf> applayer_memfault: nw_connected_sem not available
    BT Connected: 58:15:6F:DB:6D:A5 (random)BT pairing completed: 58:15:6F:DB:6D:A5 (random), bonded: 0
    BT Security changed: 58:15:6F:DB:6D:A5 (random) level 2.
    [00:00:22.046,630] <inf> wifi_prov: Wi-Fi Provisioning service - control point: indications enabled
    [00:00:22.069,183] <inf> wifi_prov: Wi-Fi Provisioning service - data out: notifications enabled
    [00:00:22.215,606] <inf> wifi_prov: Start parsing...
    [00:00:22.215,606] <inf> wifi_prov: GET_STATUS received...
    OK
    [00:00:24.165,435] <inf> wifi_prov: Start parsing...
    [00:00:24.165,466] <inf> wifi_prov: Start_Scan received...
    [00:00:30.941,711] <inf> wifi_prov: Start parsing...
    [00:00:30.941,741] <inf> wifi_prov: Stop_Scan received...
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    [00:00:45.519,042] <inf> wifi_prov: Start parsing...
    [00:00:45.519,073] <inf> wifi_prov: Set_config received...
    rc: 0
    [00:01:42.470,550] <inf> applayer_memfault: nw_connected_sem not available
    [00:03:22.470,672] <inf> applayer_memfault: nw_connected_sem not available

     

    Actif said:
    Can you tell me if there are any other file descriptors (FDs) that could be monitored by a poll() or select() function and potentially exceed the limit of 3?

    Several are in use by the nRF700x Wi-Fi driver, which is why the application needs to set this limit higher.

    Actif said:
    Is there any other impact for CONFIG_NET_CONFIG_INIT_TIMEOUT=0, other than the waiting time?

    No, not that I'm aware of. The interface (IF) is started after this net config init.

     

    Kind regards,

    Håkon

  • Hi Håkon,

    Firstly, I want to express my gratitude for your prompt and detailed responses today. Your quick turnaround is highly appreciated and has been extremely helpful in our troubleshooting efforts.

    After reviewing your suggested modifications, I've successfully replicated the output on our end. However, I do have several questions and clarifications that I hope you can assist with:

    1. Removal of debug.conf Overlay: I noticed that the overlay debug.conf file was removed in your solution. Was there a specific configuration in this file that was causing issues? Understanding this will help us avoid similar problems in the future and add documentation.

    2. Configuration Changes: Could you provide some insight into the rationale behind the specific configurations you activated or deactivated? For project documentation and upcoming PRs, I need to justify these changes comprehensively.

    3. Provisioning and Network Connection Management: It appears that the provisioning part is now functioning, although I haven’t yet tested it with WiFi communication. In this context, I’m curious about the impact of deactivating net_connection_manager. Specifically, is the l4_event_handler function no longer called? I had used Zephyr's documentation for this implementation, so any clarification would be beneficial.

    4. Provisioning Sample Implementation: With the current changes, it seems we can't use the network connection manager with the provisioning sample. Do you consider this a flaw in the sample? What alternative solutions would you suggest? Also, was this an expected limitation that has just come to light, or is it a new discovery for Nordic?

    5. Guidance for Team Update: Lastly, I will need some guidance on how to properly update my team lead about the impact of these changes, especially in terms of future development and risk assessment.

    Thank you once again for your ongoing support.
    Your insights on these points will be invaluable in guiding our next steps.

    Best regards,
    Vincent

  • Hi Vincent,

     

    Actif said:
    Removal of debug.conf Overlay: I noticed that the overlay debug.conf file was removed in your solution. Was there a specific configuration in this file that was causing issues? Understanding this will help us avoid similar problems in the future and add documentation.

    My apologies, this was not intentional from my side.

    I tried adding those earlier today, and found issues with the LOG configuration, specifically this:

    https://github.com/morinv-actif/Provisioning-memfault/blob/main/configs/overlay-debug.conf#L28

    This causes in-place (immediate) logging instead of the default deferred logging, where the prints are handled in a dedicated thread.

    Deferred logging is always recommended, to ensure that ISRs and other timing-critical components are not skewed in time.
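    To illustrate why, here is a toy model of deferred logging (not Zephyr's implementation, which is selected with CONFIG_LOG_MODE_DEFERRED versus CONFIG_LOG_MODE_IMMEDIATE): the time-critical caller only copies the message into a ring buffer and returns, and the slow output work happens later, when a low-priority context drains the buffer. It is a single-producer, single-consumer sketch with no locking; a real implementation needs atomics or locks between the ISR and the logging thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS 8 /* power of two, so index wraparound stays consistent */
#define LOG_MSG_LEN 48

/* Fixed-size ring buffer of pending log messages. */
struct deferred_log {
	char msgs[LOG_SLOTS][LOG_MSG_LEN];
	unsigned int head; /* total messages written */
	unsigned int tail; /* total messages drained */
	unsigned int dropped;
};

/* Fast path for time-critical code: copy the message and return.
 * If the buffer is full, the message is dropped instead of blocking. */
static bool log_enqueue(struct deferred_log *lb, const char *msg)
{
	assert(lb != NULL && msg != NULL);

	if (lb->head - lb->tail == LOG_SLOTS) {
		lb->dropped++;
		return false;
	}
	char *slot = lb->msgs[lb->head % LOG_SLOTS];
	strncpy(slot, msg, LOG_MSG_LEN - 1);
	slot[LOG_MSG_LEN - 1] = '\0';
	lb->head++;
	return true;
}

/* Slow path, run later from a low-priority thread: do the actual output.
 * Returns the number of messages written. */
static unsigned int log_drain(struct deferred_log *lb, FILE *out)
{
	unsigned int n = 0;

	while (lb->tail != lb->head) {
		fprintf(out, "%s\n", lb->msgs[lb->tail % LOG_SLOTS]);
		lb->tail++;
		n++;
	}
	return n;
}
```

    With immediate logging, the equivalent of the fprintf() call would run inside the caller, so every log statement in an ISR or radio-timing path would add its full output time to that path.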

     

    That being said, I ran into more issues when enabling CONFIG_SHELL: it seems to produce an invalid socket access print-out during boot-up and gives the same behavior you initially saw (the Wi-Fi scan succeeds, but wifi_connect returns ENOTSUP, -134).

      

    I will add this information to the internal bug report that I initially created when running your sample.

    Actif said:
    Configuration Changes: Could you provide some insight into the rationale behind the specific configurations you activated or deactivated? For project documentation and upcoming PRs, I need to justify these changes comprehensively.

    The changes I made were disabling the connection manager and increasing the number of sockets in the Zephyr socket subsystem. Are there any specific configs you are thinking about?

     

    Actif said:
    • Provisioning and Network Connection Management: It appears that the provisioning part is now functioning, although I haven’t yet tested it with WiFi communication. In this context, I’m curious about the impact of deactivating net_connection_manager. Specifically, is the l4_event_handler function no longer called? I had used Zephyr's documentation for this implementation, so any clarification would be beneficial.

    • Provisioning Sample Implementation: With the current changes, it seems we can't use the network connection manager with the provisioning sample. Do you consider this a flaw in the sample? What alternative solutions would you suggest? Also, was this an expected limitation that has just come to light, or is it a new discovery for Nordic?

    With all the configurations in your combined example enabled, there are three ways of controlling the net stack:

    * NET_SHELL

    * NET_MGMT

    * NET_CONNECTION_MANAGER

    Initial indications point to a problem when all of these are enabled: access to, or control over, the interface itself becomes problematic, as seen for instance in the "invalid sock access" print-out during boot-up mentioned earlier.

    The net connection manager is tagged as experimental (https://docs.zephyrproject.org/3.5.0/kconfig.html#CONFIG_NET_CONNECTION_MANAGER), meaning that the feature should only be used for development purposes and can be subject to major changes, as described here:

    https://developer.nordicsemi.com/nRF_Connect_SDK/doc/2.5.1/nrf/releases_and_maturity/software_maturity.html#software-maturity-categories

    This also means that it unfortunately cannot be considered stable. I have reported your findings to R&D, but haven't heard back yet.

     

    Kind regards,

    Håkon

  • Thank you Håkon.

    I really appreciate all the information you are providing to make sure I don't fall behind.

    On our side, we will go back to reading the documentation on NET_SHELL, NET_MGMT and NET_CONNECTION_MANAGER so that we can understand the feedback from your R&D team.

    I will wait for your reply with more information.

    Have a great day.

    Vincent 

  • Good Morning Håkon,

    Is there any follow-up on this issue?

    Regards,
    Vincent
