NRF5340 HCI_USB & HCI_UART Stop Scanning

I am developing a custom nRF5340 board that uses the BL5340PA external-antenna module to integrate the nRF5340 chip. The BL5340PA module provides all of the base functionality of the nRF5340 and adds an external FEM to increase the TX power. I am using the Zephyr/nrf SDK version 2.4.1 (received from Laird for the updated TX power setting rules: GitHub - LairdCP/bl5340pa_manifest: Manifest for the Laird Connectivity fork of the nRF Connect SDK with support for the BL5340PA). I am attempting to use either the HCI_USB or HCI_UART sample projects that come with Zephyr to make this custom board a reliable BLE adapter for Linux. My Linux device runs kernel version 5.4.0-174-generic and is connected to the custom board through the native USB interface on the nRF5340. My BlueZ version is 5.66. These are the things I am trying to accomplish:

  1. Reliable BLE Adapter over USB or CDC-ACM USB
  2. Application Core Firmware Update over CDC-ACM USB
    1. Currently using MCUMGR
  3. Network Core Firmware Update over CDC-ACM USB
    1. Not functional, as no external flash on custom board. Need method to use internal flash

I would like to complete the items above in that order of priority, so I will focus on the BLE adapter functionality and only mention the others where relevant.

My Issue:

  • With either the HCI_USB or HCI_UART project: HCI_UART is sent through the CDC-ACM driver and set up in Linux with btattach (hciattach was tried too, with the same effect, and it is now deprecated) as H4 protocol at a speed of 1000000, whereas HCI_USB is automatically recognized and set up by the btusb driver in Linux. I can communicate, connect, pair, and perform all necessary BT actions with both of these projects (this has been tested using bluetoothctl and HCI_BUS commands in C++ code). After some time spent scanning for devices (HCI_UART: a few minutes to an hour; HCI_USB: one to two hours), the adapter stops functioning and communication halts on a timeout (this is shown in the image below as output from dmesg; below that is what appears in btmon). In some debug messages from the custom board I have received a "<wrn> bt_hci_driver: Couldn't allocate a buffer after waiting 10 seconds." error when this occurs. After this error occurs, there are a few different methods of resetting the board that recover BLE functionality with BlueZ.
    • Recovery Methods
      • HCI_USB
        • bluetoothctl.power off -> bluetoothctl.power on
        • Or any hard power reset of board
          • Details: Default Zephyr Project
      • HCI_UART
        • Hard power reset of board
          • Details: Default Zephyr Project, with addition of CDC-ACM configuration for bt-c2h-uart redirection. 
      • HCI_USB with MCUMGR
        • MCUMGR reset
        • or any hard power reset of board
          • Details:
            • This project is implemented with a change in the SDK that makes the CDC-ACM interface always come second, so that HCI_USB uses the first interface and btusb recognizes it. Both USB interfaces are active at the same time, as seen when plugging into Windows, but Linux only allows one to be active because the btusb and CDC drivers conflict on assignment. This means that, in order to switch between them, the USB interface of the board has to be unbound.
      • HCI_UART with MCUMGR
        • MCUMGR reset
        • or any hard power reset of board
          • Details:
            • 2 CDC-ACM ports, one for MCUMGR and one for HCI_UART
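
For reference, the HCI_USB recovery sequence listed above (bluetoothctl power off, then power on) could be scripted roughly as follows. This is only a minimal sketch under some assumptions: bluetoothctl is on the PATH and accepts a command as arguments, a non-zero exit status is treated as failure (which may not catch every failure mode), and the names `run_command`, `power_cycle_adapter`, and `settle_seconds` are hypothetical. It does not cover the HCI_UART builds, which needed a hard or MCUMGR reset.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner takes an argv list and returns the exit status. Keeping it as a
# parameter lets the sequence be exercised without a real adapter.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status (no exception on failure)."""
    return subprocess.run(list(argv), check=False).returncode


def power_cycle_adapter(run: Runner = run_command,
                        settle_seconds: float = 2.0) -> bool:
    """Issue 'power off' then 'power on' through bluetoothctl.

    Returns True only if both commands exit with status 0.
    The settle delay is an arbitrary, assumed value.
    """
    if run(["bluetoothctl", "power", "off"]) != 0:
        return False
    time.sleep(settle_seconds)
    return run(["bluetoothctl", "power", "on"]) == 0
```

Calling `power_cycle_adapter()` after a detected stall would reproduce the manual recovery; scanning still has to be restarted afterwards.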

I have seen this issue across every attempt with these firmware projects, in every configuration I have tried. Eventually the board stops functioning as a BT adapter, and some, possibly extensive, action has to be taken to recover it. From my investigation this seems, on the surface, to be a memory issue or an issue with the speed of communication, because the failure occurs sooner when many more Bluetooth devices are available to be scanned. I ran the same tests with another Linux BT adapter (Intel AX210) and the failure did not occur, which tells me that the issue is within my Zephyr firmware. The issue also seems to occur sooner with the HCI_UART version of the firmware than with HCI_USB.

I have researched this issue everywhere and cannot find any solution that resolves it. Below are some links I have gone through while looking for one:

https://lists.zephyrproject.org/g/devel/topic/hci_interface_stopped_working/71746540?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,640,71746540

https://github.com/zephyrproject-rtos/zephyr/issues/20250

https://github.com/zephyrproject-rtos/zephyr/issues/37731

Code from my HCI_UART_MCUMGR project:

-  prj.conf

"prj.conf"

#################################################################################
#
#  Custom-Board Project Configuration 
#
#################################################################################


##########################################################################
## HCI_UART Project Configuration --------------------------------------
# --- Sets up the UART interface for HCI control of the BT Carrier Board
# -- From default HCI_UART project with few removals due to conflicts

CONFIG_STDOUT_CONSOLE=n
CONFIG_UART_CONSOLE=n
CONFIG_GPIO=y
CONFIG_SERIAL=y
CONFIG_UART_INTERRUPT_DRIVEN=y
CONFIG_BT=y
CONFIG_BT_HCI_RAW=y
CONFIG_BT_HCI_RAW_H4=y
CONFIG_BT_HCI_RAW_H4_ENABLE=y
CONFIG_BT_BUF_ACL_RX_SIZE=255
CONFIG_BT_BUF_CMD_TX_SIZE=255
CONFIG_BT_BUF_EVT_DISCARDABLE_SIZE=255
CONFIG_BT_MAX_CONN=16
CONFIG_BT_TINYCRYPT_ECC=n

# Workaround: Unable to allocate command buffer when using K_NO_WAIT since
# Host number of completed commands does not follow normal flow control.
CONFIG_BT_BUF_CMD_TX_COUNT=10

#=========================================================================

##########################################################################
## HCI -> USB Project Configuration --------------------------------------
# --- Sets up the USB interface to be used for communication
# --- USB Interface is used for MCUMGR and HCI_UART
# --- Composite USB configuration allowing multiple USB interfaces

CONFIG_USB_DEVICE_STACK=y
CONFIG_USB_DEVICE_PRODUCT="Custom-Board"

#CONFIG_USB_DEVICE_PID=
#CONFIG_USB_DEVICE_VID=

CONFIG_USB_CDC_ACM=y
CONFIG_USB_DEVICE_INITIALIZE_AT_BOOT=n
CONFIG_UART_LINE_CTRL=y

#=========================================================================

##########################################################################
## MCUMGR/MCUBOOT Project Configuration --------------------------------------
# --- Enables MCUMGR that is used for Image Management
# --- Enables MCUBOOT bootloader to handle the images on the carrier board, as well as booting

# Enable MCUmgr and dependencies.
CONFIG_NET_BUF=y
CONFIG_ZCBOR=y
CONFIG_CRC=y
CONFIG_MCUMGR=y
CONFIG_STREAM_FLASH=y
CONFIG_FLASH_MAP=y

# Some command handlers require a large stack.
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2304
CONFIG_MAIN_STACK_SIZE=2048

# Ensure an MCUboot-compatible binary is generated.
CONFIG_BOOTLOADER_MCUBOOT=y

# Enable flash operations.
CONFIG_FLASH=y

# Required by the `taskstat` command.
CONFIG_THREAD_MONITOR=y

# Support for taskstat command
CONFIG_MCUMGR_GRP_OS_TASKSTAT=y

# Enable statistics and statistic names.
CONFIG_STATS=y
CONFIG_STATS_NAMES=y

# Enable most core commands.
CONFIG_FLASH=y
CONFIG_IMG_MANAGER=y
CONFIG_MCUMGR_GRP_IMG=y
CONFIG_MCUMGR_GRP_OS=y
CONFIG_MCUMGR_GRP_STAT=y

# Enable logging
CONFIG_LOG=y
CONFIG_MCUBOOT_UTIL_LOG_LEVEL_WRN=y

# Disable debug logging
CONFIG_LOG_MAX_LEVEL=3

CONFIG_MCUMGR_TRANSPORT_UART=y
CONFIG_BASE64=y

#=========================================================================

- mcuboot.conf

"mcuboot.conf"

#################################################################################
#
#  MCUBOOT Child Image Configuration 
#
#################################################################################


#--------------------------------------------------------------------------------
# Enable Pin Control
#--------------------------------------------------------------------------------
CONFIG_PINCTRL=y


#--------------------------------------------------------------------------------
# Enable code size optimization on the compiler
#--------------------------------------------------------------------------------
CONFIG_SIZE_OPTIMIZATIONS=y


#--------------------------------------------------------------------------------
# Enable multi threading.
#--------------------------------------------------------------------------------
CONFIG_MULTITHREADING=y


#--------------------------------------------------------------------------------
# Add private key for MCUboot. Refer to these sites for more information:
# https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/app_dev/bootloaders_and_dfu/bootloader_adding.html#id11
# https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/app_dev/bootloaders_and_dfu/fw_update.html#ug-fw-update-keys-python
#--------------------------------------------------------------------------------
CONFIG_BOOT_SIGNATURE_KEY_FILE="custom_file_and_path_here.pem"

- hci_rpmsg.conf (to setup FEM)

"hci_rpmsg.conf"

#################################################################################
#
#  HCI_RPMSG Child Image Configuration 
#
#################################################################################

#--------------------------------------------------------------------------------
# Enable Pin Control
#--------------------------------------------------------------------------------
CONFIG_PINCTRL=y


#--------------------------------------------------------------------------------
# Enable SPI
#--------------------------------------------------------------------------------

# Enabled for FEM control
CONFIG_SPI=y


#--------------------------------------------------------------------------------
# Enable MPSL and FEM
#--------------------------------------------------------------------------------
CONFIG_MPSL=y
CONFIG_MPSL_FEM=y
CONFIG_MPSL_FEM_NRF21540_GPIO=y
CONFIG_MPSL_FEM_NRF21540_TX_GAIN_DB=20
CONFIG_BT_CTLR_TX_PWR_ANTENNA=20
CONFIG_MPSL_FEM_NRF21540_GPIO_SPI=y

- bl5340pa_dvk_cpuapp.overlay (I have custom board files made for my board, but for ease of testing it also works with the bl5340pa DVK default board files. Both configurations give the same results)

"bl5340pa_dvk_cpuapp.overlay"

/ {
	chosen {
	   zephyr,uart-mcumgr = &cdc_acm1;
	   zephyr,bt-c2h-uart = &cdc_acm0;
	};
 };
 

&zephyr_udc0 {
	cdc_acm0: cdc_acm0 {
		compatible = "zephyr,cdc-acm-uart";
	};
	cdc_acm1: cdc_acm1 {
		compatible = "zephyr,cdc-acm-uart";
	};
};

&uart0 {
	current-speed = <1000000>;
	status = "disabled";
	hw-flow-control;
};

 

The main.c source files of HCI_USB and HCI_UART are unedited from the examples.

As a note on the above, this has also been tested without any of the FEM or MCUBOOT/MCUMGR configurations to ensure that they are not affecting the projects. The same effect was observed.

Questions:

  1. Is there something that I am missing that is causing this issue?
    1. On the Zephyr Configuration side?
    2. On the Linux/BlueZ setup side?
  2. Is there another method to perform my desired actions that would avoid these issues?
  3. Is this the right place to investigate this?
  4. Or any other advice would be greatly appreciated.

Thank you to whoever can help with this issue, as it has been causing me a lot of trouble. Let me know if I can supply any additional information that would assist in resolving this.

  •    Thanks for your response. I have run these tests a few times on the custom board, the BL5340PA DK, and the BL5340 DK; my nrf5340DK was acting up, but I can give that another try as well. As a sanity check for me and others supporting this issue, I will run each of these tests today, but they will all still need the CDC-ACM modification made to the projects in order to use the USB interface.

    I will create fresh copies of the 2.4.1 fork and stock version of the SDK to ensure nothing else has been changed.

  • HCI_BLE_TEST.zip

    An update on the tests performed today. I tested the HCI_USB and HCI_UART example projects on the BL5340DK, BL5340PADK, and nrf5340DK with a fresh copy of the Laird fork of the 2.4.1 SDK. Below are my findings per device per project. Note that the maximum runtime I allowed was an hour, after which I stopped the run if no failure had occurred. I have also attached a .zip of all the projects that I used for these tests. They are named after the example they were taken from and include the build directories for the boards that were tested with them, in case looking through the generated configs is desired.

    Time Ran: Amount of time run without issue; a "+" means that failure did not occur

    Devices Scanned: Count of devices listed in the Bluetoothctl "devices" command after scanning was complete

    HCI_USB_NO_PA (no modification)

    - BL5340DK: Time Ran: 1+ hours, Devices Scanned: 1259

    - BL5340PADK: Time Ran: 2.5min, 30seconds, Devices Scanned(30 sec): 114
    - nrf5340DK: Time Ran: 1+ hours, Devices Scanned: 901

    HCI_USB_WITH_PA (Added child_image/hci_rpmsg.conf with MPSL PA config)

    - BL5340PADK: Time Ran: 45 sec, 20 sec, 30 sec Devices Scanned(30 sec): 134

    HCI_UART_NO_PA (Added CDC-ACM redirection)

    - BL5340DK: Time Ran: 1+hours Devices Scanned: 1520

    - BL5340PADK: Time Ran: 1min, 1.5min, Devices Scanned(1min): 107

    - nrf5340DK: Time Ran: 1+hours, Devices Scanned: 902

    HCI_UART_WITH_PA (Added CDC-ACM redirection, and child_image/hci_rpmsg.conf)

    - BL5340PADK: Time Ran: 45 sec, 45 sec, Devices Scanned: 156
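
As a rough way to compare the runs above, the listed-device counts can be turned into an average discovery rate. This is only a sketch with loud caveats: the "devices" list counts unique devices and saturates over time, so hour-long averages are not directly comparable with 30-second runs; runs reported as "1+ hours" are entered as 3600 s, which makes their rates upper bounds; and only runs where both a count and a duration are stated are included. The names `RUNS`, `devices_per_second`, and `summarize` are hypothetical.

```python
# (project, board, devices listed, seconds scanned), copied from the results
# above. "1+ hours" is entered as 3600 s, so those rates are upper bounds.
RUNS = [
    ("HCI_USB_NO_PA", "BL5340DK", 1259, 3600),
    ("HCI_USB_NO_PA", "BL5340PADK", 114, 30),
    ("HCI_USB_NO_PA", "nrf5340DK", 901, 3600),
    ("HCI_USB_WITH_PA", "BL5340PADK", 134, 30),
    ("HCI_UART_NO_PA", "BL5340DK", 1520, 3600),
    ("HCI_UART_NO_PA", "BL5340PADK", 107, 60),
    ("HCI_UART_NO_PA", "nrf5340DK", 902, 3600),
]


def devices_per_second(devices: int, seconds: float) -> float:
    """Average number of newly listed devices per second of scanning."""
    return devices / seconds


def summarize(runs):
    """Return (project, board, rate) with the rate rounded to 2 decimals."""
    return [(project, board, round(devices_per_second(devices, seconds), 2))
            for project, board, devices, seconds in runs]


if __name__ == "__main__":
    for project, board, rate in summarize(RUNS):
        print(f"{project:16} {board:11} {rate:5.2f} devices/s")
```

With these inputs the BL5340PADK runs come out at several times the average rate of the other boards, though part of that gap is the short-run versus long-run effect described in the caveats.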

    I did not run this test with the original 2.4.1 Zephyr/nrf SDK, as no device other than the BL5340PADK failed in these hour-long tests. I have had the same failure on the BL5340DK without the PA before, so I will run that test tonight to give it more time to fail and to confirm this. I did not find it useful to test the BL5340PADK on the original 2.4.1 Zephyr/nrf SDK, as scanning is really slow or inoperable there because it lacks the configuration for the FEM/external antenna.

    I used only the DKs, as I have seen the same behavior on the BL5340PADK and the custom board, and this should be easier to test for others following along.

    No other changes were made in these projects besides what is listed next to the project name, but as we move forward we can edit them further to test. I am also able to add SEGGER RTT logs for the network core and either CDC-ACM or SEGGER logs for the application core if desired.


    As a reminder this same error is the error that is occurring across HCI_USB and HCI_UART.


    Thanks!

  • Thanks a lot for taking the time to do these tests. We will look into your findings and get back to you.

  • No problem, thank you.

    This morning my BL5340DK on the default HCI_USB project is still running with no issues. It has been running for about 16 hours in my very busy BT environment, with only a single "scan on" request to start scanning and keep it running. The DK had a little under 7000 MAC addresses stored, from devices scanned, in the "bluetoothctl devices" output.


    The above logs are from dmesg on the Linux device running the DK, as these are really all I can retrieve with the current project settings. I see the normal "advertising data len corrected" that seems to appear on any BT adapter with enough runtime, and the "bt_err_ratelimited" which I have seen on the functional tests of the HCI projects. 



    Since the failure I saw before was on the BL5340DK with HCI_UART, but it had more features enabled, I will run a long run test with the Laird SDK and the default HCI_UART today to see if I can catch any failure to compare to the normal Zephyr/nrf 2.4.1 SDK. 

  • The default HCI_UART project on the BL5340DK also ran without issues for 23 hours of continuous scanning on the Laird fork of the SDK. Thanks!
