BT HCI Core fault disappears when I set log level to debug.

I'm using the nRF9160DK as a BLE-to-LTE bridge, and I have the onboard 52840 configured as my Bluetooth controller. The 9160DK advertises as a peripheral and waits for a connection from a different board, which automatically connects. If they connect and disconnect even once, I get the following error:

00> [00:05:21.990,570] <inf> tracing: Connected: C2:5E:E8:30:50:1C (random)...
00> [00:05:21.998,229] <inf> tracing: ADV CONNECTED!
00> [00:07:17.994,842] <err> tracing: Disconnected, reason 0x08 
00> 
00> [00:07:18.002,105] <err> bt_hci_core: OpCode 0x0001 completed instead of expected 0x2039
00> ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:430
00>   Controller unresponsive, command opcode 0x2039 timeout with err -11
00> [00:07:38.015,686] <err> os: r0/a1:  0x00000003  r1/a2:  0x20011e00  r2/a3:  0x00000000
00> [00:07:38.024,749] <err> os: r3/a4:  0x00000003 r12/ip:  0x00000000 r14/lr:  0x0000f02b
00> [00:07:38.033,813] <err> os:  xpsr:  0x01000000
00> [00:07:38.039,306] <err> os: s[ 0]:  0xffffffff  s[ 1]:  0xffffffff  s[ 2]:  0x2001bd44  s[ 3]:  0x00000103
00> [00:07:38.050,292] <err> os: s[ 4]:  0x2001bce8  s[ 5]:  0x2001bd7c  s[ 6]:  0x2001bcc8  s[ 7]:  0x2001bcc8
00> [00:07:38.061,248] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000001  s[10]:  0x2001bcd8  s[11]:  0x2001bcd8
00> [00:07:38.072,235] <err> os: s[12]:  0x00000000  s[13]:  0x2001bd7c  s[14]:  0x2001d4b4  s[15]:  0x2001d4a8
00> [00:07:38.083,160] <err> os: fpscr:  0x00000001
00> [00:07:38.088,623] <err> os: Faulting instruction address (r15/pc): 0x0000f03a
00> [00:07:38.096,862] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
00> [00:07:38.104,858] <err> os: Current thread: 0x2000f4a8 (sysworkq)
00> [00:07:38.112,030] [1;31m<err> os: Halting system[0m

After I saw this, I dug into hci_core.c. It appears a 10-second wait for a response is timing out on line 430. I tried changing that timeout to 20 seconds, and it still crashed.

Then I tried increasing the BT_RX_WQ stack size. That had no effect, and thread analyzer shows it at 20% stack usage.

Next I enabled the debug logs. (CONFIG_BT_HCI_CORE_LOG_LEVEL_DBG=y) This logs a lot of information to the console, and I have log mode set to immediate. I think it probably slowed the execution speed down a bit. It has not crashed a single time with debug logs enabled, even though I've tested for a while.

This smells like a timing issue to me. Has anyone run into this before? The only result Google had for that error message was unrelated, I think. It was something about the system workqueue being interrupted for too long.

Happy to provide more info if you need it. My prj.conf is below.

# Imported from zephyr/samples/net/mqtt_publisher/prj.conf

#--- SECURITY ---#
CONFIG_BOOTLOADER_MCUBOOT=y
CONFIG_BUILD_WITH_TFM=y
CONFIG_TRUSTED_EXECUTION_NONSECURE=y
CONFIG_HW_STACK_PROTECTION=y

#--- I/O ---#
# CONFIG_CONSOLE=y
# CONFIG_STDOUT_CONSOLE=y
# CONFIG_PRINTK=y
CONFIG_SERIAL=y

#--- KERNEL OPTIONS ---#
CONFIG_POSIX_API=y
# Maximum no. of file descriptors.
CONFIG_POSIX_MAX_FDS=32
CONFIG_ZVFS_OPEN_MAX=32
# Enable the ring buffer library.
CONFIG_RING_BUFFER=y
# Otherwise you can't print floating point numbers!
CONFIG_FPU=y 
CONFIG_CBPRINTF_FP_SUPPORT=y
# Enable the GPIO capability.
CONFIG_GPIO=y
CONFIG_PINCTRL=y
CONFIG_UART_USE_RUNTIME_CONFIGURE=n
#CONFIG_UART_INTERRUPT_DRIVEN=y
CONFIG_UART_LINE_CTRL=y
# CONFIG_UART_ASYNC_API=y
CONFIG_THREAD_NAME=y

#--- STACK/BUFFER SIZES ---#
CONFIG_MAIN_STACK_SIZE=1536
CONFIG_IDLE_STACK_SIZE=320
CONFIG_ISR_STACK_SIZE=2048
CONFIG_HEAP_MEM_POOL_SIZE=2048
CONFIG_NET_RX_STACK_SIZE=2048
CONFIG_NET_TX_STACK_SIZE=4096
# CONFIG_NET_MGMT_EVENT_STACK_SIZE=2048
CONFIG_NET_BUF_DATA_SIZE=2048

#--- NETWORKING ---#
CONFIG_NETWORKING=y
CONFIG_NET_SOCKETS=y
CONFIG_NET_SOCKETS_OFFLOAD=y
# Enable TLS credential management.
CONFIG_TLS_CREDENTIALS=y
# Enable IPv6 support
CONFIG_NET_IPV6=y
# Enable IPv4 support
CONFIG_NET_IPV4=y
# Enable the MQTT Lib
CONFIG_MQTT_LIB=y
CONFIG_MQTT_LIB_TLS=y
CONFIG_MQTT_KEEPALIVE=1200
CONFIG_MQTT_CLEAN_SESSION=y
# Below depends on CONFIG_POSIX_MAX_FDS
CONFIG_NET_SOCKETS_POLL_MAX=32 

#--- DNS ---#
CONFIG_DNS_RESOLVER=y
# Enable additional buffers
CONFIG_DNS_RESOLVER_ADDITIONAL_BUF_CTR=5
# Enable additional queries
CONFIG_DNS_RESOLVER_ADDITIONAL_QUERIES=2
# Enable mDNS support
CONFIG_MDNS_RESOLVER=y
CONFIG_DNS_RESOLVER_MAX_SERVERS=3
CONFIG_DNS_SERVER_IP_ADDRESSES=y
CONFIG_DNS_NUM_CONCUR_QUERIES=5
# Google DNS IPv4 and IPv6 servers
CONFIG_DNS_SERVER1="8.8.8.8"
CONFIG_DNS_SERVER2="2001:4860:4860::8888"

#--- MODEM ---#
CONFIG_MODEM=y
CONFIG_MODEM_KEY_MGMT=y
CONFIG_LTE_LINK_CONTROL=y
CONFIG_NRF_MODEM_LIB=y
CONFIG_MODEM_SOCKET=y
CONFIG_MODEM_SOCKET_PACKET_COUNT=50
# Reset the modem if it encounters a fault in its firmware.
CONFIG_NRF_MODEM_LIB_ON_FAULT_RESET_MODEM=y
CONFIG_NRF_MODEM_LIB_SENDMSG_BUF_SIZE=4096
CONFIG_NRF_MODEM_LIB_SHMEM_RX_SIZE=4096
CONFIG_NRF_MODEM_LIB_SHMEM_TX_SIZE=4096
CONFIG_NRF_MODEM_LIB_HEAP_SIZE=4096

#--- BLUETOOTH ---#
CONFIG_BT=y
CONFIG_BT_DEVICE_NAME="MyDev"

CONFIG_BT_H4=y
CONFIG_BT_HCI=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_EXT_ADV=y
# CONFIG_BT_CTLR=y
# CONFIG_BT_LL_SW_SPLIT=y
# CONFIG_BT_LL_SOFTDEVICE=y
CONFIG_BT_DRIVERS=y
CONFIG_BT_WAIT_NOP=y
CONFIG_NRF_SW_LPUART=n

CONFIG_BT_PER_ADV=y
CONFIG_BT_PER_ADV_SYNC_TRANSFER_SENDER=y
CONFIG_BT_PER_ADV_RSP=y

CONFIG_BT_ATT_PREPARE_COUNT=2
CONFIG_BT_L2CAP_TX_BUF_COUNT=8
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_BUF_ACL_RX_SIZE=250

CONFIG_BT_HCI_CORE_LOG_LEVEL_DBG=y

CONFIG_NRFX_UARTE0=y
CONFIG_NRFX_UARTE1=y
CONFIG_NRFX_UARTE2=y
CONFIG_NRFX_UARTE3=y
 
#--- DATE & TIME ---#
CONFIG_DATE_TIME=y
CONFIG_DATE_TIME_MODEM=y
CONFIG_DATE_TIME_AUTO_UPDATE=y

#--- ZCBOR LIBRARY ---#
# CONFIG_ZCBOR=y
# 
######### DEBUGGING MODE ##########
# CONFIG_DEBUG=y
# CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_THREAD_INFO=y
CONFIG_RTT_CONSOLE=y
CONFIG_USE_SEGGER_RTT=y

# CONFIG_NRF_MODEM_LIB_TRACE=y
# CONFIG_NRF_MODEM_LIB_TRACE_BACKEND_UART=y

# Debugging configuration
# CONFIG_THREAD_ANALYZER=y
# CONFIG_THREAD_ANALYZER_AUTO=y
# CONFIG_THREAD_ANALYZER_AUTO_STACK_SIZE=2048
# CONFIG_THREAD_ANALYZER_RUN_UNLOCKED=y
# CONFIG_THREAD_ANALYZER_USE_PRINTK=y

# CONFIG_TRACING=y
# CONFIG_TRACING_NETWORKING=y
# CONFIG_TRACING_NET_CORE=y
# CONFIG_TRACING_NET_SOCKETS=y
# CONFIG_TRACING_OBJECT_TRACKING=y
# 
# CONFIG_SEGGER_SYSTEMVIEW=y

CONFIG_NET_LOG=y

# CONFIG_MQTT_LOG_LEVEL_DBG=y
CONFIG_LOG=y
CONFIG_LOG_BACKEND_RTT=y
CONFIG_LOG_MODE_IMMEDIATE=y
# CONFIG_LOG_MODE_DEFERRED=n 
# If you want deferred logging, set stack sizes below.
# CONFIG_LOG_BUFFER_SIZE=4096
# CONFIG_LOG_PROCESS_THREAD_STACK_SIZE=4096

# CONFIG_NRF_MODEM_LIB_MEM_DIAG=y
# CONFIG_NRF_MODEM_LIB_MEM_DIAG_ALLOC=y
# CONFIG_NRF_MODEM_LIB_MEM_DIAG_DUMP=y

# Add asserts
# CONFIG_ASSERT=y
# CONFIG_ASSERT_VERBOSE=y
# CONFIG_ASSERT_NO_COND_INFO=n
# CONFIG_ASSERT_NO_MSG_INFO=n
# CONFIG_THREAD_NAME=y
# CONFIG_STACK_SENTINEL=y

CONFIG_WATCHDOG=y

  • Hi Ethan,

    Thank you for your input. I have conveyed this to the team and will get back to you with updates.

    Regards,

    Priyanka

  • I was able to make it crash with debug logging enabled by connecting and disconnecting a few times in a row. Again, the number of times I can disconnect and reconnect (with or without debug logging) before getting an assertion fail and kernel panic varies. Sometimes it is just one time; sometimes it's five or more. Here are the relevant lines of the crash with debugging enabled.

    Let me know if I can test anything for you, or provide any more information.

    00> [00:00:31.059,143] <dbg> bt_hci_core: bt_recv_unsafe: buf 0x2001e974 len 6[0m
    00> [00:00:31.059,173] <dbg> bt_hci_core: hci_disconn_complete_prio: status 0x00  handle 15 reason 0x08
    00> [00:00:31.059,204] <dbg> bt_hci_core: bt_tx_irq_raise: kick TX
    00> [00:00:31.059,265] <dbg> bt_hci_core: rx_work_handler: Getting net_buf from queue
    00> [00:00:31.059,265] <dbg> bt_hci_core: rx_work_handler: buf 0x2001e974 type 1 len 6
    00> [00:00:31.059,295] <dbg> bt_hci_core: hci_event: event 0x05
    00> [00:00:31.059,326] <dbg> bt_hci_core: hci_disconn_complete: status 0x00  handle 15 reason 0x08
    00> [00:00:31.059,387] <dbg> bt_hci_core: tx_processor: TX process start
    00> [00:00:31.059,539] <err> tracing: Disconnected, reason 0x08 
    00> 
    00> [00:00:31.059,570] <dbg> bt_adv: bt_le_adv_resume: No valid legacy adv
    00> [00:00:31.059,631] <dbg> bt_hci_core: bt_hci_cmd_create: opcode 0x2039 param_len 6
    00> [00:00:31.059,661] <dbg> bt_hci_core: bt_hci_cmd_create: buf 0x2001f560
    00> [00:00:31.059,692] <dbg> bt_hci_core: bt_hci_cmd_send_sync: buf 0x2001f560 opcode 0x2039 len 9
    00> [00:00:31.059,692] <dbg> bt_hci_core: bt_tx_irq_raise: kick TX
    00> [00:00:31.059,722] <dbg> bt_hci_core: bt_hci_cmd_send_sync: process cmd 0x2001f560 want 0x2001f560
    00> [00:00:31.059,753] <dbg> bt_hci_core: hci_core_send_cmd: fetch cmd
    00> [00:00:31.059,753] <dbg> bt_hci_core: hci_core_send_cmd: Sending command 0x2039 (buf 0x2001f560) to driver
    00> [00:00:31.059,783] <dbg> bt_hci_core: bt_send: buf 0x2001f560 len 9 type 0
    00> ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:430
    00>   Controller unresponsive, command opcode 0x2039 timeout with err -11
    00> [00:00:51.059,936] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000019
    00> [00:00:51.059,967] <err> os: r3/a4:  0x00000003 r12/ip:  0x00000010 r14/lr:  0x0000f82b[0m
    00> [00:00:51.059,967] <err> os:  xpsr:  0x21000000
    00> [00:00:51.059,997] <err> os: s[ 0]:  0x2001dd98  s[ 1]:  0x2001dd98  s[ 2]:  0x00000000  s[ 3]:  0x00000001
    00> [00:00:51.059,997] <err> os: s[ 4]:  0x2001dda8  s[ 5]:  0x2001dda8  s[ 6]:  0x2000f4b0  s[ 7]:  0x2001de34
    00> [00:00:51.060,028] <err> os: s[ 8]:  0x2001f56c  s[ 9]:  0x2001f560  s[10]:  0x00000001  s[11]:  0x2000f4a0
    00> [00:00:51.060,028] <err> os: s[12]:  0xffffffff  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x0002a391
    00> [00:00:51.060,058] <err> os: fpscr:  0x00000001
    00> [00:00:51.060,058] <err> os: Faulting instruction address (r15/pc): 0x0000f83a
    00> [00:00:51.060,089] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    00> [00:00:51.060,119] <err> os: Current thread: 0x20010638 (sysworkq)
    00> [00:00:54.587,921] <err> os: Halting system

  • Hi Ethan,

    I am checking this internally to see if this is a known issue and will get back to you.

    -Priyanka

  • Hi Ethan,

    Apologies for the delay as it took some time to spot the issue.

    Starting with NCS 3.0.0 this sample is deprecated.

    Could the same be seen with the normal HCI UART sample? https://docs.nordicsemi.com/bundle/ncs-latest/page/zephyr/samples/bluetooth/hci_uart/README.html
    (or HCI UART acyns https://docs.nordicsemi.com/bundle/ncs-latest/page/zephyr/samples/bluetooth/hci_uart_async/README.html)

    -Priyanka

Related