Bluetooth HCI lpuart issue - nRF9160 + nRF52840

Hello,

I'm using nRF9160 as a HCI host and nRF52840 as HCI controller (nRF52840 config is based on HCI lpuart example). My application receives data over Bluetooth using notifications. To handle the notifications data I use a message queue and a workqueue (to avoid handling it from a notification callback which is executed from a bt_rx_thread). It has to be able to receive ~50kB of data. Sometimes it works without any problem but sometimes I get the following information in logs:

bt_hci_core: opcode 0x0002 pool id 3 pool 0x20020958 != &hci_cmd_pool 0x200209b0
lpuart: Tx timeout

or

lpuart: req pin low when expected high
lpuart: Empty receiver state:4
bt_hci_core: opcode 0x0000 pool id 3 pool 0x20020958 != &hci_cmd_pool 0x200209b0
bt_hci_core: Unexpected HOST_NUM_COMPLETED_PACKETS (status 0x0c)
bt_hci_core: Unexpected HOST_NUM_COMPLETED_PACKETS (status 0x0c)

or

lpuart: req pin low when expected high
lpuart: Empty receiver state:4
bt_driver: Not enough space in buffer 1275/255

After each of above logs the bluetooth becomes unresponsive. Using sniffer I found out that the connection is alive and it sends Empty PDU packets. Only reseting the device helps in such situation.

The higher baudrate of HCI, the issue occurs more frequently. When I try to use baud rate of 1000000, it fails almost everytime.

Setup:
Custom board with nRF9160 and nRF52840
NCS v2.6.1

nRF9160 BT configuration:
CONFIG_BT=y
CONFIG_BT_H4=y
CONFIG_BT_WAIT_NOP=y
CONFIG_BT_HCI=y
CONFIG_BT_CTLR=n
CONFIG_BT_CENTRAL=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_GATT_DM=y
CONFIG_BT_SCAN=y
CONFIG_BT_SCAN_FILTER_ENABLE=y
CONFIG_BT_SCAN_MANUFACTURER_DATA_CNT=1
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_HCI_TX_STACK_SIZE_WITH_PROMPT=y
CONFIG_BT_HCI_TX_STACK_SIZE=2048
CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
CONFIG_HEAP_MEM_POOL_SIZE=8192

nRF52840 BT configuration:
CONFIG_BT=y
CONFIG_BT_CTLR=y
CONFIG_BT_WAIT_NOP=y
CONFIG_BT_HCI_RAW=y
CONFIG_BT_HCI_RAW_H4=y
CONFIG_BT_HCI_RAW_H4_ENABLE=y
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_EVT_DISCARDABLE_SIZE=255

Also, I would like to mention another issue I encountered.
If I detect some error (I don't mean the error with HCI mentioned above but some kind of error in my functions that has to be handled) while downloading ~50kB of data using notifications,
I would like to drop connection. To do it, I execute bt_conn_disconnect in a workqueue that handles notification data. Using sniffer, I can see that the disconnect command goes to the peripheral device and the connection ends. The problem is that I do not receive a disconnected callback. 
When I disconnect after receiving all the data, the disconnected callback is executed.
Any help would be greatly appreciated.
Parents
  • Hello,

    I will try to see if I can replicate the HCI communication issues on my end, but did not get time to get started on that today. Will update the ticket when I have had a chance to try this.

    To do it, I execute bt_conn_disconnect in a workqueue that handles notification data. Using sniffer, I can see that the disconnect command goes to the peripheral device and the connection ends. The problem is that I do not receive a disconnected callback. When I disconnect after receiving all the data, the disconnected callback is executed.
    Any help would be greatly appreciated.

    Is the application allowed to returned to the idle thread after you've sent the disconnect, or is it busy with other tasks? It unepexted if the disconnect callback never comes through. Are you observing any HCI errors when this happens?

    Best regards,

    Vidar

  • Is the application allowed to returned to the idle thread after you've sent the disconnect, or is it busy with other tasks? It unepexted if the disconnect callback never comes through. Are you observing any HCI errors when this happens?

    I put a log in an idle thread every second and I can see that after executing bt_conn_disconnect, idle thread still works. My other threads also work. No HCI errors observed in logs. The bt_conn_disconnect also doesn't return any error.

  • I can't immediately think of any reasons why this would not work if you can afford to use the extra pins. I can look into it, but perhaps you should try stress testing your application again to verify if enabling HWFC fixed the reliability issues you had first.

  • I would be grateful if you could look into 6 pins solution.

    I'm testing my application with HWFC and 1000000 baudrate and I didn't see any issues from the origin post. 

    However, now I'm facing another issue which didn't occur before. During the connection I use bt_gatt_write_without_response function to send commands over bluetooth (10 bytes max). I execute it from my custom workqueue (based on the function description - if used from outside the BT RX thread, it acts like a blocking function) and randomly this function might block forever - it never returns. I don't see any related information in logs. This issue doesn't seem to occur when I change optimization from CONFIG_DEBUG_OPTIMIZATIONS to CONFIG_SIZE_OPTIMIZATIONS but I wouldn't take it as a solution. Do you think of any reason why would that happen? 

  • After some debugging I can say that bt_gatt_write_without_response function blocks forever because it waits for being able to allocate network buffer (in file buf.c, function net_buf_alloc_len). The timeout for this operation is set to K_FOREVER in function bt_att_chan_create_pdu in file att.c. I tried to change the timeout to 1 second and disconnect if timeout happens. When the issue occured I disconnected and on each of the following connections, the network buffer could never be allocated again. Do you have any suggestions why would the network buffer become unavailable to be allocated forever?

    Edit 1
    I turned on CONFIG_NET_BUF_LOG and now if the issue occurs with timeout set to K_FOREVER I can see in logs:
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c low on buffers.
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c blocked for 1 secs
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c low on buffers.
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c blocked for 2 secs

    and this shows up every second.

  • Hi,

    It seems like the HWFC control will interfere with the REQ/RDY handling in the LP UART implementation. However, I've discussed the case with one of the developers, and I thought your initial issue may have been caused by the bug addressed by this PR: https://github.com/nrfconnect/sdk-nrf/pull/14609. Could you please try to cherry-pick this commit and increase the receive buffer (https://github.com/nrfconnect/sdk-nrf/blob/ef120afe4d0d3e8a40c664f7f4c40881c456017b/drivers/serial/Kconfig.nrf_sw_lpuart#L41) to 512, then try again with your original project? 

    Best regards,

    Vidar

  • I changed back HCI to use lpuart but with baudrate 1000000 and I applied both changes you linked but it didn't help. These logs show up while receiving the data with notifications:

    <wrn> lpuart: req pin low when expected high
    <err> lpuart: Empty receiver state:4
    <err> bt_driver: Unknown H:4 type 0x00
    <err> bt_driver: Unknown H:4 type 0x00
    <wrn> bt_hci_core: Unexpected HOST_NUM_COMPLETED_PACKETS (status 0x0c)

    Bluetooth stops working after that.

    In addition, after adding modification to uart_nrf_sw_lpuart.c, application after power up goes to zephyr fatal error and logs:
    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:331
    command opcode 0x0c03 timeout with err -11

    After restart, bluetooth starts without problem.

Reply
  • I changed back HCI to use lpuart but with baudrate 1000000 and I applied both changes you linked but it didn't help. These logs show up while receiving the data with notifications:

    <wrn> lpuart: req pin low when expected high
    <err> lpuart: Empty receiver state:4
    <err> bt_driver: Unknown H:4 type 0x00
    <err> bt_driver: Unknown H:4 type 0x00
    <wrn> bt_hci_core: Unexpected HOST_NUM_COMPLETED_PACKETS (status 0x0c)

    Bluetooth stops working after that.

    In addition, after adding modification to uart_nrf_sw_lpuart.c, application after power up goes to zephyr fatal error and logs:
    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:331
    command opcode 0x0c03 timeout with err -11

    After restart, bluetooth starts without problem.

Children
No Data
Related