Bluetooth HCI lpuart issue - nRF9160 + nRF52840

Hello,

I'm using nRF9160 as a HCI host and nRF52840 as HCI controller (nRF52840 config is based on HCI lpuart example). My application receives data over Bluetooth using notifications. To handle the notifications data I use a message queue and a workqueue (to avoid handling it from a notification callback which is executed from a bt_rx_thread). It has to be able to receive ~50kB of data. Sometimes it works without any problem but sometimes I get the following information in logs:

bt_hci_core: opcode 0x0002 pool id 3 pool 0x20020958 != &hci_cmd_pool 0x200209b0
lpuart: Tx timeout

or

lpuart: req pin low when expected high
lpuart: Empty receiver state:4
bt_hci_core: opcode 0x0000 pool id 3 pool 0x20020958 != &hci_cmd_pool 0x200209b0
bt_hci_core: Unexpected HOST_NUM_COMPLETED_PACKETS (status 0x0c)
bt_hci_core: Unexpected HOST_NUM_COMPLETED_PACKETS (status 0x0c)

or

lpuart: req pin low when expected high
lpuart: Empty receiver state:4
bt_driver: Not enough space in buffer 1275/255

After each of above logs the bluetooth becomes unresponsive. Using sniffer I found out that the connection is alive and it sends Empty PDU packets. Only reseting the device helps in such situation.

The higher baudrate of HCI, the issue occurs more frequently. When I try to use baud rate of 1000000, it fails almost everytime.

Setup:
Custom board with nRF9160 and nRF52840
NCS v2.6.1

nRF9160 BT configuration:
CONFIG_BT=y
CONFIG_BT_H4=y
CONFIG_BT_WAIT_NOP=y
CONFIG_BT_HCI=y
CONFIG_BT_CTLR=n
CONFIG_BT_CENTRAL=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_GATT_DM=y
CONFIG_BT_SCAN=y
CONFIG_BT_SCAN_FILTER_ENABLE=y
CONFIG_BT_SCAN_MANUFACTURER_DATA_CNT=1
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_HCI_TX_STACK_SIZE_WITH_PROMPT=y
CONFIG_BT_HCI_TX_STACK_SIZE=2048
CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
CONFIG_HEAP_MEM_POOL_SIZE=8192

nRF52840 BT configuration:
CONFIG_BT=y
CONFIG_BT_CTLR=y
CONFIG_BT_WAIT_NOP=y
CONFIG_BT_HCI_RAW=y
CONFIG_BT_HCI_RAW_H4=y
CONFIG_BT_HCI_RAW_H4_ENABLE=y
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_EVT_DISCARDABLE_SIZE=255

Also, I would like to mention another issue I encountered.
If I detect some error (I don't mean the error with HCI mentioned above but some kind of error in my functions that has to be handled) while downloading ~50kB of data using notifications,
I would like to drop connection. To do it, I execute bt_conn_disconnect in a workqueue that handles notification data. Using sniffer, I can see that the disconnect command goes to the peripheral device and the connection ends. The problem is that I do not receive a disconnected callback. 
When I disconnect after receiving all the data, the disconnected callback is executed.
Any help would be greatly appreciated.
Parents
  • Hello,

    I will try to see if I can replicate the HCI communication issues on my end, but did not get time to get started on that today. Will update the ticket when I have had a chance to try this.

    To do it, I execute bt_conn_disconnect in a workqueue that handles notification data. Using sniffer, I can see that the disconnect command goes to the peripheral device and the connection ends. The problem is that I do not receive a disconnected callback. When I disconnect after receiving all the data, the disconnected callback is executed.
    Any help would be greatly appreciated.

    Is the application allowed to returned to the idle thread after you've sent the disconnect, or is it busy with other tasks? It unepexted if the disconnect callback never comes through. Are you observing any HCI errors when this happens?

    Best regards,

    Vidar

  • Is the application allowed to returned to the idle thread after you've sent the disconnect, or is it busy with other tasks? It unepexted if the disconnect callback never comes through. Are you observing any HCI errors when this happens?

    I put a log in an idle thread every second and I can see that after executing bt_conn_disconnect, idle thread still works. My other threads also work. No HCI errors observed in logs. The bt_conn_disconnect also doesn't return any error.

  • Could you show me what should be modified in .dts, .conf and _defconfig files in both nRF91 and nRF52 applications to make it use hardware flow control?

    For nRF52 I followed HCI uart sample but I don't see any sample that shows how to configure nRF91 to work with HCI uart.

  • Sure. For the nrf9160 target, you can use these overlays:

    #include <nrf9160dk_nrf52840_reset_on_if5.dtsi>
    #include <nrf9160dk_uart1_on_if0_3.dtsi>
    
    / {
    	chosen {
    		zephyr,bt-uart=&uart1;
    	};
    };
    
    &uart1 {
    	current-speed = <1000000>;
    	hw-flow-control;
    };
    

    nrf9160dk_nrf9160_ns_0_14_0.overlay

  • I assume that this modification is also needed:

    CONFIG_UART_2_ASYNC=y
    CONFIG_UART_2_INTERRUPT_DRIVEN=n
    CONFIG_UART_2_NRF_HW_ASYNC=y
    CONFIG_UART_2_NRF_HW_ASYNC_TIMER=2

    into

    CONFIG_UART_1_ASYNC=y
    CONFIG_UART_1_INTERRUPT_DRIVEN=n
    CONFIG_UART_1_NRF_HW_ASYNC=y
    CONFIG_UART_1_NRF_HW_ASYNC_TIMER=2

    After above modification I still can't get bluetooth working with hardware flow control. Can you confirm that you can get bluetooth working on nRF9160dk using HCI_UART for nRF52 and LTE_BLE_GATEWAY for nRF91?

  • You need to use the interrupt-driven API as that is what is implemented in the HCI host driver.

  • Thank you, now the hardware flow control works with HCI.

    Would it be possible to use lpuart with hardware flow control to lower the idle current consumption? 
    If I wired 6 pins between nRF91 and nRF52 (tx, rx, rts, cts, req, rdy) - would it make sense?

Reply Children
  • I can't immediately think of any reasons why this would not work if you can afford to use the extra pins. I can look into it, but perhaps you should try stress testing your application again to verify if enabling HWFC fixed the reliability issues you had first.

  • I would be grateful if you could look into 6 pins solution.

    I'm testing my application with HWFC and 1000000 baudrate and I didn't see any issues from the origin post. 

    However, now I'm facing another issue which didn't occur before. During the connection I use bt_gatt_write_without_response function to send commands over bluetooth (10 bytes max). I execute it from my custom workqueue (based on the function description - if used from outside the BT RX thread, it acts like a blocking function) and randomly this function might block forever - it never returns. I don't see any related information in logs. This issue doesn't seem to occur when I change optimization from CONFIG_DEBUG_OPTIMIZATIONS to CONFIG_SIZE_OPTIMIZATIONS but I wouldn't take it as a solution. Do you think of any reason why would that happen? 

  • After some debugging I can say that bt_gatt_write_without_response function blocks forever because it waits for being able to allocate network buffer (in file buf.c, function net_buf_alloc_len). The timeout for this operation is set to K_FOREVER in function bt_att_chan_create_pdu in file att.c. I tried to change the timeout to 1 second and disconnect if timeout happens. When the issue occured I disconnected and on each of the following connections, the network buffer could never be allocated again. Do you have any suggestions why would the network buffer become unavailable to be allocated forever?

    Edit 1
    I turned on CONFIG_NET_BUF_LOG and now if the issue occurs with timeout set to K_FOREVER I can see in logs:
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c low on buffers.
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c blocked for 1 secs
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c low on buffers.
    <wrn> net_buf: bt_l2cap_create_pdu_timeout():645: Pool 0x2002081c blocked for 2 secs

    and this shows up every second.

  • Hi,

    It seems like the HWFC control will interfere with the REQ/RDY handling in the LP UART implementation. However, I've discussed the case with one of the developers, and I thought your initial issue may have been caused by the bug addressed by this PR: https://github.com/nrfconnect/sdk-nrf/pull/14609. Could you please try to cherry-pick this commit and increase the receive buffer (https://github.com/nrfconnect/sdk-nrf/blob/ef120afe4d0d3e8a40c664f7f4c40881c456017b/drivers/serial/Kconfig.nrf_sw_lpuart#L41) to 512, then try again with your original project? 

    Best regards,

    Vidar

Related