6431.central_uart program from Vidar Berg with updates not working reliably in v3.1.1

We are using Vidar's program that is the basic samples/bluetooth/central_uart sample but with 3 board files so the program can be run on the nrf9160dk/nrf9160/ns board, in conjunction with bluetooth/samples/nci_lpuart running on the nRF52840 on the nRF9160DK board.

Depending upon when I reset the nrf9160, I frequently see the following, where I have added a few printk's to the code. I will show here the successful run followed by a failure. Please ignore the ECC key errors--they can be fixed with two extra config options. To make the error happen frequently, in pracitice I am push button resetting the nRF9160 soon after
Starting Bluetooth Central UART sample
Each time I see the "draining bytes from the uart" printk's that I added to bt_hci_transport_setup(), that's when the program times out and crashes.

*** Booting nRF Connect SDK v3.1.1-e2a97fe2578a ***
*** Using Zephyr OS v4.1.99-ff8f0c579eeb ***
[00:00:00.382,415] <err> bt_hci_core: HCI driver is not ready
[00:00:00.382,446] <err> central_uart: Bluetooth init failed (err -19)
*** Booting nRF Connect SDK v3.1.1-e2a97fe2578a ***
*** Using Zephyr OS v4.1.99-ff8f0c579eeb ***
[00:00:00.285,797] <err> bt_hci_core: HCI driver is not ready
[00:00:00.285,827] <err> central_uart: Bluetooth init failed (err -19)
*** Booting nRF Connect SDK v3.1.1-e2a97fe2578a ***
*** Using Zephyr OS v4.1.99-ff8f0c579eeb ***
[00:00:00.286,865] <err> bt_hci_core: HCI driver is not ready
[00:00:00.286,895] <err> central_uart: Bluetooth init failed (err -19)
*** Booting nRF Connect SDK v3.1.1-e2a97fe2578a ***
*** Using Zephyr OS v4.1.99-ff8f0c579eeb ***
[00:00:00.479,858] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.479,858] <inf> fs_nvs: alloc wra: 0, fe8
[00:00:00.479,888] <inf> fs_nvs: data wra: 0, 0
h4_open() entered...
bt_hci_transport_setup() entered...
bt_hci_transport_setup() about to gpio_pin_set(port, RESET_GPIO_PIN=24, 0)
bt_hci_transport_setup() exit success
h4_open() exit
bt_init() entered...
hci_init() entered...
common_init() entered...
common_init() about to  BT_HCI_OP_RESET
[00:00:01.192,993] <err> bt_driver: Unknown H:4 type 0x00
[00:00:01.193,084] <wrn> lpuart: RX: End detected at unexpected state (1).
common_init() after BT_HCI_OP_RESET
common_init() after drv_quirk_no_reset()
common_init() after read_local_ver_complete()
common_init() exit, good
hci_init() just finished common_init()
hci_init() just finished le_init()
[00:00:01.225,341] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:01.225,372] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:01.225,433] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 252.16862 Build 1121034987
hci_init() exit, no error
bt_init() line:4260
bt_init() line:4268
bt_init() line:4276
[00:00:01.227,966] <inf> bt_hci_core: No ID address. App must call settings_load()
[00:00:01.227,996] <inf> central_uart: Bluetooth initialized
[00:00:01.228,363] <err> bt_ecc: Failed to generate ECC key -134
[00:00:01.228,363] <wrn> bt_smp: Public key not available
[00:00:01.231,567] <inf> bt_hci_core: HCI transport: H:4
[00:00:01.231,658] <inf> bt_hci_core: Identity: F1:B2:BA:28:C5:6D (random)
[00:00:01.231,689] <inf> bt_hci_core: HCI: version 6.1 (0x0f) revision 0x1069, manufacturer 0x0059
[00:00:01.231,719] <inf> bt_hci_core: LMP: version 6.1 (0x0f) subver 0x1069
[00:00:01.231,842] <err> bt_gatt: Unable to import the key for AES CMAC -134
[00:00:01.231,964] <inf> central_uart: NUS Client module initialized
[00:00:01.231,994] <inf> central_uart: Scan module initialized
[00:00:01.237,304] <inf> central_uart: Scan started
Starting Bluetooth Central UART sample
[00:00:01.238,098] <err> bt_gatt: Unable to import the key for AES CMAC -134
*** Booting nRF Connect SDK v3.1.1-e2a97fe2578a ***
*** Using Zephyr OS v4.1.99-ff8f0c579eeb ***
[00:00:00.287,353] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.287,353] <inf> fs_nvs: alloc wra: 0, fd0
[00:00:00.287,384] <inf> fs_nvs: data wra: 0, 1c
h4_open() entered...
bt_hci_transport_setup() entered...
bt_hci_transport_setup(): draining bytes from uart, c=04
bt_hci_transport_setup(): draining bytes from uart, c=3E
bt_hci_transport_setup(): draining bytes from uart, c=29
bt_hci_transport_setup(): draining bytes from uart, c=02
bt_hci_transport_setup(): draining bytes from uart, c=01
bt_hci_transport_setup(): draining bytes from uart, c=00
bt_hci_transport_setup(): draining bytes from uart, c=00
bt_hci_transport_setup(): draining bytes from uart, c=A4
bt_hci_transport_setup(): draining bytes from uart, c=FA
bt_hci_transport_setup(): draining bytes from uart, c=2A
bt_hci_transport_setup(): draining bytes from uart, c=6A
bt_hci_transport_setup(): draining bytes from uart, c=B4
bt_hci_transport_setup(): draining bytes from uart, c=FA
bt_hci_transport_setup(): draining bytes from uart, c=1D
bt_hci_transport_setup(): draining bytes from uart, c=02
bt_hci_transport_setup(): draining bytes from uart, c=01
bt_hci_transport_setup(): draining bytes from uart, c=06
bt_hci_transport_setup(): draining bytes from uart, c=19
bt_hci_transport_setup(): draining bytes from uart, c=16
bt_hci_transport_setup(): draining bytes from uart, c=F7
bt_hci_transport_setup(): draining bytes from uart, c=FD
bt_hci_transport_setup(): draining bytes from uart, c=01
bt_hci_transport_setup(): draining bytes from uart, c=DF
bt_hci_transport_setup(): draining bytes from uart, c=FF
bt_hci_transport_setup(): draining bytes from uart, c=7D
bt_hci_transport_setup(): draining bytes from uart, c=84
bt_hci_transport_setup(): draining bytes from uart, c=DB
bt_hci_transport_setup(): draining bytes from uart, c=25
bt_hci_transport_setup(): draining bytes from uart, c=54
bt_hci_transport_setup(): draining bytes from uart, c=F4
bt_hci_transport_setup(): draining bytes from uart, c=1B
bt_hci_transport_setup(): draining bytes from uart, c=D0
bt_hci_transport_setup(): draining bytes from uart, c=77
bt_hci_transport_setup(): draining bytes from uart, c=2D
bt_hci_transport_setup(): draining bytes from uart, c=0F
bt_hci_transport_setup(): draining bytes from uart, c=45
bt_hci_transport_setup(): draining bytes from uart, c=C3
bt_hci_transport_setup(): draining bytes from uart, c=65
bt_hci_transport_setup(): draining bytes from uart, c=00
bt_hci_transport_setup(): draining bytes from uart, c=00
bt_hci_transport_setup(): draining bytes from uart, c=00
bt_hci_transport_setup(): draining bytes from uart, c=00
bt_hci_transport_setup(): draining bytes from uart, c=03
bt_hci_transport_setup(): draining bytes from uart, c=A6
bt_hci_transport_setup() about to gpio_pin_set(port, RESET_GPIO_PIN=24, 0)
bt_hci_transport_setup() exit success
h4_open() exit
bt_init() entered...
hci_init() entered...
common_init() entered...
common_init() about to  BT_HCI_OP_RESET
ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:505
        Controller unresponsive, command opcode 0x0c03 timeout with err -11
[00:00:10.300,109] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000002
[00:00:10.300,140] <err> os: r3/a4:  0x00000003 r12/ip:  0x200124dc r14/lr:  0x000104c5
[00:00:10.300,140] <err> os:  xpsr:  0x01000000
[00:00:10.300,170] <err> os: Faulting instruction address (r15/pc): 0x000104d4
[00:00:10.300,201] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:10.300,231] <err> os: Current thread: 0x2000e7e0 (unknown)
[00:00:10.358,123] <err> os: Halting system

The central_uart code and prj.conf have no changes I am aware of other than the CONFIG_LOG... options added for BACKEND_UART.

The board files are modified from the original Vidar code like follows

(v3.1.1) burt@P51:~/ncs/v3.1.1/nrf/samples/bluetooth/central_uart$  diff -rw ~/6431.central_uart/boards boards
Only in boards: nrf5340dk_nrf5340_cpuapp.conf
Only in boards: nrf54h20dk_nrf54h20_cpuapp.conf
Only in boards: nrf54h20dk_nrf54h20_cpuapp.overlay
Only in boards: nrf54l15dk_nrf54l05_cpuapp.conf
Only in boards: nrf54l15dk_nrf54l05_cpuapp.overlay
Only in boards: nrf54l15dk_nrf54l10_cpuapp.conf
Only in boards: nrf54l15dk_nrf54l10_cpuapp.overlay
Only in boards: nrf54l15dk_nrf54l15_cpuapp.conf
Only in boards: nrf54l15dk_nrf54l15_cpuapp.overlay
Only in boards: nrf54lm20dk_nrf54lm20a_cpuapp.conf
Only in boards: nrf54lm20dk_nrf54lm20a_cpuapp.overlay
Only in boards: nrf54lm20pdk_nrf54lm20a_cpuapp.conf
Only in boards: nrf54lm20pdk_nrf54lm20a_cpuapp.overlay
Only in boards: nrf54lv10dk_nrf54lv10a_cpuapp.conf
Only in boards: nrf54lv10dk_nrf54lv10a_cpuapp.overlay
diff -rw /home/burt/6431.central_uart/boards/nrf9160dk_nrf9160_ns_0_14_0.overlay boards/nrf9160dk_nrf9160_ns_0_14_0.overlay
2c2
< #include <nrf9160dk_nrf52840_reset_on_if9.dtsi>
---
> #include <nrf9160/nrf9160dk_nrf52840_reset_on_if9.dtsi>
diff -rw /home/burt/6431.central_uart/boards/nrf9160dk_nrf9160_ns.conf boards/nrf9160dk_nrf9160_ns.conf
10a11,13

> #BMS CONFIG_TFM_PROFILE_TYPE_MINIMAL=n
> #BMS CONFIG_TFM_PROFILE_TYPE_NOT_SET=y
diff -rw /home/burt/6431.central_uart/boards/nrf9160dk_nrf9160_ns.overlay boards/nrf9160dk_nrf9160_ns.overlay
1c1
< #include <nrf9160dk_nrf52840_reset_on_if5.dtsi>
---
> #include <nrf9160/nrf9160dk_nrf52840_reset_on_if5.dtsi>
6a7,8
>         zephyr,bt-hci = &bt_hci_uart;

26a29,34
>               //following 4 lines are added from lte_ble_gateway
>                 bt_hci_uart: bt_hci_uart {
>                         compatible = "zephyr,bt-hci-uart";
>                         status = "okay";
>                 };

Thank you for help with this. Note that I am testing with my external custom nRF52832 board off. That board runs bluetooth/samples/peripheral_uart normally.

Burt Silverman

Parents
  • Hi Burt,

    Is it possible to get logging from the nRF52832 hci controller as well, and if not, are you able to probe the PINRESET input on it do see if it's driven low by the 9160 on startup? Also, I don't remember when I created this sample, do you have a link to that case? And did this work reliably in a previous SDK version? 

    Thanks,

    Vidar

  • Hi Vidar, thank you for getting involved. The original ticket where you provided 6431.central_uart to us is (+) BLE communication between a nRF52 DK board and a nRF9160 DK board - Nordic Q&A - Nordic DevZone - Nordic DevZone. I need to try the code on some older versions of SDK such as v2.5.0; I believe that is where we started. I was not involved until very recently, when the project was moved from back burner to front burner. I know that the nRF52840 on the nRF9160DK is getting reset because I can go to its console and see boot messages after I reset the nRF9160. I will try to get logging from the nRF52840 controller. Will get back to you on that.

    Burt

  • I am pretty sure it is not me doing something stupid, but the code: v3.0.0 and v3.0.2 are working fine, but v3.1.0 and v3.1.1 both screw up.

    Burt

  • Am I close--if git bisect has not failed me then the problem should be

    (v3.1.0) burt@P51:~/ncs/testing/nrf$ git bisect good
    e7a49bd24f3ecea5b63ef43d097a55b3bc96b4c9 is the first bad commit
    commit e7a49bd24f3ecea5b63ef43d097a55b3bc96b4c9
    Author: Krzysztof ChruĹ›ciĹ„ski <[email protected]>
    Date:   Fri Mar 22 10:16:15 2024 +0100

        drivers: uart_nrf_sw_lpuart: Fix interrupt driven case
        
        Driver was calling isr function from unknown context when
        uart_irq_[rx,tx]_enable was called. It could lead to failures
        because driver assumes that isr function won't be interrupt
        by the same isr function.
        
        A solution proposed here is to use k_timer handler context
        which is called from RTC1 interrupt context. Assert is
        ensuring that RTC1 and UART interrupt priorities are the same.
        Timer is called from the shortest possible timeout and from
        that context isr function is called.
        
        Signed-off-by: Krzysztof ChruĹ›ciĹ„ski <[email protected]>
        (cherry picked from commit 22790df41c3d91ad47874a1bf8a32a77c4a92b8d)

     drivers/serial/uart_nrf_sw_lpuart.c | 50 ++++++++++++++++++++++++++++---------
     1 file changed, 38 insertions(+), 12 deletions(-)
    (v3.1.0) burt@P51:~/ncs/testing/nrf$ 

    First time I have ever used git bisect. Each time I updated with git bisect <good | bad> I then did a west update. Then did pristine builds for the two processors and flashed them.

  • Sorry for the delay. Thanks for posting the updates. It does seem like you may have found the culprit. However, this commit is quite old an was merged before v3.0.0. Have you tried reverting the changes from this commit in v3.1.1 to see if that alone resolves the issue?

    Thanks,

    Vidar

  • Hi Vidar, I did try reverting the changes from the commit in v3.1.1 and found that it resolved the issue. Of course, it is possible that doing that would open up the possibility of other issues arising--whatever Krzysztof ChruĹ›ciĹ„ski was trying to solve (not that I have not attempted to learn the details of nrf_sw_lpuart).

    I see that the date of the commit is old, but I do not believe that this change was merged at that time. In fact, if I use gitk to look at the nrf repository at 3.1.x (or download the latest from GitHub) I can look at the commit and easily see that it follows v3.1.0-rc1 and precedes v3.1.0-rc2.

    Hence, there is no inconsistency that I have only seen the problem in v3.1.0 and v3.1.1 (I have not downloaded release candidates to my build laptop). No inconsistency that the problem disappeared when reverting the change.

    Burt

  • Hi Vidar,

    Just to add to the complexity...

    What if the nrf_sw_lpuart change was good, but the central_uart sample required changes to make it compatible...

    I built samples/cellular/lte_ble_gateway, at v3.1.1, expecting to see the same issue as in central_uart. But it is different. It does not get stuck and timeout. On the other hand, it does start up with a bunch of warnings about req_pin low when expected high on the lpuart, and Empty receiver state:4 on the lpuart. These are when some lower-level code is doing a BT_HCI_OP_RESET. Then everything is fine. So, it just has ugly warnings and errors.

    I hope you can go through both samples, along with the lpuart, and come up with something that works for both samples and ideally doesn't show errors and warnings if things are working properly.

    Thanks, Vidar.

    Burt

Reply
  • Hi Vidar,

    Just to add to the complexity...

    What if the nrf_sw_lpuart change was good, but the central_uart sample required changes to make it compatible...

    I built samples/cellular/lte_ble_gateway, at v3.1.1, expecting to see the same issue as in central_uart. But it is different. It does not get stuck and timeout. On the other hand, it does start up with a bunch of warnings about req_pin low when expected high on the lpuart, and Empty receiver state:4 on the lpuart. These are when some lower-level code is doing a BT_HCI_OP_RESET. Then everything is fine. So, it just has ugly warnings and errors.

    I hope you can go through both samples, along with the lpuart, and come up with something that works for both samples and ideally doesn't show errors and warnings if things are working properly.

    Thanks, Vidar.

    Burt

Children
Related