HCI UART Issues on nRF Connect SDK 1.3.0

Hello Nordic Team,

I am having issues with the HCI UART firmware for the nRF52840 on the Thingy:91. I have been using the HCI UART firmware from nRF Connect SDK 1.0.0 for some time, but I recently ported over the changes from nRF Connect SDK 1.3.0 and have since run into issues.

My application is similar to the LTE BLE Gateway application in that I am using the nRF52840 as a Bluetooth controller to scan for nearby Bluetooth devices. I have also used the patches provided in the following ticket to support this on the Thingy:91: https://devzone.nordicsemi.com/f/nordic-q-a/52689/nrf9160-lte-sensor-gateway-on-thingy-91/230300#230300

With the nRF Connect SDK v1.0.0 HCI UART firmware, I would occasionally see the following from the RTT logging on the nRF9160:

00> [00:16:23.539,947] <wrn> bt_driver: Discarding event 0x3e

but now with the nRF Connect SDK v1.3.0 HCI UART firmware, I get a bunch of these types of messages printed out (in addition to the message above):

00> [00:17:13.480,102] <err> bt_driver: Not enough space in buffer
00> [00:17:14.016,693] <err> bt_driver: Unknown H:4 type 0x49
00> [00:17:14.017,028] <err> bt_driver: Unknown H:4 type 0x0e
00> [00:17:14.017,364] <err> bt_driver: Unknown H:4 type 0x2f
00> [00:17:14.017,669] <err> bt_driver: Unknown H:4 type 0x6e
00> [00:17:14.018,005] <err> bt_driver: Unknown H:4 type 0xec
00> [00:17:14.018,341] <err> bt_driver: Unknown H:4 type 0x1e
00> [00:16:29.435,485] <wrn> bt_hci_core: Unhandled event 0x01 len 41: 0301da972fdf98fa1e02010403039afe16169afe1181820179ececd817b280e2e31edd480ac101bc04

The behavior of the firmware is also affected: usually after the "Unhandled event" message shown above, the Bluetooth scan provides no results for 30-60 seconds before it recovers. This differs from the nRF Connect SDK v1.0.0 HCI UART firmware, where the Bluetooth scan would still provide results regardless of the "Discarding event" message.

Any ideas?

Thank you,
Cody

  • Hi,

     

    It looks like the queue size requirements have changed for your specific scenario between releases.

    According to this Zephyr issue, the fix is to increase "CONFIG_BT_HCI_TX_STACK_SIZE". What is it currently set to on your end? Could you try increasing it to see if the issue disappears?

     

    Kind regards,

    Håkon

  • Hello,

    The configuration option "CONFIG_BT_HCI_TX_STACK_SIZE" is not directly settable by the user. Based on the configuration in my project, the nRF52840 HCI UART firmware selects a CONFIG_BT_HCI_TX_STACK_SIZE of size 640 and the nRF9160 application firmware selects a CONFIG_BT_HCI_TX_STACK_SIZE of size 512.

    For testing purposes, I manually edited the Kconfig file at zephyr/subsys/bluetooth/host/Kconfig and changed this value to 2048 on both the nRF52840 and nRF9160 side, but even with such a large increase I am still seeing the messages from my original post.

    Thanks,
    Cody

  • Hi Cody,

     

    There are several changes required to get this running properly on v1.3.0 (unfortunately, we haven't updated the patches for that release), but judging by the log, it seems that you have that sorted out.

    It's this line that is printing the error on your side: https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/bluetooth/hci/h4.c#L91
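To illustrate why that check fires, here is a minimal sketch (not the actual Zephyr driver code) of the packet-indicator test an H:4 host performs on the first byte of every packet. The indicator values come from the Bluetooth UART transport layer; the function and enum names are hypothetical. A host expects only ACL data (0x02) or events (0x04) from the controller, so once the byte stream loses alignment, arbitrary payload bytes such as 0x49 or 0x0e get interpreted as packet types and rejected.

```c
#include <stdint.h>

/* H:4 packet indicators (first byte of every packet on the UART). */
enum h4_type {
    H4_CMD = 0x01, /* HCI command, host -> controller */
    H4_ACL = 0x02, /* ACL data, both directions */
    H4_SCO = 0x03, /* SCO data */
    H4_EVT = 0x04, /* HCI event, controller -> host */
};

/* Hypothetical helper: returns 1 if a host-side receiver should accept
 * this byte as the start of a packet, 0 if it would be reported as an
 * unknown H:4 type. */
int h4_host_rx_type_is_valid(uint8_t type)
{
    return type == H4_ACL || type == H4_EVT;
}
```

None of the bytes reported in the log (0x49, 0x0e, 0x2f, 0x6e, 0xec, 0x1e) is a valid indicator, which is consistent with the host reading from the middle of a packet rather than from a packet boundary.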

    There are some inherent drawbacks with this two-chip solution. One is that the host (nRF91) does not have a recovery mechanism if the HCI controller (nRF52) resets (where it sends a BT_OP_NOP cmd). It seems like you are able to run for 17 minutes before this occurs. Another drawback with the Thingy:91 is that you cannot debug both MCUs simultaneously. Have you tried looking at the log from the nRF52840 (enable RTT for logging) to see if it resets mid-sequence?

     

    Kind regards,

    Håkon

  • ,

    Yeah, I have done a lot of work porting/updating the solution to v1.3.0 (based on the provided patches). Don't read into the 17-minute timestamps too deeply; it just happened to be what I snagged from the RTT viewer at the time. I usually see the error messages and the behavior noted in the original post within a minute of engaging the Bluetooth scan.

    I hooked up the J-Link to the nRF52840 and looked for any interesting log output, but after 5-10 minutes nothing had appeared on the RTT console (other than the initial log output). I can try increasing the Bluetooth-related verbosity in the HCI UART firmware and monitor again.


    Thanks,
    Cody

  • Hi Cody,

     

    Cody said:
    I usually see the error messages and the behavior noted in the original post within a minute of engaging the bluetooth scan.

    Are there any patterns in what causes this? A lot of advertisers in the area, or an incoming connection?

      

    Cody said:
    I hooked up the J-Link to the nRF52840 and looked for any interesting log output, but after 5-10 minutes nothing had appeared on the RTT console (other than the initial log output). I can try increasing the Bluetooth-related verbosity in the HCI UART firmware and monitor again.

    No indications of a reboot? What about debugging the target, setting a breakpoint in main, and then resuming? If it hits the breakpoint again, it has been reset.
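As a possible complement to the breakpoint approach, the nRF52 keeps a reset-reason register (POWER->RESETREAS) that can be read early in main and logged. The sketch below only decodes a raw register value; the bit positions are assumed from the nRF52840 product specification and cover only a subset of sources, and the function name is hypothetical. Reading and clearing the register on the target is left out.

```c
#include <stdint.h>

/* Assumed subset of nRF52 POWER->RESETREAS bit positions. */
#define RESETREAS_RESETPIN (1u << 0) /* reset pin */
#define RESETREAS_DOG      (1u << 1) /* watchdog */
#define RESETREAS_SREQ     (1u << 2) /* soft reset request */
#define RESETREAS_LOCKUP   (1u << 3) /* CPU lockup */

/* Hypothetical helper: map a raw RESETREAS value to a short label.
 * The register accumulates bits until they are cleared, so when several
 * are set this returns the first match in the order checked below. */
const char *reset_reason_name(uint32_t resetreas)
{
    if (resetreas & RESETREAS_LOCKUP) {
        return "CPU lockup";
    }
    if (resetreas & RESETREAS_SREQ) {
        return "soft reset";
    }
    if (resetreas & RESETREAS_DOG) {
        return "watchdog";
    }
    if (resetreas & RESETREAS_RESETPIN) {
        return "reset pin";
    }
    return "power-on or other";
}
```

A reboot triggered from a fatal error handler would typically show up as a soft reset, whereas a watchdog or lockup label would point in a different direction.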

     

    Kind regards,

    Håkon

  • Hey,

    Sorry for the delayed response, just finally getting back to looking into this.

    I don't see any pattern to this issue. I am in an area with a non-trivial number of Bluetooth devices advertising; I would guess around 50 devices. That said, my use case is simple: I am only using the nRF52840 to scan for devices and read the advertising packet/scan response packet. No connections are ever made.

    I do see indications of a reboot on the nRF52 side. If I run a simple firmware on the nRF9160 side that enables Bluetooth (via HCI UART) and initiates a scan, and let that run while debugging the nRF52 side, I see the chip restarting.

    One new piece of information is that the HCI UART firmware from nRF Connect SDK v1.0 and v1.1 appears to work fine. I am trying to confirm at what point the firmware stopped working (currently testing v1.2).

    UPDATE: Tested nRF Connect SDK v1.2.0 and the HCI UART firmware still works, so it was definitely broken somewhere between the official releases of v1.2.0 and v1.3.0.

    Thanks,
    Cody

  • Hi Cody,

     

    Could you try adding this config on the nRF52-side, to force it not to reset if a fault occurs?

    CONFIG_RESET_ON_FATAL_ERROR=n

     

    In the case of an error, you should get a printout. If there's no printout, you can enter debug mode and look for a stack frame similar to this:

    #3  0x0000fea0 in z_fatal_error (reason=0, esf=0x20001390 <_interrupt_stack+1984>)

    Most debuggers let you select the frame in one way or another (right-clicking in the call stack window, selecting the frame) and look into the specific locals.

    Here, the debug info is available in the "esf" pointer, with information about where the fault occurred and what the register content was at this time.
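For orientation, here is a rough sketch of what the "esf" pointer refers to on a Cortex-M core such as the nRF52840: the eight registers the hardware pushes onto the stack on exception entry. The struct and function names below are hypothetical (Zephyr's own type uses different field names), and the values passed in would come from the debugger, not from this code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the basic Cortex-M exception stack frame,
 * in the order the hardware stacks it. */
struct basic_esf {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* return address of the interrupted function call */
    uint32_t pc;   /* typically at or near the faulting instruction */
    uint32_t xpsr;
};

/* Format the two fields that are usually most useful for locating a
 * fault. Returns the value returned by snprintf. */
int esf_describe(const struct basic_esf *esf, char *buf, size_t len)
{
    return snprintf(buf, len, "pc=0x%08lx lr=0x%08lx",
                    (unsigned long)esf->pc, (unsigned long)esf->lr);
}
```

The stacked pc can then be looked up in the firmware's ELF or map file to find the function in which the fault occurred.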

     

    Kind regards,

    Håkon
