This post is older than 2 years and might not be relevant anymore

HCI UART Issues on nRF Connect SDK 1.3.0

Hello Nordic Team,

I am having issues with the HCI UART firmware for the nRF52840 on the Thingy:91. I had been using the HCI UART firmware from nRF Connect SDK v1.0.0 for some time, but I recently ported over the changes from nRF Connect SDK v1.3.0 and have since run into issues.

My application is similar to the LTE BLE Gateway application in that I am using the nRF52840 as a Bluetooth controller to scan for nearby Bluetooth devices. I have also applied the patches provided by Sigurd in the following ticket to support this on the Thingy:91: https://devzone.nordicsemi.com/f/nordic-q-a/52689/nrf9160-lte-sensor-gateway-on-thingy-91/230300#230300

With the nRF Connect SDK v1.0.0 HCI UART firmware, I would occasionally see the following in the RTT log on the nRF9160:

00> [00:16:23.539,947] <wrn> bt_driver: Discarding event 0x3e

With the nRF Connect SDK v1.3.0 HCI UART firmware, however, I get many messages like the following (in addition to the message above):

00> [00:17:13.480,102] <err> bt_driver: Not enough space in buffer
00> [00:17:14.016,693] <err> bt_driver: Unknown H:4 type 0x49
00> [00:17:14.017,028] <err> bt_driver: Unknown H:4 type 0x0e
00> [00:17:14.017,364] <err> bt_driver: Unknown H:4 type 0x2f
00> [00:17:14.017,669] <err> bt_driver: Unknown H:4 type 0x6e
00> [00:17:14.018,005] <err> bt_driver: Unknown H:4 type 0xec
00> [00:17:14.018,341] <err> bt_driver: Unknown H:4 type 0x1e
00> [00:16:29.435,485] <wrn> bt_hci_core: Unhandled event 0x01 len 41: 0301da972fdf98fa1e02010403039afe16169afe1181820179ececd817b280e2e31edd480ac101bc04

The behavior of the firmware is also affected: usually after an "Unhandled event" message like the one above, the Bluetooth scan provides no results for 30-60 seconds before it recovers. This differs from the nRF Connect SDK v1.0.0 HCI UART firmware, where the scan kept providing results regardless of the "Discarding event" message.

Any ideas?

Thank you,
Cody

Håkon Alseth over 4 years ago

Hi,

It looks like the queue size requirements have changed for your specific scenario between releases.

According to this Zephyr issue, the fix is to increase CONFIG_BT_HCI_TX_STACK_SIZE. What is it currently set to on your end? Could you try increasing it and see whether the issue disappears?

Kind regards,

Håkon

Cody over 4 years ago in reply to Håkon Alseth

Hello Hakon,

The configuration option CONFIG_BT_HCI_TX_STACK_SIZE is not directly settable by the user. Based on my project's configuration, the nRF52840 HCI UART firmware ends up with CONFIG_BT_HCI_TX_STACK_SIZE set to 640, and the nRF9160 application firmware with 512.

For testing purposes, I manually edited the Kconfig file at zephyr/subsys/bluetooth/host/Kconfig and raised this value to 2048 on both the nRF52840 and the nRF9160 side, but even with such a large increase I still see the messages from my original post.

Thanks,
Cody

Håkon Alseth over 4 years ago in reply to Cody

Hi Cody,

There are several changes required to get this running properly on v1.3.0 (we unfortunately haven't updated the patches for it), but judging by the log, it seems you have sorted that out.

This is the code that prints the error on your side: https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/bluetooth/hci/h4.c#L91

There are some inherent drawbacks with this two-chip solution. One is that the host (nRF9160) has no recovery mechanism if the HCI controller (nRF52840) resets (after a reset, the controller sends a BT_OP_NOP). It seems you are able to run for 17 minutes before this occurs. Another drawback of the Thingy:91 is that you cannot debug both MCUs simultaneously. Have you looked at the log from the nRF52840 (enable RTT logging) to see whether it resets mid-sequence?

Kind regards,

Håkon

Cody over 4 years ago in reply to Håkon Alseth

Hakon,

Yeah, I have done a lot of work porting and updating the solution to v1.3.0 (based on the patches provided by Sigurd). Don't read into the 17-minute timestamps too deeply; that just happened to be what I grabbed from the RTT viewer at the time. I usually see the error messages and the behavior noted in the original post within a minute of engaging the bluetooth scan.

I hooked up the JLink to the nRF 52840 and attempted to observe any interesting log output, but after 5-10 minutes nothing was seen on the RTT console (other than the initial log output). I can try and increase bluetooth related verbosity in the HCI UART firmware and monitor again.

Thanks,
Cody

Håkon Alseth over 4 years ago in reply to Cody

Hi Cody,

Cody said:
I usually see the error messages and the behavior noted in the original post within a minute of engaging the bluetooth scan.

Are there any patterns in what triggers this? A lot of advertisers in the area, or an incoming connection?

Cody said:
I hooked up the JLink to the nRF 52840 and attempted to observe any interesting log output, but after 5-10 minutes nothing was seen on the RTT console (other than the initial log output). I can try and increase bluetooth related verbosity in the HCI UART firmware and monitor again.

No indications of a reboot? What about debugging the target, setting a breakpoint in main, and then resuming? If it hits the breakpoint again, the chip has been reset.

Kind regards,

Håkon

Cody over 4 years ago in reply to Håkon Alseth

Hey Hakon,

Sorry for the delayed response, just finally getting back to looking into this.

I don't see any pattern to this issue. I am in an area with a non-trivial number of advertising Bluetooth devices; I would guess around 50. That said, my use case is simple: I only use the nRF52840 to scan for devices and read the advertising and scan response packets. No connections are ever made.

I do see indications of a reboot on the nRF52 side. If I run a simple firmware on the nRF9160 that enables Bluetooth (via HCI UART) and starts a scan, and let it run while debugging the nRF52, I see the chip restarting.

One new piece of information: the HCI UART firmware from nRF Connect SDK v1.0 and v1.1 appears to work fine. I am trying to pin down at which version the firmware stopped working (currently testing v1.2).

UPDATE: I tested nRF Connect SDK v1.2.0, and its HCI UART firmware still works, so the breakage was definitely introduced somewhere between the official v1.2.0 and v1.3.0 releases.

Thanks,
Cody

Håkon Alseth over 4 years ago in reply to Cody
Hi Cody,

Could you try adding this config on the nRF52-side, to force it not to reset if a fault occurs?

CONFIG_RESET_ON_FATAL_ERROR=n

If an error occurs, you should get a printout. If there is no printout, you can enter debug mode and look for a stack frame similar to this:

#3 0x0000fea0 in z_fatal_error (reason=0, esf=0x20001390 <_interrupt_stack+1984>)

Most debuggers let you select the frame in one way or another (for example, by right-clicking it in the call stack window) and inspect its locals.

Here, the debug info is available in the "esf" pointer, with information about where the fault occurred and what the register content was at this time.

Kind regards,

Håkon

Cody over 4 years ago in reply to Håkon Alseth

Hey Hakon,

I added the suggested line of configuration to the HCI UART project. Unfortunately, I did not observe any change in behavior. This may be due to the fact that the chip is not resetting based on an error and instead, is being reset by the nRF9160. In h4.c, in the location where I am seeing the error statements, I see references to a reset_rx() function being called.

Thanks,
Cody

Håkon Alseth over 4 years ago in reply to Cody

Hi Cody,

Cody said:
This may be due to the fact that the chip is not resetting based on an error and instead, is being reset by the nRF9160. In h4.c, in the location where I am seeing the error statements, I see references to a reset_rx() function being called.

I apologise for the late reply. I checked with support and R&D, and some issues with UART stability have been observed on NCS v1.3.0.

I can confirm that we have seen instabilities lately, and we are working on finding the root cause. There has been work on rewriting the UART transport functions in this PR, but a colleague of mine tested it just last week and still saw some issues. If you have an older build of hci_uart that works, I recommend that you stick with it for the time being.

Kind regards,

Håkon

Cody over 4 years ago in reply to Håkon Alseth

Thanks Hakon,

Do you have a rough time estimate for when UART stability will be fixed as well as the introduction of low power UART? These are both incredibly important for our product, so if you can keep me in the loop, that would be much appreciated. In the meantime, I will continue to use the HCI UART firmware built from v1.2.0.

Thanks,
Cody

Håkon Alseth over 4 years ago in reply to Cody

Hi Cody,

We are working on addressing this HCI issue, but I do not have a timeline, unfortunately.

The issue is being tracked here: https://github.com/zephyrproject-rtos/zephyr/issues/26722

Cody said:
as well as the introduction of low power UART?

The low power UART is merged: https://github.com/nrfconnect/sdk-nrf/pull/2591

However, it was merged fairly recently, and wider-scale testing is currently in progress.

Kind regards,

Håkon