connectivity_bridge malfunctions when BLE client disconnects and reconnects

I noticed that when a BLE client disconnects and then reconnects to the bridge, instead of receiving all data as it arrives over the UART from the nRF9160 to the nRF52840, there is a delay of several seconds, after which a huge block of 2048 bytes is sent to the BLE client. This size matches the BRIDGE_BUF_SIZE configuration. The issue is observed both on Thingy:91, using the sample provided as part of the precompiled firmware, and on the nRF9160 DK, to which I ported the sample. I tried changing BRIDGE_BUF_SIZE, and since I observed that 182 bytes are used internally for BLE transmission, I used that value. Now parts of the NMEA data sent from the GPS sample are randomly scrambled, resulting in 40% of NMEA packets failing basic validation and the checksum test.

The attached log was recorded on the nRF9160 DK using the modified connectivity_bridge, which I adapted to run on the DK.
The GPS sample has also been modified to send a copy of the NMEA strings to its second UART, which on the modified board links to the nRF52840 chip. From there the data is forwarded to BLE (application) and USB (this log file).

The following changes have been made:
nrf/samples/nrf9160/gps/src/main.c
static void print_nmea_data(void)
{
        for (int i = 0; i < nmea_string_cnt; ++i) {
                printk("%s", nmea_strings[i]);
+
+ #if CONFIG_TRACING_TEST
+               TRACING_STRING("%s", nmea_strings[i]);
+ #endif
+
+ #if CONFIG_UART_BLE
+               int uart_ble_send(char * msg, uint16_t length);
+               uart_ble_send(nmea_strings[i], strlen(nmea_strings[i]));
+ #endif
        }
}

nrf/applications/connectivity_bridge.nrf9160dk/src/modules/Kconfig
BRIDGE_BUF_SIZE=182

nrf/applications/connectivity_bridge.nrf9160dk/prj.conf
CONFIG_USB_DEVICE_PRODUCT="nrf9160dk UART"

I also send a copy of the data being sent to BLE over the debug UART of the nRF52840. The USB port on the nRF9160 DK connects to an OpenWRT router, which acts as a mobile UART-to-TCP bridge, and the output is then recorded on a computer using netcat.

nrf/applications/connectivity_bridge_g/src/modules/ble_handler.c
static void bt_send_work_handler(struct k_work *work)
{
        uint16_t len;
        uint8_t *buf;
        int err;
        bool notif_disabled = false;

        do {
                len = ring_buf_get_claim(&ble_tx_ring_buf, &buf, nus_max_send_len);

                err = bt_gatt_nus_send(current_conn, buf, len);
                if (err == -EINVAL) {
                        notif_disabled = true;
                        len = 0;
                } else if (err) {
                        len = 0;
                }

+               if (len)
+               {
+                       char msg[BLE_TX_BUF_SIZE];
+                       memcpy(msg, buf, len);
+                       msg[len] = '\0';
+                       printk("%s", msg);
+               }

                err = ring_buf_get_finish(&ble_tx_ring_buf, len);
                if (err) {
                        LOG_ERR("ring_buf_get_finish: %d", err);
                        break;
                }
        } while (len != 0 && !ring_buf_is_empty(&ble_tx_ring_buf));

        if (notif_disabled) {
                /* Peer has not enabled notifications: don't accumulate data */
                ring_buf_reset(&ble_tx_ring_buf);
        }
}

The following line is not valid:
$GPGLL,4.31375,N,02445.01620,E,1,04,100.00,312.72,M,0,,*26

Both of the following lines have a valid checksum:
* the first line contains invalid data
* the second line is valid
$GPGGA,085500.23,4208.30949,N,02444.97N,02444.9766.16,217.38,M,0,,*23
$GPGGA,085500.23,4208.30949,N,02444.97660,E,1,10,1.16,217.38,M,0,,*23

Notice how part of the packet is missing or has been replaced with data from
another packet. 40% of the packets are discarded because they:
* do not begin with $
* do not have * in position line_length - 3
* do not have a valid checksum

Of the remaining 60% of the packets, which pass the above validation,
a few are discarded because they:
* have values out of range
* have a missing or more than one . character in a float value
* have fewer or more characters per line of value than expected

In rare cases, some packets still pass all validation but contain
slightly invalid data, e.g. there could be a long spike on the map
when our application is drawing the path.
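
The three basic checks listed above (leading $, * at position line_length - 3, matching checksum) can be sketched in plain C as follows. This is a minimal illustration, not the code from the iPhone application. The function name nmea_line_is_valid is hypothetical, and the sketch assumes the usual NMEA convention that the checksum is the XOR of all characters between $ and *, written as two hex digits. Range and field-format checks are not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns true if the line passes the basic checks. */
static bool nmea_line_is_valid(const char *line)
{
    assert(line != NULL);

    size_t len = strlen(line);

    /* Ignore a trailing CR/LF, if present. */
    while (len > 0 && (line[len - 1] == '\r' || line[len - 1] == '\n')) {
        len--;
    }

    /* Shortest possible frame is "$*HH"; '$' first, '*' at len - 3. */
    if (len < 4 || line[0] != '$' || line[len - 3] != '*') {
        return false;
    }

    /* The two characters after '*' must be hex digits. */
    if (!isxdigit((unsigned char)line[len - 2]) ||
        !isxdigit((unsigned char)line[len - 1])) {
        return false;
    }

    /* XOR of everything between '$' and '*' (both excluded). */
    unsigned char sum = 0;
    for (size_t i = 1; i < len - 3; ++i) {
        sum ^= (unsigned char)line[i];
    }

    const char hex[3] = { line[len - 2], line[len - 1], '\0' };
    return sum == (unsigned char)strtoul(hex, NULL, 16);
}
```

For example, the line $GPGLL,,,,,174132.86,V,A*49 quoted further down in this thread passes these checks, while the same line with a single field character changed does not.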

If BRIDGE_BUF_SIZE is reverted to its default value of 2048,
then once the BLE client disconnects and reconnects, the connectivity bridge
starts working incorrectly and sends large blocks of 2048 bytes containing
multiple NMEA packets.

ble_handler.test.c is my attempt to add tracing code and investigate what
is wrong with the connectivity_bridge. It seems that the ble_handler is holding
all buffers, so the uart_handler has no space to store arriving data.
At some point the buffers are freed by BLE, but then UART sends another 2048 bytes
of data, which arrive at bt_send_work_handler(). It starts sending blocks of
nus_max_send_len=182 until the BLE driver chokes. At this point BLE transmission
stalls for a few seconds, and UART RX drops some data. Then the process repeats.
I am sorry for not being able to provide more details, and this paragraph may not be
fully accurate, because the code is extremely hard to follow.

I need to make one very important note to the Nordic SDK developers:
Please make things simple and reliable!
I cannot say if the above issue is due to the existing design patterns in Zephyr OS,
but I see that all design patterns are extremely complex and hard to follow. Even
for tasks that are very simple to implement, such as reading from or writing to UART or BLE,
I see that you write 400 lines of code for each!
Hence it comes as no surprise that such complex code has bugs which are hard to
diagnose or resolve. For example, as part of my testing I needed to change the
gps sample for nRF9160dk, so that in addition to its normal output, it sends a copy
with only the NMEA strings to its second UART, which I routed to the nRF52840 chip to
forward over BLE. My expectation was that I could simply write the data to UART.
But instead of a one-line solution, I was forced to dedicate 330 lines of code:
zephyr/boards/arm/nrf9160dk_nrf9160_g/uart_ble.c
with a lot of handling routines and things that should normally be handled by
the driver. Any decent operating system provides a simple-to-use interface.

Using Nordic SDK master

nrf
commit ecf5d334f1577cc7fca11ae7b5a7fb86f0ea757a

zephyr
commit 7d20f2ebf25991b2897b91275939f8d16d38513a

 4572.projects.7z

  • What tag/commit are you working on? There has been some work on the connectivity_bridge application recently that should improve its stability. Could you check out the master branch or NCS v1.4.0 rc1 and see if your problems go away? These are the commits that are of interest to you:

    If this doesn't solve your problems, please tell me and I'll investigate it more deeply, try to reproduce it and see if I can get to the bottom of it.

    Best regards,

    Simon

  • Hello Simon!

    I updated to nrf e0d62535662ed6e04730b53f5c7b92fa58cedabf, and zephyr 105a1b9e2bc48acdd1f1173ad2ef43a90c120fbe. The connectivity bridge seems to work better, and at first I thought the issue had been resolved, but after leaving my Thingy:91 to recharge during the night, in the morning I discovered that all symptoms had reappeared. I am currently using the default configuration with buffers of 2048 bytes. My iPhone application receives large packets, probably of size 2048, with a long delay between each packet. Around 3% of the packets are invalid, which is far less than 40%, but that's probably because the buffer size is 2048 bytes instead of 182 bytes.

    I am sorry for the late reply. Every time I have to update the SDK, I have to adapt my environment on two computers: Visual Studio projects, macros, paths… It's a lot of work. Then I have to port my projects and test all of them to make sure they work correctly. So as you can see, it's a very time-consuming process. And due to the many bugs in the SDK, I spend more time on updates than I do on development and productive work.

    Speaking of bugs, there is another bug in the GPS. After using the nRF9160DK for a few days the GPS demo stops updating and keeps sending the same values and timestamp over and over again:

    $GPGGA,174132.86,,,,,0,,99.99,,M,0,,*3B
    $GPGLL,,,,,174132.86,V,A*49
    $GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99,1*2D
    $GPGSV,1,1,0,,,,,,,,,,,,,,,,,1*54
    $GPRMC,174132.86,V,,,,,,,271020,,,N,V*0D
    $GPGGA,174132.86,,,,,0,,99.99,,M,0,,*3B
    $GPGLL,,,,,174132.86,V,A*49
    $GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99,1*2D
    $GPGSV,1,1,0,,,,,,,,,,,,,,,,,1*54
    $GPRMC,174132.86,V,,,,,,,271020,,,N,V*0D
    $GPGGA,174132.86,,,,,0,,99.99,,M,0,,*3B
    $GPGLL,,,,,174132.86,V,A*49
    $GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99,1*2D
    $GPGSV,1,1,0,,,,,,,,,,,,,,,,,1*54
    $GPRMC,174132.86,V,,,,,,,271020,,,N,V*0D
    $GPGGA,174132.86,,,,,0,,99.99,,M,0,,*3B
    $GPGLL,,,,,174132.86,V,A*49
    $GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99,1*2D
    $GPGSV,1,1,0,,,,,,,,,,,,,,,,,1*54
    $GPRMC,174132.86,V,,,,,,,271020,,,N,V*0D

    During this time I had been working only with the nRF52840 chip. At some point I noticed that I'm receiving the same NMEA data over and over again. After powering off the board or resetting the nRF9160 chip, the data starts updating again for a few days, and then I start getting the same data repeatedly. This looks rather like a firmware issue.

    I'm not sure if I should blame Zephyr or Nordic, but the Nordic SDK is a very unreliable platform. I'm very unproductive building on top of it. This is in huge part because of me, since I am new to this platform, yet also because in order to learn a new platform one has to experiment with working examples and see how things are accomplished. Sadly although there are a lot of good things in the SDK that should make tasks simple, your team has a long road to go, in order to make it a reliable platform. I hope to see the quality improved soon!

    Since reliable grounds are a necessity in order to advance with my work, I ported our own RTOS − Euros on the nRF52840 CPU. So far this has been a very easy process. Let me know if your team is interested in a robust UART driver that can be used bare metal or integrated with any OS. DMA, interrupts, sync and async modes are fully supported. Unlike what is offered by Zephyr, the interface is very simple to use. When send and recv API is used by a task, if a wait is required, the task is put to sleep freeing CPU cycles for other tasks. The API can also run on bare metal, with interrupts disabled or within fault handlers. It just works and is very simple to use. There is only one restriction: it is not currently free software, but we can license it.

  • I just experienced the same symptoms over USB to UART link provided by the connectivity_bridge running on Thingy:91.

  • Thanks for the feedback. By the same symptoms, you mean that it will not throw away old data if you disconnect (unplug the USB)?

  • nRF9160 is constantly sending lines of 40-44 bytes sensor data, around 5-10 lines per second.
    On my computer I started receiving large packets of data, followed by a long delay. But it got resolved on its own 5 minutes later, I was just about to record the screen.

  • I reproduced it again. Here is a screen recording of Thingy:91. nRF9160 is sending data over UART to connectivity_bridge, which forwards it to USB, and the PC receives it over a COM port.

    https://youtu.be/6NEGd93ltmI

  • This seems like a different issue than the one you described earlier. The previous problem occurred when the nRF52840 disconnected BLE, which led to loss of data, and new incoming data didn't overwrite old data. This issue seems to be that the data does not come as a continuous flow, but instead in large bulks with delays in between, with no data lost. Have I understood it correctly?

    How do I reproduce this? Can I simply program a hello world sample onto the nRF9160 (thingy:91) with while(true){printk("data data data data .... data");k_sleep(K_MSEC(100));} and the connectivity_bridge on the nrf52840 (thingy:91)? Can I reproduce it with the project you uploaded?

    Could you test the sample usb_uart_bridge, and see if this problem still occurs? This is a much simpler example without all the additional features of the connectivity_bridge application.

    Best regards,

    Simon
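
One way to get the "new incoming data overwrites old data" behaviour mentioned above is a ring buffer that discards its oldest bytes when full. The following is a minimal standalone sketch in plain C. It is not the connectivity_bridge code and does not use Zephyr's ring_buf API. The names drop_oldest_rb, rb_put, rb_get and RB_CAPACITY are hypothetical, and the tiny capacity is chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RB_CAPACITY 8u /* hypothetical size, kept small for illustration */

struct drop_oldest_rb {
    uint8_t data[RB_CAPACITY];
    size_t head;    /* index of the oldest stored byte */
    size_t count;   /* number of bytes currently stored */
    size_t dropped; /* number of old bytes discarded so far */
};

/* Store len bytes; when the buffer is full, discard the oldest byte first. */
static void rb_put(struct drop_oldest_rb *rb, const uint8_t *src, size_t len)
{
    assert(rb != NULL && (src != NULL || len == 0));

    for (size_t i = 0; i < len; ++i) {
        if (rb->count == RB_CAPACITY) {
            rb->head = (rb->head + 1) % RB_CAPACITY;
            rb->count--;
            rb->dropped++;
        }
        rb->data[(rb->head + rb->count) % RB_CAPACITY] = src[i];
        rb->count++;
    }
}

/* Read up to max_len bytes, oldest first; returns the number of bytes read. */
static size_t rb_get(struct drop_oldest_rb *rb, uint8_t *dst, size_t max_len)
{
    assert(rb != NULL && dst != NULL);

    size_t n = 0;
    while (n < max_len && rb->count > 0) {
        dst[n++] = rb->data[rb->head];
        rb->head = (rb->head + 1) % RB_CAPACITY;
        rb->count--;
    }
    return n;
}
```

With this policy a reconnecting client receives only the most recent RB_CAPACITY bytes instead of a stale backlog. The sketch is not interrupt-safe as written; real firmware would need locking or a single-producer/single-consumer design around it.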

  • This seems like a different issue than the one you described earlier. The previous problem occurred when the nRF52840 disconnected BLE, which led to loss of data, and new incoming data didn't overwrite old data. This issue seems to be that the data does not come as a continuous flow, but instead in large bulks with delays in between, with no data lost. Have I understood it correctly?

    Both issues exhibit the same behaviour which you just described: the data comes in large blocks of 2KB and there is a large delay between blocks. When observing the USB-UART in KiTTY, I cannot say whether data is lost or not. Only the iPhone application has the ability to verify checksum and packet integrity. The data can arrive over BLE or UART to TCP to iPhone. If you provide me with the UUID of your iPhone (send me a private message), I can send you the iPhone application. It will help you resolve the issue much faster.

    How do I reproduce this? Can I simply program a hello world sample onto the nRF9160 (thingy:91) with while(true){printk("data data data data .... data");k_sleep(K_MSEC(100));} and the connectivity_bridge on the nrf52840 (thingy:91)? Can I reproduce it with the project you uploaded?

    I use the GPS sample. Preferably from the sources I sent you earlier. And for best results you should install my iPhone application. GPS packets have a simple checksum, and the iPhone application has code to verify it and report invalid packets.

    Could you test the sample usb_uart_bridge, and see if this problem still occurs? This is a much simpler example without all the additional features of the connectivity_bridge application.

    I might be able to test this. However, I've been working actively these days, I don't have much time for testing, and I often need to use BLE or LTE connectivity.
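
Because a terminal view alone cannot show whether data was lost, a test pattern with sequence numbers can help on either link. The sketch below is an illustration in plain C under stated assumptions, not part of any sample: the "$TEST,<seq>*HH" line format and the names make_test_line, gap_checker and gap_checker_feed are hypothetical. The sender would emit one such line per loop iteration (for example via printk in the loop suggested above), and the receiver would feed each received line, without its line ending, to the checker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line format: "$TEST,<seq>*HH", where HH is the XOR of the
 * bytes between '$' and '*'. Returns the line length, or -1 on error. */
static int make_test_line(char *out, size_t out_size, unsigned long seq)
{
    char body[32];
    int n = snprintf(body, sizeof(body), "TEST,%lu", seq);
    if (n < 0 || (size_t)n >= sizeof(body)) {
        return -1;
    }

    unsigned char sum = 0;
    for (int i = 0; i < n; ++i) {
        sum ^= (unsigned char)body[i];
    }

    n = snprintf(out, out_size, "$%s*%02X", body, sum);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

struct gap_checker {
    bool have_last;         /* true once a valid line has been seen */
    unsigned long last_seq; /* sequence number of the last valid line */
    unsigned long lost;     /* lines missing according to sequence gaps */
    unsigned long bad;      /* lines that failed validation */
};

/* Validate one received line and update the loss and corruption counters. */
static void gap_checker_feed(struct gap_checker *gc, const char *line)
{
    assert(gc != NULL && line != NULL);

    char expected[48];
    char *end = NULL;

    if (strncmp(line, "$TEST,", 6) != 0) {
        gc->bad++;
        return;
    }

    /* Re-create the line for the parsed number and compare byte for byte;
     * any corruption of the number or the checksum shows up as a mismatch. */
    unsigned long seq = strtoul(line + 6, &end, 10);
    if (end == line + 6 ||
        make_test_line(expected, sizeof(expected), seq) < 0 ||
        strcmp(line, expected) != 0) {
        gc->bad++;
        return;
    }

    if (gc->have_last && seq > gc->last_seq + 1) {
        gc->lost += seq - gc->last_seq - 1;
    }
    gc->have_last = true;
    gc->last_seq = seq;
}
```

After a test run, a non-zero lost count points to dropped lines and a non-zero bad count to corrupted ones, which separates "data arrives late in bulk" from "data is lost or scrambled".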

  • I don't understand how your custom app would help me reproduce the UART-to-USB issue. Could you give me simple step-by-step instructions for reproducing the UART-to-USB issue? What sample should go on the nRF52840? What sample should go on the nRF9160? Should I use the Thingy:91? Should I use the nRF9160 DK (I can see that you have ported the connectivity_bridge to the DK)? What tag/commit should I use?

    I guess you want me to test the app in order to reproduce the UART-to-BLE issue? However, didn't we "settle" on this earlier? You were going to stop using the connectivity bridge and look into the BLE peripheral UART sample, and I would discuss with the developers how the application should be improved. Have you changed your mind on this?

    Best regards,

    Simon

  • The NMEA client app is designed to work over BLE or TCP. I provided it to help you solve the UART to BLE issues in connectivity_bridge. It will report % of invalid packets, and this can be very helpful, since lost data over the connectivity_bridge will immediately be reported. You may use netcat to send some garbage that looks like NMEA traffic to test it. If you wish to receive UART traffic over the USB, you need to forward the UART to a TCP port, e.g.
    nc -lp 81 < /dev/ttyACM0

    You may need to configure the baud rate as well. I wrote a tool which creates a bridge between UART and TCP or WebSocket. It works on any platform, e.g. Linux, Windows, BSD, macOS, OpenWRT. Let me know if you are interested.

    On my Thingy:91, I have installed connectivity_bridge on the nRF52840 and the GPS sample on the nRF9160. All sources should be attached to some of my earlier messages. Then I left it running over the weekend. The following week, as I was making test changes to the GPS sample, I was lucky enough to record it, and it disappeared within a few minutes. It's a rare bug that may happen once or twice a day. I saw it a few times. It might well be caused by the same underlying problem as the UART-to-BLE issue, so please try to focus on that! I was more likely to spot both issues using Thingy:91. After switching to nrf commit e0d62535662ed6e04730b53f5c7b92fa58cedabf, things improved considerably. You should however check out the old nrf commit ecf5d334f1577cc7fca11ae7b5a7fb86f0ea757a, as it will let you reproduce the BLE issue from my first comment every time.

    Indeed I wanted to find a solution that does not depend on the buggy nRF Connect SDK, because my experience with it so far has proven it to be highly unreliable. However, I was forced to work on other tasks, since we are already beyond all deadlines and likely to lose the project thanks to bugs in the SDK that nobody from Nordic has resolved in recent months. You and your colleagues need to make sure the SDK is more reliable! At least the most common examples, like connectivity_bridge, GPS, and the LTE-related ones, must be rock solid. They are unreliable grounds to build upon. I just added NB-IoT support to the GPS sample a few days ago. After running for a few minutes, the send() function never returns. And I can't debug that, because Thingy:91 needs to be outside to get a GPS signal.

  • The connectivity_bridge_thingy sample does not build out of the box with the commits you pointed to. Have I done everything correctly?

    • From  ncs/nrf I ran:
      • git checkout ecf5d334f1577cc7fca11ae7b5a7f && west update
      • The zephyr tag was then 7d20f2ebf25991b2897b91275939f8d1
    • I downloaded "source code and binaries for nrf9160 nrf52840" from the link you shared
    • I extracted the content and
      • Put nordic.2020-11-06\nordic\master\nrf\applications\connectivity_bridge_thingy into my SDK
    • Changed (CONFIG_)BT_NUS to (CONFIG_)BT_GATT_NUS in nrf\applications\connectivity_bridge_thingy\prj.conf and nrf\applications\connectivity_bridge_thingy\src\modules\Kconfig
      • Seems like this config had changed name
    • From nrf\applications\connectivity_bridge_thingy I ran 
      • west build -b thingy91_nrf52840 -d build_thingy91_no1

    I got a lot of errors. I could probably use some time to figure this out, but I assume it should run out-of-the-box, and that I'm missing out on something.

    Best regards,

    Simon

  • Hello Simon!

    If you follow my posts, you will discover that each time I mention a new commit, the post also includes project files for that specific commit. I understand that moving between versions of the SDK is unfortunately not a smooth process. Consider this: I have created various projects, but face a bug in the SDK. I report the problem and eventually it gets resolved. I am forced to switch to another version of the SDK and update all of my projects. Then I discover another bug, or that the first problem has not been fully resolved. Repeat. Would you agree with me that I am using a very counterproductive environment? If so, please discuss this with your colleagues!
