BLE APP CLI example (CLI over BLE UART): poor reliability

I'm trying to implement CLI over BLE UART for one of our company projects. I started from the experimental ble_app_cli example in the SDK. I'm using SDK 15.3.0 on the Nordic PCA10056 DK board. I keep running into instability problems of various kinds.

In order to narrow down the problem, I decided to simply use the example as is, and see if I can reproduce the issues. For the most part, I end up facing the same problems. 

Steps to reproduce:

  • start the example in debug mode, so that I can see what's happening.
  • connect to the device using an Android BLE Terminal emulator (I use this one https://play.google.com/store/apps/details?id=de.kai_morich.serial_bluetooth_terminal&hl=en&gl=US)
  • see the CLI prompt, send led on and led off commands, it all works
  • walk away so that the smartphone loses BLE connection
  • at least 50% of the time, the PCA10056 device doesn't get the disconnect event, and the device gets into a state where the BLE stack is not communicating anymore. It's also possible to get into the same state using, for example, the UART functionality of nRF Toolbox and sending bogus commands
  • I added a NRF_LOG_INFO("BLE Cli enabled");/NRF_LOG_INFO("BLE Cli disabled"); after the nrf_cli_init()/uninit for the BLE CLI in ble_evt_handler(), and I can see that the BLE cli gets initialized but never disconnects/uninits, leaving the stack connected even if the phone is disconnected
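The instrumentation in the last step can be modelled on a host machine as a rough sketch. This is not SDK code: `model_evt_t`, `cli_model_t` and `model_on_evt` are hypothetical names, and it assumes the CLI is initialized on the connected event and uninitialized on the disconnected event. It only shows the symptom being described: if the disconnected event never arrives, the init count stays ahead of the uninit count and the CLI stays enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the GAP events seen by ble_evt_handler(). */
typedef enum { EVT_CONNECTED, EVT_DISCONNECTED, EVT_OTHER } model_evt_t;

/* Hypothetical bookkeeping for the BLE CLI state. */
typedef struct {
    bool     cli_enabled; /* true between CLI init and CLI uninit */
    unsigned inits;       /* how many times the CLI was initialized */
    unsigned uninits;     /* how many times the CLI was uninitialized */
} cli_model_t;

/* Mirrors the two log lines added around CLI init/uninit. */
void model_on_evt(cli_model_t *m, model_evt_t evt)
{
    assert(m != NULL);
    switch (evt) {
    case EVT_CONNECTED:
        m->cli_enabled = true;   /* assumed place of the CLI init call */
        m->inits++;
        puts("BLE Cli enabled");
        break;
    case EVT_DISCONNECTED:
        m->cli_enabled = false;  /* place of the CLI uninit call */
        m->uninits++;
        puts("BLE Cli disabled");
        break;
    default:
        break;                   /* other events do not touch the CLI */
    }
}
```

In the failing case only the first message is ever printed, so `inits` is 1 and `uninits` is 0 for as long as the board stays stuck.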

Is anyone using the CLI over BLE UART? I know I could implement some sort of watchdog, and that the example is only meant as an example, but the reliability I'm seeing so far makes the use of the CLI over the air a bit questionable. Usually the Nordic examples are a lot more robust than this
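For reference, the kind of watchdog mentioned above could look roughly like the sketch below. It is a host-testable model, not part of the SDK: `link_watchdog_t` and the `link_wd_*` functions are hypothetical names, and it assumes the application sees some periodic traffic while connected and can call `link_wd_tick()` from a periodic timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-level link watchdog (not an SDK module). */
typedef struct {
    bool     connected;     /* the application's view of the link */
    uint32_t idle_ticks;    /* ticks since traffic was last seen */
    uint32_t timeout_ticks; /* idle ticks tolerated before recovery */
} link_watchdog_t;

void link_wd_init(link_watchdog_t *wd, uint32_t timeout_ticks)
{
    assert(wd != NULL);
    wd->connected     = false;
    wd->idle_ticks    = 0;
    wd->timeout_ticks = timeout_ticks;
}

/* Call when the connected event is handled. */
void link_wd_on_connect(link_watchdog_t *wd)
{
    wd->connected  = true;
    wd->idle_ticks = 0;
}

/* Call when the disconnected event is handled. */
void link_wd_on_disconnect(link_watchdog_t *wd)
{
    wd->connected  = false;
    wd->idle_ticks = 0;
}

/* Call whenever data is received or successfully sent on the link. */
void link_wd_on_activity(link_watchdog_t *wd)
{
    wd->idle_ticks = 0;
}

/* Call from a periodic timer. Returns true when the link has been
 * "connected" but silent for timeout_ticks ticks, i.e. when the
 * application should attempt recovery. */
bool link_wd_tick(link_watchdog_t *wd)
{
    if (!wd->connected) {
        return false;
    }
    if (wd->idle_ticks < wd->timeout_ticks) {
        wd->idle_ticks++;
    }
    return wd->idle_ticks >= wd->timeout_ticks;
}
```

On the target, the recovery step could be a forced disconnect or a system reset. Whether either would actually help when the stack has stopped responding is an open question, so this is a workaround at best and not a fix for the underlying problem.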

What am I doing wrong?

  • just a thought:

    Note that there is no standard for "UART over BLE" - they're all proprietary, manufacturer-specific services.

    So how do we know that this app has a reliable implementation of NUS ... ?

  • You are right, we don't... but that app works with every other BLE serial implementation. And, as long as I never lose connectivity, the app works just fine. The connectivity loss is not something the app handles anyway

    And, as I said, I can get the example in the same state by simply using the Nordic provided UART applet in nRF Toolbox. In a few cases, even just using the nRF Connect. So that Serial Terminal app is not the real problem, even assuming that is not 100% compatible

  • You can drag'n'drop them. The easiest would be a zipped folder containing the project folder, which includes all the files you have edited here (main, sdk_config and the project file)

    Thanks Edvin. I missed that. Glad I asked :)

    Ok, I tried what you suggested, and the only suggestion that worked well enough was #3, comment out the NRF_LOG calls in cli_ble_uart_write().

    The logs don't crash anymore (good!) but I still get into the same state where no connection is accepted. Steps to repro: execute the code in debug mode, connect with nRF Toolbox UART functionality. You will see the <info> app: BLE Cli enabled message. Then walk away until the BLE signal is lost; the button at the bottom of the nRF app will say "connecting..." but nothing happens. The message <info> app: BLE Cli disabled is never reached (even with a breakpoint there, nothing). If you try to reconnect manually, the Nordic_CLI device doesn't show in the scan screen (because the PCA10056 is in a non-working state). There is no way to reconnect to the PCA10056 short of restarting it.

    Unlike before, you can break into the debugger (assuming you have MMD enabled), but with no real useful information (usually it's in task_yeld())

    I'm enclosing all modified files in this ZIP. Please note that I did not follow the exact folder structure of the SDK for the nrf_cli_ble_uart.c file (which you need to manually copy into components\libraries\cli\ble_uart). MMD is enabled on that build, and completely transparent.

    ble_app_cli.zip

    Appreciate any help you can provide

  • robca said:
    button at the bottom of the nRF app will say "connecting..." but nothing happens.

     That can mean that you are too far away, but the phone can't pick up any advertisements. Try to sniff the connection. Does the nRF advertise? Does the phone send a connection request? Does the same thing happen if you try to use nRF Connect, and just unplug the connectivity DK/dongle? This was what I tested. Sorry, but it is a bit difficult for me to walk away from it, because I use Remote Desktop and the DKs are connected to my computer in the office. I tried a couple of times yesterday to just unplug the DK it was connected to (I was using nRF Connect for Desktop). That will cause the connection to time out, because the central disappears, but I never ran into any issues.

    BR,

    Edvin

  • Try to sniff the connection. Does the nRF advertise?

    Thanks Edvin. As I said, the PCA10056 is not working anymore. After losing the signal, I get back to the same desk where the PCA10056 is, and try nRF Connect and/or nRF Toolbox, and there is nothing from the PCA10056 to sniff

    I don't understand what you mean by "disconnect the DK". I use the example simply with a single PCA10056 and an Android phone to connect to it. I don't use the very complex Python-script of the test, since that adds unneeded complexity and requires more hardware

    In that example, the PCA10056 should work as a BLE UART device, and you can connect to it with any device that can establish a BLE UART connection (e.g. an Android phone using nRF Toolbox UART). 

    So the PCA10056 is started, an Android phone with nRF toolbox UART used to connect to the PCA10056, walk far enough to lose the connection, then walk back into range. At that point, the PCA10056 is dead

  • robca said:
    I don't understand what you mean by "disconnect the DK"

     Ok, I thought that if you had two PCA10056, you could try to use nRF Connect for Desktop with one of the DKs, and connect to the other DK running your own application. By "disconnecting the DK" I meant that you could unplug the DK that is running the nRF Connect FW, which would make it immediately unresponsive (similar to what you would see if you walk out of range with a phone). I am still not able to reproduce this.

    Does this behavior depend on your sdk_config.h definitions? Does it happen if you use an unmodified SDK and the unmodified example, without any changes to the sdk_config.h file? Or does it only happen when you enable the logging from the CLI module?

    BR,

    Edvin

  • you could try to use nRF Connect for Desktop with one of the DKs, and connect to the other DK running your own application.

    Unfortunately I do not have two DKs. But in any case that would not be a realistic configuration for our scenario where a smartphone will be used to configure the nRF52 device currently under development, and a rugged Android device used to QA test and configure the final product at the end of the production line

    I tried with the example as is, no changes, compiled in Release mode, connected the first time using UART in nRF Toolbox. Same problem: as soon as the connection is lost, the PCA10056 gets into a non-working state and never recovers. Neither nRF Connect nor nRF Toolbox can see the device.

    The example turns on one of the PCA10056 LEDs when connected. That LED never turns off when the device is in the non-responding state, which probably means that the PCA10056 thinks it's still connected and never even goes into sleep mode. That is consistent with the code never reaching the nrf_cli_uninit() inside BLE_GAP_EVT_DISCONNECTED

  • I still can't reproduce that. Can you send me:

    1: A zipped folder that contains the project file that you used to compile this application that has the issue when you disconnect. Please let me know what IDE you are using (and I assume you are still using SDK15.3.0).

    2: The hex file that you are reproducing this with.

    BR,

    Edvin

  • Edvin, I'm not sure what other ZIP file I can send you. I'm using the 15.3.0 SDK, yes, and the ZIP file I previously uploaded contains everything needed to reproduce the problem. You simply need to unzip that file over the SDK 15.3.0 example in examples\ble_peripheral\experimental\ble_app_cli, overwriting the files with the same name. In addition to that, you need to copy the nrf_cli_ble_uart.c file inside the ZIP file over the same file in the SDK in components\libraries\cli\ble_uart. The only other alternative would be to upload the whole SDK, but that's probably too big for the forum

    I'm using Segger 4.52a, with the 8.32a CPU support package, but as you can see below, that makes no difference

    Here is the HEX file compiled with Release settings, you will also need to flash the S140 softdevice in the SDK 15.3.0 to make it work. Please note that this HEX is actually identical to the SDK example, no change at all (since there is no need to change logs and other for Release, nor MMD). So this hex is pretty much the app portion of the HEX file in examples\ble_peripheral\experimental\ble_app_cli\hex (that one also has the softdevice)

    ble_app_cli_pca10056_s140.hex

    So the simplest way to reproduce the problem is simply to flash the examples\ble_peripheral\experimental\ble_app_cli\hex\ble_app_cli_pca10056_s140.hex file onto a PCA10056, connect with an Android device to NORDIC_CLI, then walk away enough to lose signal (or put the phone into a metal box), and the PCA10056 is stuck and never recovers. There's no need to use any parts of my example. So the example, as provided by Nordic, doesn't work reliably

    Just in case, my device is a Pixel 3, but I can reproduce it even with an iPhone using the iOS version of nRF Toolbox... for some reason, it takes a bit longer to get the PCA10056 to hang with the iPhone, but it hangs nonetheless. You just need to wait 20 seconds or so once it loses the signal before getting close again to the PCA10056

    EDIT: I just tried flashing the examples\ble_peripheral\experimental\ble_app_cli\hex\ble_app_cli_pca10056_s140.hex from SDK 17.0.2 and that one seems to work as expected, and can handle loss of signal, even using the serial terminal app. Unfortunately that doesn't really help us, because we are forced to use SDK15.3.0 (to avoid having to re-validate the entire firmware) and I cannot debug the app flow to understand where it hangs in 15.3.0, and where instead it works in 17.0.2. Also, 17.0.2 uses a completely different S140 softdevice, so there is not much I can learn from this

  • So I tested the .hex file that you sent (together with the softdevice). I tested the SDK15.3.0\examples\ble_peripheral\experimental\ble_app_cli\hex\ble_app_cli_pca10056_s140.hex file, and the unmodified example from the same SDK (both Keil and SES projects) and I still can't reproduce the issue that you report. 

    Can you please try to run a debug session, and try to recreate it. Please take a screenshot of the following:

    Make sure you get the log window, the call stack and the registers. If you don't see any logging, please try to change 

    #define NRF_FPRINTF_FLAG_AUTOMATIC_CR_ON_LF_ENABLED 1
    to
    #define NRF_FPRINTF_FLAG_AUTOMATIC_CR_ON_LF_ENABLED 0

    in sdk_config.h. 

    Please also upload a screenshot from:

    BR,

    Edvin

  • I'm not sure how you cannot reproduce it. Did you try with an Android phone or iPhone and losing signal by either moving away or using a metal box? Or are you using the 2 DK setup? I'm pretty sure that the BLE stack being used makes some difference, since I can see slightly different behaviors between Android and iPhone, and it's entirely possible that using an all Nordic stack might change things once more.

    Did you try to reproduce with the exact same scenario I provided? If so, can you please let me know what phone/OS version you are using, so that I can see if I can use the same here to get a reproducible scenario for you?

    I cannot get a debug session showing anything meaningful, as I already explained many times before. Once it stops working, it stops sending log updates (including the battery updates) and I can only break into the debugger, but never in the example code. It's clearly still running, but in a state where the BLE stack is not responsive to external events. I'm not sure what help it would be to send a ton of screenshots of different parts of the softdevice or other library code

  • Yes, I tried with an iPhone, using nRF Connect for iOS, and I walked away (two times) and I put it in a faraday cage (metal box) a handful of times. I used both your provided hex file and the unmodified SDK. 

     

    robca said:
    I cannot get a debug session showing anything meaningful, as I already explained many times before. Once it stops working, it stops sending log updates (including the battery updates) and I can only break into the debugger

     Can you show me a screenshot of that? Just take a screen dump of SES without doing anything after it stops reporting the battery levels, before you press anything. Perhaps it reveals something I didn't think of asking about. Maybe you see a hardfault?

    Is there anything else that can be different from your setup to mine? You are using a standard DK, right? Did you do any modifications to it? What DK version/revision is it? It says on the white sticker.
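If the debugger does stop in a fault handler, the registers and stack can be read with the following in mind. On Cortex-M, exception entry pushes a basic frame of eight words (R0, R1, R2, R3, R12, LR, PC, xPSR) onto the active stack, so the word at index 6 from the stacked SP is the address that was executing when the exception was taken. The sketch below decodes such a frame on a host machine. The `FRAME_*` indices and helper names are hypothetical, and it assumes the basic frame without the extended floating-point context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets in the basic Cortex-M exception frame, lowest address first. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Address of the instruction that was executing when the exception hit. */
uint32_t frame_pc(const uint32_t *sp)
{
    assert(sp != NULL);
    return sp[FRAME_PC];
}

/* Return address of the function that was running (its LR at the time). */
uint32_t frame_lr(const uint32_t *sp)
{
    assert(sp != NULL);
    return sp[FRAME_LR];
}

/* Exception number of the interrupted context, from the low 9 bits of the
 * stacked xPSR. 0 means the fault interrupted thread mode; any other value
 * means it interrupted an exception or interrupt handler. */
uint32_t frame_interrupted_exception(const uint32_t *sp)
{
    assert(sp != NULL);
    return sp[FRAME_XPSR] & 0x1FFu;
}
```

Looking up the value returned by `frame_pc()` in the map file or the disassembly shows whether the code was in the application, a library, or the SoftDevice at that point.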
