BLE APP CLI example (CLI over BLE UART): poor reliability

I'm trying to implement a CLI over BLE UART for one of our company projects. I started from the experimental ble_app_cli example in the SDK. I'm using SDK 15.3.0 on the Nordic PCA10056 DK board. I keep running into instability problems of various kinds.

In order to narrow down the problem, I decided to simply use the example as is, and see if I can reproduce the issues. For the most part, I end up facing the same problems. 

Steps to reproduce:

  • start the example in debug mode, so that I can see what's happening.
  • connect to the device using an Android BLE Terminal emulator (I use this one https://play.google.com/store/apps/details?id=de.kai_morich.serial_bluetooth_terminal&hl=en&gl=US)
  • see the CLI prompt, send led on and led off commands, it all works
  • walk away so that the smartphone loses BLE connection
  • at least 50% of the time, the PCA10056 device doesn't get the disconnect event, and the device gets into a state where the BLE stack is not communicating anymore. It's also possible to get into the same state by using, for example, the UART functionality of nRF Toolbox and sending bogus commands
  • I added a NRF_LOG_INFO("BLE Cli enabled");/NRF_LOG_INFO("BLE Cli disabled"); after the nrf_cli_init()/uninit for the BLE CLI in ble_evt_handler(), and I can see that the BLE cli gets initialized but never disconnects/uninits, leaving the stack connected even if the phone is disconnected
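
The instrumentation described in the last step can be modelled on a host machine. This is only a minimal sketch: the event names and the cli_trace_t type are hypothetical stand-ins, not SDK definitions, and the real nrf_cli_init()/uninit calls are reduced to comments. It shows why a missing "BLE Cli disabled" line means the disconnect branch was never entered.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical event IDs standing in for the SoftDevice GAP events. */
typedef enum { EVT_CONNECTED, EVT_DISCONNECTED, EVT_OTHER } link_evt_t;

typedef struct {
    bool cli_enabled;   /* true between CLI init and CLI uninit */
    int  enable_count;  /* number of "BLE Cli enabled" lines logged */
    int  disable_count; /* number of "BLE Cli disabled" lines logged */
} cli_trace_t;

/* Same shape as the instrumented handler: log right after the
 * (omitted) CLI init and uninit calls. */
void on_link_evt(cli_trace_t *t, link_evt_t evt)
{
    switch (evt) {
    case EVT_CONNECTED:
        /* real code: the BLE CLI init call would go here */
        t->cli_enabled = true;
        t->enable_count++;
        puts("BLE Cli enabled");
        break;
    case EVT_DISCONNECTED:
        /* real code: the BLE CLI uninit call would go here */
        t->cli_enabled = false;
        t->disable_count++;
        puts("BLE Cli disabled");
        break;
    default:
        /* any other event leaves the CLI state untouched */
        break;
    }
}
```

In this model, if the disconnect event is never delivered, cli_enabled stays true forever, which matches the symptom of a board that still believes it is connected.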

Is anyone using the CLI over BLE UART? I know I could implement some sort of watchdog, and that the example is only meant as an example, but the reliability I'm seeing so far makes the use of the CLI over the air a bit questionable. Usually the Nordic examples are a lot more robust than this

What am I doing wrong?

  • just a thought:

    Note that there is no standard for "UART over BLE" - they're all proprietary, manufacturer-specific services.

    So how do we know that this app has a reliable implementation of NUS ... ?

  • You are right, we don't... but that app works with every other BLE serial implementation. And, as long as I never lose connectivity, the app works just fine. The connectivity loss is not something the app handles anyway

    And, as I said, I can get the example in the same state by simply using the Nordic provided UART applet in nRF Toolbox. In a few cases, even just using the nRF Connect. So that Serial Terminal app is not the real problem, even assuming that is not 100% compatible

  • MMD should work on any Cortex-M (except M0).

    You need a J-Link, though; nobody else seems to support it - even ARM themselves!

    Yep. I looked into making it work for the STM32 (using J-Link), but I lost interest after a while. I also happened to find a long-dormant bug in the MMD code itself, when FP code is enabled: https://forum.segger.com/index.php/Thread/7292-SOLVED-Bug-in-Monitor-Mode-Debug-example-when-using-FP-enabled-code-variable-add/

    I sure am glad Nordic made it work seamlessly and included the J-Link on the DK boards

  • You can drag'n'drop them. The easiest would be a zipped folder containing the project folder, which includes all the files you have edited here (main, sdk_config and the project file).

    There is also an option to insert->Image/video/file -> and then click "upload" (which doesn't look like a button after the last update).

    However, I will try to do the changes that you did.

    I'll try without monitor mode debugging first, because I am not familiar with it. 

    I see that when NRF_CLI_BLE_UART_CONFIG_LOG_ENABLED is set to 1, logging stops after you connect to the device (after the lines saying that notifications are not enabled). I believe the reason is that the CLI, which is also a log backend, logs from within cli_ble_uart_write(), the function used to output the logs. This means that whenever something is logged with NRF_LOG_INFO(), it triggers cli_ble_uart_write(), which triggers the next NRF_LOG_INFO(), and so on. Since this keeps adding tasks, the idle_task() in main() is never called. Try setting a breakpoint there, and you will see that it is not hit after the connection is established.
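
    The feedback loop described above can be illustrated with a small host-side model. This is a sketch under stated assumptions, not SDK code: log_model_t, log_enqueue(), backend_write() and log_process() are hypothetical names, and the backend_logs flag only stands in for NRF_CLI_BLE_UART_CONFIG_LOG_ENABLED.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a log queue whose backend may log while writing. */
typedef struct {
    size_t pending;      /* log entries waiting to be written out */
    size_t processed;    /* log entries already handed to the backend */
    bool   backend_logs; /* stands in for NRF_CLI_BLE_UART_CONFIG_LOG_ENABLED */
} log_model_t;

/* Queue one log entry (the role NRF_LOG_INFO() plays in the description). */
void log_enqueue(log_model_t *m)
{
    m->pending++;
}

/* Backend write (the role of cli_ble_uart_write()): if the backend itself
 * logs, every write produces a new entry to process. */
void backend_write(log_model_t *m)
{
    if (m->backend_logs) {
        log_enqueue(m);
    }
}

/* Process at most max_steps entries. Returns true if the queue drained,
 * which is the only case in which an idle task would get to run. */
bool log_process(log_model_t *m, size_t max_steps)
{
    for (size_t i = 0; i < max_steps && m->pending > 0; i++) {
        m->pending--;
        m->processed++;
        backend_write(m);
    }
    return m->pending == 0;
}
```

    With backend_logs set, a single queued entry keeps the queue non-empty however many steps are processed, so the idle path is never reached. With it cleared, the queue drains after one step.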

    So you have 3 choices:

    1: set NRF_CLI_BLE_UART_CONFIG_LOG_ENABLED to 0,

    2: set NRF_CLI_LOG_BACKEND to 0 (this will stop all logs from being printed over CLI).

    3: Remove all calls to NRF_LOG_... in cli_ble_uart_write():

    static ret_code_t cli_ble_uart_write(nrf_cli_transport_t const * p_transport,
                                         const void *                p_data,
                                         size_t                      length,
                                         size_t *                    p_cnt)
    {
        ASSERT(p_cnt);
        nrf_cli_ble_uart_internal_t * p_instance =
                                 CONTAINER_OF(p_transport, nrf_cli_ble_uart_internal_t, transport);
        ret_code_t err_code = NRF_SUCCESS;
        if (p_instance->p_cb->service_started)
        {
            *p_cnt = length;
            err_code = nrf_ringbuf_cpy_put(p_instance->p_tx_ringbuf, p_data, p_cnt);
    
    //        NRF_LOG_INFO("Conn_handle:%d, write req:%d, buffered:%d",
    //                                                     p_instance->p_cb->conn_handle, length, *p_cnt);
    //        NRF_LOG_HEXDUMP_DEBUG(p_data, *p_cnt);
        }
        else
        {
    //        NRF_LOG_INFO("Conn_handle:%d, write req:%d. Notifications not enabled",
    //                     p_instance->p_cb->conn_handle, length);
            *p_cnt = length;
            p_instance->p_cb->handler(NRF_CLI_TRANSPORT_EVT_TX_RDY, p_instance->p_cb->p_context);
        }
        return err_code;
    }

    If you still encounter issues with the disconnection, what does the log say after the disconnect when you implement one of the three workarounds above?

    BR,

    Edvin

  • You can drag'n'drop them. The easiest would be a zipped folder containing the project folder, which includes all the files you have edited here (main, sdk_config and the project file)

    Thanks Edvin. I missed that. Glad I asked :)

    Ok, I tried what you suggested, and the only suggestion that worked well enough was #3, comment out the NRF_LOG calls in cli_ble_uart_write().

    The logs don't crash anymore (good!), but I still get into the same state where no connection is accepted. Steps to repro: run the code in debug mode and connect with the nRF Toolbox UART functionality. You will see the <info> app: BLE Cli enabled message. Then walk away until the BLE signal is lost; the button at the bottom of the nRF app will say "connecting..." but nothing happens. The message <info> app: BLE Cli disabled is never reached (even with a breakpoint there, nothing). If you try to reconnect manually, the Nordic_CLI device doesn't show in the scan screen (because the PCA10056 is in a non-working state). There is no way to reconnect to the PCA10056 short of restarting it

    Unlike before, you can break into the debugger (assuming you have MMD enabled), but it gives no really useful information (usually it's in task_yield())

    I'm enclosing all modified files in this ZIP. Please note that I did not follow the exact folder structure of the SDK for the nrf_cli_ble_uart.c file (which you need to manually copy into components\libraries\cli\ble_uart). MMD is enabled on that build, and completely transparent

    ble_app_cli.zip

    Appreciate any help you can provide

  • robca said:
    button at the bottom of the nRF app will say "connecting..." but nothing happens.

    That can mean that you are too far away, or that the phone can't pick up any advertisements. Try to sniff the connection. Does the nRF advertise? Does the phone send a connection request? Does the same thing happen if you use nRF Connect and just unplug the connectivity DK/dongle? That is what I tested. Sorry, but it is a bit difficult for me to walk away from it, because I use Remote Desktop and the DKs are connected to my computer in the office. I tried a couple of times yesterday to just unplug the DK it was connected to (I was using nRF Connect for Desktop). That causes the connection to time out, because the central disappears, but I never ran into any issues.

    BR,

    Edvin

  • Try to sniff the connection. Does the nRF advertise?

    Thanks Edvin. As I said, the PCA10056 is not working anymore. After losing the signal, I get back to the same desk where the PCA10056 is, and try nRF Connect and/or nRF Toolbox, and there is nothing from the PCA10056 to sniff

    I don't understand what you mean by "disconnect the DK". I use the example simply with a single PCA10056 and an Android phone to connect to it. I don't use the very complex Python-script of the test, since that adds unneeded complexity and requires more hardware

    In that example, the PCA10056 should work as a BLE UART device, and you can connect to it with any device that can establish a BLE UART connection (e.g. an Android phone using nRF Toolbox UART). 

    So the PCA10056 is started, an Android phone with nRF toolbox UART used to connect to the PCA10056, walk far enough to lose the connection, then walk back into range. At that point, the PCA10056 is dead

  • robca said:
    I don't understand what you mean by "disconnect the DK"

    Ok, I thought that if you had two PCA10056 boards, you could use nRF Connect for Desktop with one of the DKs and connect to the other DK running your own application. By "disconnecting the DK" I meant that you could unplug the DK that is running the nRF Connect FW, which would make it immediately unresponsive (similar to what you would see if you walked out of range with a phone). I am still not able to reproduce this.

    Does this behavior depend on your sdk_config.h definitions? Does it happen if you use an unmodified SDK and the unmodified example, without any changes to the sdk_config.h file? Or does it only happen when you enable the logging from the CLI module?

    BR,

    Edvin

  • you could try to use nRF Connect for Desktop with one of the DKs, and connect to the other DK running your own application.

    Unfortunately I do not have two DKs. But in any case that would not be a realistic configuration for our scenario where a smartphone will be used to configure the nRF52 device currently under development, and a rugged Android device used to QA test and configure the final product at the end of the production line

    I tried with the example as is, no changes, compiled in Release mode, connected the first time using UART in nRF Toolbox. Same problem: as soon as the connection is lost, the PCA10056 gets into a non-working state and never recovers. Neither nRF Connect nor nRF Toolbox can see the device.

    The example turns on one of the PCA10056 LEDs when connected. That LED never turns off when the device is in the non-responding state, which probably means that the PCA10056 thinks it's still connected and never even goes into sleep mode. That is consistent with the code never reaching the nrf_cli_uninit() call inside BLE_GAP_EVT_DISCONNECTED
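
    For context on what should happen here: in BLE, a link is normally declared lost once no valid packet has been received for the negotiated supervision timeout, which is expressed in units of 10 ms. The sketch below models only that expectation on a host machine. The function names are hypothetical and this is not SoftDevice code.

```c
#include <stdbool.h>
#include <stdint.h>

/* BLE expresses the connection supervision timeout in units of 10 ms. */
#define SUP_TIMEOUT_UNIT_MS 10u

/* Convert a supervision timeout from 10 ms units to milliseconds. */
uint32_t sup_timeout_ms(uint16_t units)
{
    return (uint32_t)units * SUP_TIMEOUT_UNIT_MS;
}

/* Returns true once the time since the last received packet has reached
 * the supervision timeout, i.e. when a disconnect event would be expected.
 * Unsigned subtraction keeps the result correct across tick wraparound. */
bool link_should_be_lost(uint32_t now_ms, uint32_t last_rx_ms,
                         uint16_t timeout_units)
{
    return (uint32_t)(now_ms - last_rx_ms) >= sup_timeout_ms(timeout_units);
}
```

    As a usage example, with a hypothetical timeout of 400 units (4 s), link_should_be_lost(20000, 0, 400) is true, so an LED that is still lit after about 20 seconds out of range suggests that the disconnect was never reported to the application.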

  • I still can't reproduce that. Can you send me:

    1: A zipped folder that contains the project file that you used to compile the application that has the issue when you disconnect. Please let me know what IDE you are using (and I assume you are still using SDK 15.3.0).

    2: The hex file that you are reproducing this with.

    BR,

    Edvin

  • Edvin, I'm not sure what other ZIP file I can send you. Yes, I'm using the 15.3.0 SDK, and the ZIP file I previously uploaded contains everything needed to reproduce the problem. You simply need to unzip that file over the SDK 15.3.0 example in examples\ble_peripheral\experimental\ble_app_cli, overwriting the files with the same name. In addition to that, you need to copy the nrf_cli_ble_uart.c file inside the ZIP file over the same file in the SDK in components\libraries\cli\ble_uart. The only other alternative would be to upload the whole SDK, but that's probably too big for the forum

    I'm using Segger 4.52a, with the 8.32a CPU support package, but as you can see below, that makes no difference

    Here is the HEX file compiled with Release settings, you will also need to flash the S140 softdevice in the SDK 15.3.0 to make it work. Please note that this HEX is actually identical to the SDK example, no change at all (since there is no need to change logs and other for Release, nor MMD). So this hex is pretty much the app portion of the HEX file in examples\ble_peripheral\experimental\ble_app_cli\hex (that one also has the softdevice)

    ble_app_cli_pca10056_s140.hex

    So the simplest way to reproduce the problem is simply to flash the examples\ble_peripheral\experimental\ble_app_cli\hex\ble_app_cli_pca10056_s140.hex file onto a PCA10056, connect with an Android device to NORDIC_CLI, then walk away enough to lose signal (or put the phone into a metal box), and the PCA10056 is stuck and never recovers. There's no need to use any parts of my example. So the example, as provided by Nordic, doesn't work reliably

    Just in case, my device is a Pixel 3, but I can reproduce it even with an iPhone using the iOS version of nRF Toolbox. For some reason, it takes a bit longer to get the PCA10056 to hang with the iPhone, but it hangs nonetheless. You just need to wait 20 seconds or so once it loses the signal before getting close again to the PCA10056

    EDIT: I just tried flashing the examples\ble_peripheral\experimental\ble_app_cli\hex\ble_app_cli_pca10056_s140.hex from SDK 17.0.2, and that one seems to work as expected and can handle loss of signal, even using the serial terminal app. Unfortunately that doesn't really help us, because we are forced to use SDK 15.3.0 (to avoid having to re-validate the entire firmware), and I cannot debug the app flow to understand where it hangs in 15.3.0 and where it works in 17.0.2. Also, 17.0.2 uses a completely different S140 SoftDevice, so there is not much I can learn from this

  • So I tested the .hex file that you sent (together with the softdevice). I tested the SDK15.3.0\examples\ble_peripheral\experimental\ble_app_cli\hex\ble_app_cli_pca10056_s140.hex file, and the unmodified example from the same SDK (both Keil and SES projects) and I still can't reproduce the issue that you report. 

    Can you please try to run a debug session, and try to recreate it. Please take a screenshot of the following:

    Make sure you get the log window, the call stack and the registers. If you don't see any logging, please try to change 

    #define NRF_FPRINTF_FLAG_AUTOMATIC_CR_ON_LF_ENABLED 1
    to
    #define NRF_FPRINTF_FLAG_AUTOMATIC_CR_ON_LF_ENABLED 0

    in sdk_config.h. 

    Please also upload a screenshot from:

    BR,

    Edvin
