This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

BLE APP CLI example (CLI over BLE UART): poor reliability

I'm trying to implement CLI over BLE UART for one of our company projects. I started from the experimental ble_app_cli example in the SDK. I'm using SDK 15.3.0 on the Nordic PCA10056 SDK board. I keep running into instability problems of various kinds.

In order to narrow down the problem, I decided to simply use the example as is, and see if I can reproduce the issues. For the most part, I end up facing the same problems. 

Steps to reproduce:

  • start the example in debug mode, so that I can see that's happening.
  • connect to the device using an Android BLE Terminal emulator (I use this one https://play.google.com/store/apps/details?id=de.kai_morich.serial_bluetooth_terminal&hl=en&gl=US)
  • see the CLI prompt, send led on and led off commands, it all works
  • walk away so that the smartphone loses BLE connection
  • at least 50% of the times, the PCA10056 device doesn't get the disconnect event, and the device gets into a state where the BLE stack is not communicating anymore. It's also possible to get into the same state using for example the UART functionality of the nRF Toolbox and sending bogus commands
  • I added a NRF_LOG_INFO("BLE Cli enabled");/NRF_LOG_INFO("BLE Cli disabled"); after the nrf_cli_init()/uninit for the BLE CLI in ble_evt_handler(), and I can see that the BLE cli gets initialized but never disconnects/uninits, leaving the stack connected even if the phone is disconnected

Is anyone using the CLI over BLE UART? I know I could implement some sort of watchdog, and that the example is only meant as an example, but the reliability I'm seeing so far makes the use of the CLI over the air a bit questionable. Usually the Nordic examples are a lot more robust than this

What am I doing wrong?

Parents
  • just a thought:

    Note that there is no standard for "UART over BLE" - they're all proprietary, manufacturer-specific services.

    So how do we know that this app has a reliable implementation of NUS ... ?

  • You are right, we don't... but that app works with every other BLE serial implementation. And, as long as I never lose connectivity, the app works just fine. The connectivity loss is not something the app handles anyway

    And, as I said, I can get the example in the same state by simply using the Nordic provided UART applet in nRF Toolbox. In a few cases, even just using the nRF Connect. So that Serial Terminal app is not the real problem, even assuming that is not 100% compatible

  • I spent 2 days debugging before creating this thread, yes

    Enabling logs completely locks the device when it connects, failing somewhere inside nrf_sdh_ble_evts_poll(). The only way to recover is to "break"pause on the debugger, and the code ends in HardFault handler with no stack trace. I tried even putting a breakpoint in HardFault handler, but it never reaches it before I hit

    If I enable fewer logs, I get a lot of "Lost logs - increase log backend queue size", but everything I try to do to fix that (following various suggestions from other threads here" cause only more problems

    I tried understanding why the log engine fails, but I went into a maze that keeps leading into the Softdevice dead end. I tried adding breakpoints (using Monitor mode Debugging to keep the BLE stack alive), but even that didn't lead to any meaningful find: I always get to a point where a call into the Softdevice fails, with no extra information

    I would appreciate if you found a way to enable logs on that example that allows to understand why be_evt_handler() in main.c is never called with a BL_GAP_EVT_DISCONNECTED when a connection is lost. As I said, I added a couple of NRF_LOG_INFO(_ for connected and disconnected in that handler, disabled the other logs, and I can see that many times the disconnected event is never reached. I could not find a combination of meaningful logs to enable that doesn't lead to a hardfault in the code (which is not to say it's impossible, just that I could not figure it out). 

    I do understand it's under experimental, but considering it's been "experimental" for at least 2 years, it seems more abandoned than just experimental. In a way that's what makes me wonder how prudent would be to base our OTA configuration strategy on the BLE UART CLI, given it seems to be a non-supported component. If even an example as simple as that one fails so badly (unrecoverable without a reset), adding CLI to a much more complex project, with multiple VS UUIDs seems to be looking for problems

  • "break"pause on the debugger, and the code ends in HardFault handler

    breaking in the debugger is likely to crash the SoftDevice.

    Have you tried Monitor Mode Debug?

    devzone.nordicsemi.com/.../improved-softdevice-debugging

  • Thanks for the help, but I'm not sure you actually read what I wrote:

    "I tried adding breakpoints (using Monitor mode Debugging to keep the BLE stack alive), but even that didn't lead to any meaningful find: I always get to a point where a call into the Softdevice fails, with no extra information" 

    I always had Monitor Mode Debugging in all my tests and debug sessions. 

    The problem is very simple to reproduce: enable the Softdevice, Ble UART and Cli logs and see it freeze as soon as anything connects to the PCA10056

  • No worries, any help effort is appreciated :) yours would have been a great suggestion, had I not used it already. Discovering MMD was quite a life-changing event, to the point that I miss not having it on other microprocessors (like the STM32 when debugging USB connections, which can also time out WHILE DEBUGGING). Alas, not enough in this case

Reply Children
  • I am trying to reproduce this, but so far without any luck. 

    Have you tried adding "DEBUG" to your preprocessor definitions? Does the log say anything when it stops responding then?

    Also, can you please try to disable optimization and set a breakpoint on line 91 in app_error_weak.c (after DEBUG is defined in your preprocessor definitions). Is it hit?

    What I suspect is that the connection is being used immediately after the connection handle is set to BLE_CONN_HANDLE_INVALID. But this should trigger the error handler, which you should see in the log if DEBUG is defined.

    Best regards,

    Edvin

  • I am trying to reproduce this, but so far without any luck. 

    Thanks for the tests. Here's how to reproduce it, using SDK15.3 (I reproduced it even with the newer SDKs, but our project needs to use SDK15.3, so I hope we can work on that. Otherwise I can create repro steps for 17.0.2, but all the relevant files are identical to 15.3.0)

    Use the experimental\ble_app_cli example for PCA10056, and make the following changes to sdk_config.h

    #define BLE_NUS_CONFIG_LOG_ENABLED 1
    #define BLE_NUS_CONFIG_LOG_LEVEL 4
    #define APP_TIMER_KEEPS_RTC_ACTIVE 1  // needed for MMD
    #define NRF_FPRINTF_FLAG_AUTOMATIC_CR_ON_LF_ENABLED 0  // without this, Jlink RTT doesn't output
    #define NRF_CLI_BLE_UART_CONFIG_LOG_ENABLED 1
    #define NRF_CLI_BLE_UART_CONFIG_LOG_LEVEL 4
    #define NRF_SDH_LOG_ENABLED 1   // this should already be enabled
    #define NRF_SDH_LOG_LEVEL 4
    #define NRF_SDH_SOC_LOG_ENABLED 1   // this should already be enabled
    #define NRF_SDH_SOC_LOG_LEVEL 4

    Add the following to main.c (I include the 2 lines preceding the change and the 2 following)

        nrf_cli_ble_uart_config_t config = { .conn_handle = m_conn_handle };
    
        err_code = nrf_cli_init(&m_ble_cli, &config, true, true, NRF_LOG_SEVERITY_INFO);
        NRF_LOG_INFO("BLE Cli enabled");    // add this line
        APP_ERROR_CHECK(nrf_cli_task_create(&m_ble_cli));
    
        APP_ERROR_CHECK(err_code);
                
    [...]
    
        m_conn_handle = BLE_CONN_HANDLE_INVALID;
        (void)nrf_cli_uninit(&m_ble_cli);
        NRF_LOG_INFO("BLE Cli disabled");       // add this
        break;
    
    [...]
    
    int main(void)
    
    {
        bool erase_bonds;
    
    #if CONFIG_JLINK_MONITOR_ENABLED                            // enable MMD
        NVIC_SetPriority(DebugMonitor_IRQn, _PRIO_SD_LOW);      // MMD
    #endif                                                      // MMD
    
        core_init();

    Make the following changes to ble_app_cli_pca10056_s140.emProject to add the MMD files and to enable it. Also make sure it's debug level 3 and optimization 0. Copy the 3 MMD files under examples\ble_peripheral\experimental\ble_app_cli\pca10056\s140\ses

    [...]
    <folder Name="Segger_MMD">
      <file file_name="JLINK_MONITOR_ISR_SES.s" />
      <file file_name="JLINK_MONITOR.c" />
    </folder>
    
    [...]
    
    <configuration
    Name="Debug"
    JLinkExecuteCommand="SetMonModeDebug = 1;SetMonModeVTableAddr = 0x31000"
    c_preprocessor_definitions="DEBUG;CONFIG_JLINK_MONITOR_ENABLED"
    gcc_optimization_level="None" />

    Compile and run in debug mode. Connect to the PCA10056 with nRFConnect. You will notice that there is no log output (not even the "BLE CLI enabled" we just added.

    Change the following in sdh_config.h to ensure the logs are printed as soon as they happen

    #define NRF_LOG_BUFSIZE 2048
    #define NRF_LOG_DEFERRED 0

    You will see that as soon as you connect to the PCA10056, the program hangs

    You can then set again #define NRF_LOG_DEFERRED 1 and add a NRF_FLUSH() after the "BLE CLI enable/disabled" messages to at least monitor connection/disconnection

    NRF_LOG_INFO("BLE Cli enabled");
    NRF_LOG_FLUSH();
    
    [...]
    
    NRF_LOG_INFO("BLE Cli disabled");
    NRF_LOG_FLUSH();

    Re-execute the code in debug mode and connect to the PCA10056 with nRF Connect. You will see the "BLE CLI enabled" message. Then disconnect nRF Connect. You will not see a "BLE CLI disabled" message and the PCA10056 is now unreachable by nRF Connect until you restart it

    P.S. I cannot find a way to include files. It would be easier for you if there was a way to upload my files, so that you won't have to edit manually 

  • MMD should work on any Cortex-M (except M0).

    You need a J-Link, though; nobody else seems to support it - even ARM themselves!

    Yep. I looked into making it work for the STM32 (using JLINK), but I lost interest after a while. I also happened to have found a long dormant bug in the MMD code itself, when FP code is enabled   https://forum.segger.com/index.php/Thread/7292-SOLVED-Bug-in-Monitor-Mode-Debug-example-when-using-FP-enabled-code-variable-add/  Slight smile

    I sure am glad Nordic made it work seamlessly and included the Jlink in the SDK boards

  • You can drag'n'drop them. The easiest would be a zipped folder containing the project folder, which includes all the files you have edited here (main, sdk_config and the project file).

    There is also an option to insert->Image/video/file -> and then click "upload" (which doesn't look like a button after the last update).

    However, I will try to do the changes that you did.

    I'll try without monitor mode debugging first, because I am not familiar with it. 

    I see that when I enabled NRF_CLI_BLE_UART_CONFIG_LOG_ENABLED is set to 1, it stops logging after you connect to the device (after the lines saying that notifications are not enabled). I believe the reason for this is that the CLI, which also is a log backend, will log things in the cli_ble_uart_write() function, which is used to process the logs. This means that whenever something is logged, it is queued using NRF_LOG_INFO(), which in turn triggers cli_ble_uart_write(), which triggers the next NRF_LOG_INFO(""); and so on. Hence, since this keeps on adding to the tasks, the idle_task() in main() is no longer called. Try to set a breakpoint there, and you will see that it is not hit after the connection is entered.

    So you have 3 choices:

    1: set NRF_CLI_BLE_UART_CONFIG_LOG_ENABLED to 0,

    2: set NRF_CLI_LOG_BACKEND to 0 (this will stop all logs from being printed over CLI).

    3: Remove all calls to NRF_LOG_... in cli_ble_uart_write():

    static ret_code_t cli_ble_uart_write(nrf_cli_transport_t const * p_transport,
                                         const void *                p_data,
                                         size_t                      length,
                                         size_t *                    p_cnt)
    {
        ASSERT(p_cnt);
        nrf_cli_ble_uart_internal_t * p_instance =
                                 CONTAINER_OF(p_transport, nrf_cli_ble_uart_internal_t, transport);
        ret_code_t err_code = NRF_SUCCESS;
        if (p_instance->p_cb->service_started)
        {
            *p_cnt = length;
            err_code = nrf_ringbuf_cpy_put(p_instance->p_tx_ringbuf, p_data, p_cnt);
    
    //        NRF_LOG_INFO("Conn_handle:%d, write req:%d, buffered:%d",
    //                                                     p_instance->p_cb->conn_handle, length, *p_cnt);
    //        NRF_LOG_HEXDUMP_DEBUG(p_data, *p_cnt);
        }
        else
        {
    //        NRF_LOG_INFO("Conn_handle:%d, write req:%d. Notifications not enabled",
    //                     p_instance->p_cb->conn_handle, length);
            *p_cnt = length;
            p_instance->p_cb->handler(NRF_CLI_TRANSPORT_EVT_TX_RDY, p_instance->p_cb->p_context);
        }
        return err_code;
    }

    If you still encounter issues with the disconnection. What does the log say after the disconnect if you implement one of the three workarounds above?

    BR,

    Edvin

Related