
nrf_cli_write *very* occasionally hangs waiting for tx_rdy flag

I have an nRF52840 development kit (rev 2.0, received last week) that I'm using to develop software that will ultimately run on a dongle. It's not especially complex: I receive packets from a custom nRF52840 board over BLE and forward them to a PC program. I'm using a terminal emulator at the moment to log the data. Everything is *mostly* great. However, every once in a while cli_write hangs waiting for an event. Unfortunately, "every once in a while" is not very frequently. Connected to a Linux system I have seen two hangs, one at about 250K "packets" and another at 1.12M packets. Each packet has so far been <= 64 bytes, although with more time they get bigger. They are clear-text packets followed by CRLF.

cli_write hangs in this piece of code:

while (p_cli->p_ctx->internal.flag.tx_rdy == 0)
{
   ;
}

which means that it's waiting for cli_transport_evt_handler() to set the flag, presumably to indicate that the last write completed correctly.

The problem *seems* to be more frequent when the DK is connected to a Windows machine than when it is connected to Linux.

I'm using the CLI interface because I need to send the nRF52840 software a command or two before it starts sending me data, after which it's a constant stream.

The data is arriving over BLE at a reasonably brisk rate. It's being *sent* over BLE from the custom board at the rate of 75 packets per second and about 50 bytes per packet.  A few bytes are added to the packet and it is forwarded over the ACM link to the PC. 

The failure on the Linux system happened at approximately 56 minutes for the 250K packet example and 4.18 hours for the 1.12M packet example. 
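As a quick sanity check on those figures, here is a minimal sketch of the arithmetic, assuming the stream really does run at a constant 75 packets per second until the hang. The function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Packets sent after `seconds` at a constant `rate_pps` packets per second. */
uint32_t packets_after(uint32_t rate_pps, uint32_t seconds)
{
    return rate_pps * seconds;
}

/* Payload bytes per second, before the few forwarding bytes are added. */
uint32_t payload_bytes_per_second(uint32_t rate_pps, uint32_t bytes_per_packet)
{
    return rate_pps * bytes_per_packet;
}
```

With 75 packets per second, 56 minutes (3360 s) gives 252,000 packets and 4.18 hours (15,048 s) gives 1,128,600 packets, which matches the 250K and 1.12M counts. The payload rate works out to about 3750 bytes per second for 50-byte packets.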

Knowing USB to be a touch less than 100% reliable, I'm not surprised that we miss an event every once in a while. But I'd like to recover from the missed event. Ideally it's just a missed event and not a lost packet, but the packets are sequence numbered and losing one of them is no big deal. What *is* important is to identify that the lost event has occurred and then "properly" recover from it to continue the data stream.
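Since the packets are sequence numbered, a receiver could at least detect when one went missing. The following is only a sketch under assumptions not stated in the post: a 32-bit sequence number that increments by one per packet, and in-order delivery with no duplicates. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical tracker for a stream of sequence-numbered packets. */
typedef struct {
    uint32_t expected;  /* next sequence number we expect to see */
    uint32_t lost;      /* running total of packets that never arrived */
    int      started;   /* 0 until the first packet has been seen */
} seq_tracker_t;

/* Record an incoming sequence number.
 * Returns how many packets were skipped just before it (0 if in order).
 * Assumes in-order delivery: a duplicate or reordered packet would be
 * misreported as a very large gap. */
uint32_t seq_tracker_update(seq_tracker_t *t, uint32_t seq)
{
    uint32_t gap = 0;

    if (t->started) {
        /* Unsigned subtraction keeps working across 32-bit wraparound. */
        gap = seq - t->expected;
    }
    t->started  = 1;
    t->expected = seq + 1;
    t->lost    += gap;
    return gap;
}
```

A zero-initialised tracker fed 10, 11, 14 would report gaps of 0, 0 and 2. This only identifies the loss; it does not by itself unblock the transmitter.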

Any thoughts would be welcome. 


  • Also, I'm using SDK 16 - the latest downloaded from Nordic. 

    Additional information: it appears that being connected to a "busy" Windows machine induces the problem most frequently. My MacBook Pro ran for 7 hours without a problem. A Raspberry Pi that has occasional other tasks ran for 4 hours. Windows machines seem to be more "busy" in general. I notice that on Windows the data comes in a "stuttering" manner, whereas the Apple machine and Raspberry Pi had very smooth displays on the terminal.

    I wonder if there is a USBD timing issue with CDC ACM connections?   

  • Not sure what the problem may be. I assume HFWC is disabled here.

    Have you followed the description of the cli_write()-api?

    "/* Function sends data stream to the CLI instance. Each time before the cli_write function is called,
    * it must be ensured that IO buffer of fprintf is flushed to avoid synchronization issues.
    * For that purpose, use function transport_buffer_flush(p_cli) */"

    Have you tried to add a down counter in the while() loop so that it may continue if it fails?

    Best regards,
    Kenneth
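The down-counter idea suggested above could look roughly like this. It is a minimal sketch, not the SDK's code: tx_flag_t is a hypothetical stand-in for the p_cli->p_ctx->internal.flag.tx_rdy field, and the spin limit is an arbitrary value the caller would have to tune.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for p_cli->p_ctx->internal.flag.tx_rdy.
 * Declared volatile because the real flag is set from an event handler. */
typedef struct {
    volatile uint8_t tx_rdy;
} tx_flag_t;

/* Spin until tx_rdy is set or max_spins iterations have elapsed.
 * Returns true if the flag was seen set, false if the wait timed out. */
static bool wait_tx_rdy_bounded(const tx_flag_t *flag, uint32_t max_spins)
{
    while (flag->tx_rdy == 0) {
        if (max_spins == 0) {
            return false;   /* give up instead of hanging forever */
        }
        max_spins--;
    }
    return true;
}
```

On a timeout the caller could count the failure, drop the current packet, and carry on with the stream. Whether the USBD transfer also needs to be reset at that point is not established in this thread.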

  • Sorry - I wasn't clear in how I'm using it. All my calls to send data to the PC use nrf_cli_fprintf which ultimately ends up calling cli_write. It's just that it's cli_write where the hangup is.

    I'm guessing you mean "HWFC" for hardware flow control. Since this is a virtual COM port (CDC ACM), there isn't any real hardware *for* flow control.

    Also, remember that I've successfully transferred millions of packets to my Mac and Linux machines. It seems to be "busy" Windows machines where the problem mostly shows up, although it did show up twice on a Raspberry Pi, after 250K and 1.12M packets.

    I suspect that the "host" PC is forgetting to send an "ACK" packet on USB and the USBD software on the Nordic side just keeps waiting, rather than timing out. 

  • I assume you have made an example that combines BLE + CLI + USB, since I can't find any project in the nRF5 SDK that does this itself. Could the case here be related to this issue?
    https://devzone.nordicsemi.com/f/nordic-q-a/56820/usb-endpoint-transmitter-returns-busy 

  • Absolutely. I built my own based on the CLI over CDC ACM example combined with a BLE example. 

    I will make sure there is nothing proprietary in it and attach it later today. In the meantime I'll check out the solutions offered in the ticket mentioned above. He's not seeing what I'm seeing, but at the low level they could be related. 

  • In making an example I seem to have found a way to reproduce the problem easily. Probably a Windows bug. I'll confirm and get you an example today.

  • I'm trying to attach my demo. Two files, Nordic_CLI_bug_tx_rdy.zip and Nordic_CLI_bug_tx_rdy_device.zip

    Unzip both under ....\nRF5_SDK_16.0.0_98a08e2\examples\ble_peripheral

    Connect one nRF52840 DK to a JTAG pod, then build and download Nordic_CLI_bug_tx_rdy_device to it.

    Connect a second nRF52840 DK and, in addition, connect the USB port on the side. This is for the CLI interface.

    Start #1 running. It will start advertising over BLE.

    Start #2 running. It will connect to #1.

    Connect Tera Term (or your favorite *Windows* terminal emulator) to the virtual COM port created by #2 above.

    Hit return to get the prompt and then type "connect".

    In a short while (less than 10 minutes) it will stop printing. #2 above will be stuck in the loop:

    while (p_cli->p_ctx->internal.flag.tx_rdy == 0)
    {
       ;
    }

    Nordic_CLI_bug_tx_rdy.zip

    Nordic_CLI_bug_tx_rdy_device.zip
