nrf_cli_write *very* occasionally hangs waiting for tx_rdy flag

I have an nRF52840 development kit (rev 2.0, received last week) that I'm using to develop software that will ultimately run on a dongle. It's not especially complex: I receive packets from a custom nRF52840 board over BLE and forward them to a PC program. For now I'm using a terminal emulator to log the data. Everything *mostly* works well. However, every once in a while cli_write hangs waiting for an event, and "every once in a while" is, unfortunately, not very frequent. Connected to a Linux system, I have seen two hangs, one at about 250K "packets" and another at 1.12M packets. So far each packet has been <= 64 bytes, although they will get bigger as development continues. They are plain-text packets terminated by CRLF.

cli_write hangs in this piece of code:

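/* Busy-wait until cli_transport_evt_handler() sets tx_rdy to report
 * that the transport has finished the previous write. */
while (p_cli->p_ctx->internal.flag.tx_rdy == 0)
{
    ; /* No timeout: if the TX-ready event never arrives, this spins forever. */
}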

which means that it's waiting for cli_transport_evt_handler() to set the flag, presumably to indicate that the last write completed correctly.

The problem *seems* to be more frequent when the DK is connected to a Windows machine than when it is connected to Linux.

I'm using the CLI interface because I need to send the nRF52840 software a command or two before it starts sending me data, after which it's a constant stream.

The data arrives over BLE at a reasonably brisk rate: the custom board *sends* 75 packets per second, about 50 bytes per packet. A few bytes are added to each packet, and it is then forwarded over the ACM link to the PC.

The failures on the Linux system happened after approximately 56 minutes (the 250K-packet case) and 4.18 hours (the 1.12M-packet case).
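As a quick sanity check, the two failure points line up with the nominal send rate. The sketch below assumes a constant 75 packets/s and roughly 50 bytes of BLE payload per packet (both figures from above); the number of bytes added before forwarding is not stated, so it is left as a parameter. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant send rate from the custom board (from the post). */
#define PACKETS_PER_SECOND 75u

/* Expected number of packets forwarded after running for `seconds`. */
static uint64_t expected_packets(uint64_t seconds)
{
    return seconds * PACKETS_PER_SECOND;
}

/* Approximate CDC ACM payload rate in bytes per second.
 * `extra_bytes` covers the bytes added before forwarding plus CRLF;
 * the exact count is not given in the post, so it is a parameter. */
static uint32_t usb_bytes_per_second(uint32_t ble_payload_bytes,
                                     uint32_t extra_bytes)
{
    assert(ble_payload_bytes + extra_bytes > 0);
    return (ble_payload_bytes + extra_bytes) * PACKETS_PER_SECOND;
}
```

For 56 minutes, `expected_packets(56 * 60)` gives 252,000, and for 4.18 hours, `expected_packets(15048)` gives 1,128,600, matching the ~250K and ~1.12M counts. So the board appears to have been sending at close to the nominal rate right up to each hang. Even at 64 bytes per packet, `usb_bytes_per_second(50, 14)` is only 4,800 bytes/s, far below what a full-speed USB link can carry, which suggests raw bandwidth is not the problem.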

Knowing USB to be a touch less than 100% reliable, I'm not surprised that we miss an event every once in a while. But I'd like to recover from the missed event. Ideally it's just a missed event and not a lost packet, but the packets are sequence numbered, so losing one of them is no big deal. What *is* important is to detect that the lost event has occurred and then "properly" recover from it so that the data stream continues.

Any thoughts would be welcome. 


  • Also, I'm using SDK 16, the latest version downloaded from Nordic.

    Additional information: being connected to a "busy" Windows machine appears to trigger the problem most often. My MacBook Pro ran for 7 hours without a problem. A Raspberry Pi that occasionally runs other tasks ran for 4 hours. Windows machines seem to be "busier" in general, and on them the data comes in in a "stuttering" manner, whereas the Apple machine and the Raspberry Pi showed a very smooth display in the terminal.

    I wonder whether there is a USBD timing issue with CDC ACM connections.

  • Not sure what the problem may be now. I assume HFWC is disabled here.

    Have you followed the description of the cli_write() API?

    "/* Function sends data stream to the CLI instance. Each time before the cli_write function is called,
    * it must be ensured that IO buffer of fprintf is flushed to avoid synchronization issues.
    * For that purpose, use function transport_buffer_flush(p_cli) */"

    Have you tried adding a down counter to the while() loop so that it can continue if the flag is never set?

    Best regards,
    Kenneth

  • Sorry, I wasn't clear about how I'm using it. All my calls that send data to the PC use nrf_cli_fprintf, which ultimately calls cli_write. It's just that cli_write is where the hang occurs.

    I'm guessing you mean "HWFC", for hardware flow control. Since this is a virtual COM port (CDC ACM), there isn't any real hardware *for* flow control.

    Also, remember that I've successfully transferred millions of packets to my Mac and Linux machines. It's "busy" Windows machines where the problem mostly shows up, although it did also show up twice on a Raspberry Pi, after 250K and 1.2M packets.

    I suspect that the "host" PC is forgetting to send an "ACK" packet on USB and the USBD software on the Nordic side just keeps waiting, rather than timing out. 

  • I assume you have made an example that combines BLE + CLI + USB, since I can't find any project in the nRF5 SDK that does this on its own. Could the issue here be related to this one?
    https://devzone.nordicsemi.com/f/nordic-q-a/56820/usb-endpoint-transmitter-returns-busy 

  • Absolutely. I built my own based on the CLI over CDC ACM example combined with a BLE example. 

    I will make sure there is nothing proprietary in it and attach it later today. In the meantime I'll check out the solutions offered in the ticket mentioned above. He's not seeing what I'm seeing, but at the low level they could be related. 
