nrf_cli_write *very* occasionally hangs waiting for tx_rdy flag

Question

I have an nRF52840 development kit (rev 2.0, received last week) that I'm using to develop software that will ultimately run on a dongle. It's not especially complex: I receive packets from a custom nRF52840 board over BLE and forward them to a PC program. For now I'm logging the data with a terminal emulator. Everything *mostly* works. However, every once in a while cli_write hangs waiting for an event, and "every once in a while" is, unfortunately, not very often. With the DK connected to a Linux system I have seen two hangs, one after about 250K "packets" and another after 1.12M packets. So far each packet has been <= 64 bytes, although they will get bigger as development continues. They are plain-text packets terminated by CRLF.
cli_write hangs in this loop:

    while (p_cli->p_ctx->internal.flag.tx_rdy == 0) { ; }

which means it is waiting for cli_transport_evt_handler() to set the flag, presumably to indicate that the last write completed correctly.
The problem *seems* to occur more often when the DK is connected to a Windows machine than when it is connected to Linux.
I'm using the CLI interface because I need to send the nRF52840 software a command or two before it starts sending me data; after that it's a constant stream.
The data arrives over BLE at a fairly brisk rate: the custom board *sends* 75 packets per second, about 50 bytes each. A few bytes are added to each packet, and it is then forwarded over the USB CDC ACM link to the PC.
On the Linux system, the failures occurred after approximately 56 minutes (the 250K-packet case) and 4.18 hours (the 1.12M-packet case).
Knowing USB to be a touch less than 100% reliable, I'm not surprised that an event is missed every once in a while, but I'd like to recover from the missed event. Ideally it's just a missed event and not a lost packet, although the packets are sequence-numbered and losing one of them is no big deal. What *is* important is to detect that the lost event has occurred and then "properly" recover from it so the data stream continues.
Any thoughts would be welcome.

Rob Philip · Accepted Answer

The problem was that I was an idiot. Somehow I forgot that the BLE callback runs at interrupt level, and I was doing all of my CLI handling inside that callback. I resolved the problem by enqueuing the incoming data and handling it in the "main" loop. I'm attaching a modified project in case anyone else finds this ticket and wants to see something that works.
Feel free to close this ticket as "user error".
2248.Nordic_CLI_bug_tx_rdy.zip
