Dropped packets with bt_nus_send

Greetings,


I am using the peripheral_uart Bluetooth nrf sample as a base for my custom application.

I have a ble_write_thread that just receives data on a message queue and calls the bt_nus_send function the same as the sample. And I also have a thread that produces the data and loads it to the ble message queue for transmission.

Recently, because of a new requirement, we need to send a lot more data via BLE than before (to the central, which is a mobile phone running the nRF Connect app). This has created issues with the integrity of the data packets we are sending: when the device is close to the edge of the Bluetooth range, it starts dropping packets.

As other members informed me on the Zephyr Discord, this could be because, while the stack is trying to resend data (because of a poor connection at the edge of the range), new data is being queued in the BLE stack's buffers and the older data that has not yet been sent is overwritten, so it is never sent (and appears to have been dropped). This only happens when the device is at the edge of the Bluetooth range and the BLE stack is still trying to resend older packets.

I have set CONFIG_BT_BUF_ACL_TX_COUNT=200 and CONFIG_BT_BUF_ACL_TX_SIZE=251 (which is the maximum, if I am not mistaken). I have also decreased the supervision timeout to the minimum (100 ms) so that the device disconnects when any transmission takes more than 100 ms, but this does not seem to solve the problem either. I have also decreased the rate at which we queue new data to the ble_write_thread.

Is there something I am missing on the BLE operation or anything else I could do to eliminate this issue?

Thank you in advance!

Best regards,

Stavros

  • Hello Stavros,

    The best approach to handle this will depend on your application's requirements and constraints. Could you elaborate on the requirements for your data transfer?
    How much data will you generate, and how quickly will you do so, that you will be sending to the phone?

    In any case, the best way to approach this is to implement local error handling after the call that queues the data for sending over the NUS service, so that you can handle the case where the data fails to queue because the buffer is full.
    To handle this you could for example implement a ring-buffer, which you feed into the NUS service.
    Alternatively, you could handle the data that fails to queue in another way - for instance, if it is not important for your application that every packet of data makes it across the link you could discard this data just as well. I mention both these approaches since I do not know enough about your application to know which one fits your use-case.
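
A minimal sketch of the ring-buffer idea, written in plain C so it can be tried off-target. The names (`ring_buf`, `rb_put`, `rb_peek`, `rb_drop_oldest`) and the sizes are hypothetical and not part of the NUS or Zephyr APIs. The point it shows is that a full buffer is reported to the caller instead of overwriting older messages, and that a message is only removed once the send call has accepted it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RB_SLOTS   8   /* hypothetical depth; size it for the worst-case burst */
#define RB_MSG_MAX 30  /* hypothetical maximum message length in bytes */

struct rb_msg {
    uint8_t data[RB_MSG_MAX];
    uint16_t len;
};

struct ring_buf {
    struct rb_msg slots[RB_SLOTS];
    size_t head;   /* index of the oldest stored message */
    size_t count;  /* number of stored messages */
};

/* Store one message. Returns false (and stores nothing) when the
 * buffer is full or the message is too long, so the caller can react. */
bool rb_put(struct ring_buf *rb, const uint8_t *data, uint16_t len)
{
    assert(rb != NULL && data != NULL);
    if (rb->count == RB_SLOTS || len > RB_MSG_MAX) {
        return false;
    }
    struct rb_msg *slot = &rb->slots[(rb->head + rb->count) % RB_SLOTS];
    memcpy(slot->data, data, len);
    slot->len = len;
    rb->count++;
    return true;
}

/* Look at the oldest message without removing it; NULL when empty. */
const struct rb_msg *rb_peek(const struct ring_buf *rb)
{
    assert(rb != NULL);
    return rb->count ? &rb->slots[rb->head] : NULL;
}

/* Remove the oldest message, e.g. once the send call has accepted it. */
void rb_drop_oldest(struct ring_buf *rb)
{
    assert(rb != NULL);
    if (rb->count) {
        rb->head = (rb->head + 1) % RB_SLOTS;
        rb->count--;
    }
}
```

In the sending thread the intended pattern would be: `rb_peek`, try to send, and call `rb_drop_oldest` only if the send succeeded. If several threads share the buffer, it also needs a lock, which is left out here.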

    Best regards,
    Karl

  • Hello Karl,

    Excuse me for the incomplete information.

    Up until the change that requires the transmission of additional data, we were transferring 341 bytes/second of data, split into 24 separate messages of various lengths. These are queued to the ble_write_thread (like the one used in the peripheral_uart sample), which uses a Zephyr Message Queue to feed the data to bt_nus_send.

    As mentioned above the ble_write_thread just waits on the Zephyr Message Queue and feeds the incoming data to the bt_nus_send just like in the peripheral_uart sample.

    With the new requirement we want to ideally send 21 x 341 = 7161 bytes/second and the hard requirement is to have NO dropped messages at all.
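
As a rough sanity check of these numbers, the arithmetic can be written out as below. The 341 bytes/second and the factor of 21 come from this post; the connection interval passed to `bytes_per_interval` is an assumption, since the actual interval is negotiated with the phone.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in this post. */
#define OLD_RATE_BYTES_PER_S 341u
#define RATE_MULTIPLIER      21u

/* New application data rate in bytes per second. */
uint32_t new_rate_bytes_per_s(void)
{
    return OLD_RATE_BYTES_PER_S * RATE_MULTIPLIER;
}

/* Payload bytes that must get across in each connection interval,
 * rounded up. interval_ms is an assumed value, not a measured one. */
uint32_t bytes_per_interval(uint32_t rate_bytes_per_s, uint32_t interval_ms)
{
    assert(interval_ms > 0);
    return (rate_bytes_per_s * interval_ms + 999u) / 1000u;
}
```

For example, with an assumed 30 ms connection interval, `bytes_per_interval(new_rate_bytes_per_s(), 30)` gives 215 bytes per interval on average, before any retransmissions caused by a weak link.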

    As I have understood after speaking with multiple Zephyr community members, when bt_nus_send is called it queues the data fed to it for transmission in the BLE stack's internal buffers, and the BLE stack is responsible for trying to send it until it succeeds or the supervision timeout expires.

    But when the device is at the edge of the BLE range and the connection is poor, the stack keeps retrying the same message for a long time while we are still feeding new data to it (which it queues in its internal buffers for transmission). At some point this overwrites older data that was queued in the internal buffers but not yet sent, resulting in dropped packets (packets that were fed to bt_nus_send successfully but never reached the BLE central because they were overwritten by newer data).

    bt_nus_send never returns an error during this behavior (so no error handling is possible). It always returns successfully, but it blocks for tens of milliseconds when the packets are being dropped (I am measuring the execution time of bt_nus_send in the ble_write_thread).

    I have tried using the maximum number of BLE stack buffers (CONFIG_BT_BUF_ACL_TX_COUNT=255) and the minimum supervision timeout (100 ms), so that the BLE stack can buffer more data and also disconnect when a transmission takes too long, but I still get some dropped packets when the device is at the edge of its range.

    Best regards,

    Stavros

  • Hello Susheel,

    Please excuse me for double posting but if it is possible for you I would like an answer to a more general question first before you respond to my previous comment because my issue is more general as well.

    How should someone send data using bt_nus_send assuming having a producer thread that periodically generates data and I assume a consumer thread that waits for that data and calls the bt_nus_send (just like the ble_write_thread in the peripheral_uart sample)? I am assuming a consumer thread that waits on the data and feeds it to bt_nus_send is the most ideal case for an application with multiple threads that want to send data to Bluetooth (if not please suggest another way).

    Q 1. Given this what is the best way for a producer thread to pass data to this consumer thread that runs bt_nus_send? Zephyr FIFO? Zephyr Message Queue? and how should they be written as code if you could provide a very simple generic sample code?

    I am inquiring about this because I have tried using FIFOs & Message Queue and I still get dropped messages

    Q 2. Also when the device reaches close to the edge of the range and transmission slows down and ultimately disconnects because it's out of range, when I get back close and connect it again, it still transmits at a very slow speed like it's still far away, what could be causing this?

    Thank you very much for your patience and support and I look forward to hearing from you!

    Best regards,

    Stavros

  • clockis said:
    Q 1. Given this what is the best way for a producer thread to pass data to this consumer thread that runs bt_nus_send? Zephyr FIFO? Zephyr Message Queue? and how should they be written as code if you could provide a very simple generic sample code?

    I think before we dive deeper into checking which data passing mechanism is best suited here, we first need to confirm that this is in fact a producer/consumer problem that we are seeing. There is absolutely nothing wrong with using message queues, but it can matter how you are using them and how you are handling any errors you get while passing messages (due to the queue being full or a timeout).

    Since you are doing this in the ble_write_thread and not in the bluetooth callbacks, I do not think this is caused by the possible blocking nature of bt_gatt_notify_cb.

    Can you help me reproduce this or give me enough code snippets both in the producer and consumer so that I can attempt to make something similar to your use case?

    What you are seeing could also be caused by the application not handling message queue errors, or by a race condition in the context where you get the raw data and how you queue and send it in the notification.

    clockis said:
    Also the description above regarding the internal buffers being overwritten was explained to me by a Zephyr community member on the Zephyr discord in detail, so I am just quoting their words and their explanation for it

    Could you please provide a link to this? I am surprised that I have not heard about it, but if you provide a link to the discussion and it is still relevant to the SDK version you are using, I will also look into a possible bug in the BLE stack's buffer handling. For now, my focus is mostly on the application.

  • Hello Susheel,

    Thank you very much for the thorough and immediate response.

    Your previous message pointed me to look at the error handling when using k_msgq_put to queue my data to the Message Queue by adding the code below:

    while (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) != 0) {
    	/* message queue is full: purge old data & try again */
    	k_msgq_purge(&ble_msgq);
    }

    Instead of just k_msgq_put on its own

    k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT);

    After changing this, no packets were dropped even when I was close to the edge of the BLE range and had poor signal strength (although about 4% of messages were still getting dropped but that seems like a separate issue).

    The behavior was as you described, and the BLE stack indeed cannot drop messages regardless of its connection settings (supervision timeout, ACL buffer size, etc.). So this was definitely causing the issue I was seeing: messages were not being queued correctly in the message queue, did not arrive in the ble_write_thread, and thus were never fed to bt_nus_send.

    Sorry for spamming with a double post I did it just before trying the modification mentioned above so it was premature.

    Now I am facing a different issue but since this might not be entirely relevant I will open a separate ticket for that.

    Meanwhile, I am attaching some code snippets of the producer/consumer threads I have implemented and how they are used in case you can reproduce or spot any issues with my implementation.

    Producer thread:

    File: producer.c
    
    void send_data(void);

    void producer_thread(void)
    {
        for (;;) {
            send_data();

            k_msleep(1000);
        }
    }

    uint8_t data[25][30]; /* Data are already stored in a buffer */
    uint16_t len[25];     /* Length of each message; assumed to be filled in alongside data */

    void send_data(void)
    {
        ble_data_t ble_data;

        for (int i = 0; i < 25; i++) {
            /* Load data into ble_data */
            memcpy(ble_data.data, data[i], len[i]);
            ble_data.len = len[i];

            ble_transmit_data(ble_data);
        }
    }

    Consumer thread:

    File: consumer.c
    
    typedef struct ble_data_t {
        uint8_t data[30];
        uint16_t len;
    } ble_data_t;

    /* ble_msgq is defined elsewhere (not shown) and holds ble_data_t items */

    void ble_transmit_data(ble_data_t ble_data)
    {
        while (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) != 0) {
            /* message queue is full: purge old data & try again */
            k_msgq_purge(&ble_msgq);
        }
    }

    void ble_write_thread(void) /* The consumer thread */
    {
        int err;        /* bt_nus_send returns 0 on success or a negative error code */
        ble_data_t buf; /* message taken from the queue */

        for (;;) {
            /* Wait indefinitely for data to be sent over Bluetooth */
            if (k_msgq_get(&ble_msgq, &buf, K_FOREVER)) {
                LOG_ERR("Failed to get data from ble_msgq");
                continue;
            }

            err = bt_nus_send(NULL, buf.data, buf.len);
            if (err) {
                LOG_WRN("Failed to send data over BLE connection (err %d)", err);
            }
        }
    }

    Thank you again for your thorough feedback it is very helpful!

    Best regards,

    Stavros

  • Purging a message queue might cause data loss in your case when the message queue is not empty. Why do you want to purge the message queue? I would suggest something like the code below:

    void ble_transmit_data(ble_data_t ble_data)
    {
        uint8_t retry_count = 0;

        while (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) != 0) {
            /* Give up after 10 retries so the producer is never stuck here */
            if (++retry_count > 10) {
                LOG_WRN("Failed to add data to message queue");
                return;
            }
            /* Sleep for some time. How long is application specific. */
            k_msleep(10);
        }
    }
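
To illustrate the data-loss point, here is a small off-target simulation in plain C. It uses a mock queue, not the Zephyr k_msgq API, and `Q_CAP`, `mock_q` and the policy names are hypothetical. It counts how many messages are lost when a burst is queued while the consumer is stalled, once when the full-queue error is ignored and once when the queue is purged and the put is retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define Q_CAP 10 /* hypothetical queue depth */

struct mock_q {
    int items[Q_CAP];
    size_t head;
    size_t count;
};

/* Non-blocking put: fails when the queue is full. */
bool q_put(struct mock_q *q, int v)
{
    if (q->count == Q_CAP) {
        return false;
    }
    q->items[(q->head + q->count) % Q_CAP] = v;
    q->count++;
    return true;
}

/* Discard everything queued; returns how many messages were thrown away. */
size_t q_purge(struct mock_q *q)
{
    size_t lost = q->count;
    q->head = 0;
    q->count = 0;
    return lost;
}

enum policy { IGNORE_FULL, PURGE_AND_RETRY };

/* Queue `burst` messages while the consumer is stalled; return messages lost. */
size_t lost_in_burst(enum policy p, int burst)
{
    struct mock_q q = {0};
    size_t lost = 0;

    for (int i = 0; i < burst; i++) {
        if (q_put(&q, i)) {
            continue;
        }
        if (p == IGNORE_FULL) {
            lost++;              /* the newest message is silently dropped */
        } else {
            lost += q_purge(&q); /* all older queued messages are dropped */
            bool ok = q_put(&q, i);
            assert(ok);
            (void)ok;
        }
    }
    return lost;
}
```

With a depth of 10 and a burst of 25, ignoring the error loses the 15 newest messages, while purge-and-retry loses 20 older ones. Neither policy is lossless once the consumer falls behind, which is why the producer has to either wait or report the failure.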

  • Because the Zephyr documentation suggests this implementation and also I do not want to make the producer threads wait.
    I will try this though.

    Wouldn't this result in an endless loop if the device disconnects while the producer thread is stuck in this loop? Something like this would also cause the watchdog of the producer thread to trigger.

    I just tried it, and it makes the device unresponsive when it is at the edge of the Bluetooth range: it freezes and stops transmitting, and after disconnecting, the ble_write_thread and the producer thread report 60 to 120 seconds of execution (I am profiling the execution of bt_nus_send and the producer thread).
