Dropped packets with bt_nus_send

Greetings,


I am using the peripheral_uart Bluetooth nrf sample as a base for my custom application.

I have a ble_write_thread that just receives data on a message queue and calls the bt_nus_send function the same as the sample. And I also have a thread that produces the data and loads it to the ble message queue for transmission.

Recently, because of a new requirement, we need to send a lot more data via BLE than before (to the central, which is a mobile phone running the nRF Connect app). This has created issues with the integrity of the data we are sending: when the device is close to the edge of the Bluetooth range, it starts dropping packets.

As other members on the Zephyr Discord have told me, this could be because, while the stack is retrying old data (due to the poor connection at the edge of the range), new data is being queued in the BLE stack's buffers, and older data that has not yet been sent is overwritten, so it is never sent (and appears to have been dropped). This only happens when the device is at the edge of the Bluetooth range, where the BLE stack is still trying to resend older packets.

I have set CONFIG_BT_BUF_ACL_TX_COUNT=200 and CONFIG_BT_BUF_ACL_TX_SIZE=251 (which is the maximum, if I am not mistaken). I have also decreased the supervision timeout to the minimum (100 ms) so that the device disconnects whenever a transmission takes more than 100 ms, but this does not seem to solve the problem either. I have also decreased the rate at which we queue new data to the ble_write_thread.

Is there something I am missing about how BLE operates, or anything else I could do to eliminate this issue?

Thank you in advance!

Best regards,

Stavros

  • Hello Stavros,

    The best approach will depend on your application's requirements and constraints. Could you elaborate on the requirements for your data transfer?
    How much data will you generate for the phone, and how quickly?

    In any case, the best way to approach this is to implement local error handling after the call that queues the data for sending over the NUS service, so that you can handle the case where the data fails to queue because the buffer is full.
    To handle this, you could for example implement a ring buffer that feeds the NUS service.
    Alternatively, you could handle the data that fails to queue in another way - for instance, if it is not important for your application that every packet of data makes it across the link you could discard this data just as well. I mention both these approaches since I do not know enough about your application to know which one fits your use-case.
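
    A minimal sketch of the ring-buffer idea above, in plain C so it can be tried off-target. All names here (ring_t, ring_put, ring_drain, demo_send_ok, demo_send_busy) are hypothetical and not part of the NUS or Zephyr APIs; the send callback stands in for whatever call queues data for transmission and is assumed to return 0 on success. The point it illustrates is that a message is removed only after the send succeeds, so a failed send leaves it at the head for a later retry:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8  /* hypothetical capacity, size it for your data rate */
#define MSG_MAX    30 /* hypothetical maximum payload per message */

typedef struct {
    uint8_t data[MSG_MAX];
    uint16_t len;
} msg_t;

typedef struct {
    msg_t slots[RING_SLOTS];
    size_t head;  /* index of the oldest stored message */
    size_t count; /* number of stored messages */
} ring_t;

/* Store a copy of the message. Returns false (storing nothing) if the
 * ring is full or the payload is too large, so the caller can react. */
static bool ring_put(ring_t *r, const uint8_t *data, uint16_t len)
{
    if (r->count == RING_SLOTS || len > MSG_MAX) {
        return false;
    }
    msg_t *slot = &r->slots[(r->head + r->count) % RING_SLOTS];
    memcpy(slot->data, data, len);
    slot->len = len;
    r->count++;
    return true;
}

/* Hypothetical send callback type: returns 0 on success. */
typedef int (*send_fn)(const uint8_t *data, uint16_t len);

/* Send messages oldest-first until the ring is empty or a send fails.
 * A message is removed only after it was sent successfully.
 * Returns the number of messages sent. */
static size_t ring_drain(ring_t *r, send_fn send)
{
    size_t sent = 0;

    while (r->count > 0) {
        const msg_t *m = &r->slots[r->head];
        if (send(m->data, m->len) != 0) {
            break; /* keep the message for the next attempt */
        }
        r->head = (r->head + 1) % RING_SLOTS;
        r->count--;
        sent++;
    }
    return sent;
}

/* Stand-ins for the real send call, used only to exercise the sketch. */
static int demo_send_ok(const uint8_t *data, uint16_t len)
{
    (void)data; (void)len;
    return 0;
}

static int demo_send_busy(const uint8_t *data, uint16_t len)
{
    (void)data; (void)len;
    return -1;
}
```

    With this shape, the producer checks the return value of ring_put and decides what to do when the ring is full (wait, drop, or report an error), rather than losing data silently.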

    Best regards,
    Karl

  • Hello Karl,

    Excuse me for the incomplete information.

    Before the change that requires the transmission of additional data, we were transferring 341 bytes/second, split into 24 separate messages of various lengths. These are queued to the ble_write_thread (like the one used in the peripheral_uart sample), which uses a Zephyr message queue to feed the data to bt_nus_send.

    As mentioned above the ble_write_thread just waits on the Zephyr Message Queue and feeds the incoming data to the bt_nus_send just like in the peripheral_uart sample.

    With the new requirement we ideally want to send 21 x 341 = 7161 bytes/second, and the hard requirement is to have NO dropped messages at all.

    As I understand it after speaking with multiple Zephyr community members, bt_nus_send queues the data passed to it in the BLE stack's internal buffers, and the BLE stack is then responsible for retrying the transmission until it succeeds or the supervision timeout expires.

    But when the device is at the edge of the BLE range and the connection is poor, the stack keeps retrying the same message for a long time while we are still feeding it new data (which it queues in its internal buffers for transmission). At some point this overwrites older data that was queued (in the internal BLE stack buffers) but not yet sent, resulting in dropped packets: packets that were passed to bt_nus_send successfully but never reached the BLE central because they were overwritten by newer data.

    bt_nus_send never returns an error during this behavior (so no error handling is possible). It always returns successfully, but it blocks for tens of milliseconds when the packets are being dropped (I am measuring the execution time of bt_nus_send in the ble_write_thread).

    I have tried using the maximum number of BLE stack buffers (CONFIG_BT_BUF_ACL_TX_COUNT=255) and the minimum supervision timeout (100 ms), so that the BLE stack can buffer more data and disconnects when a transmission takes too long, but I still get some dropped packets when the device is at the edge of its range.

    Best regards,

    Stavros

  • Hello Susheel,

    Thank you very much for the thorough and immediate response.

    Your previous message pointed me to look at the error handling when using k_msgq_put to queue my data to the Message Queue by adding the code below:

    while (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) != 0) {
    	/* message queue is full: purge old data & try again */
    	k_msgq_purge(&ble_msgq);
    }

    Instead of just k_msgq_put on its own

    k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT);

    After changing this, no packets were dropped even when I was close to the edge of the BLE range with poor signal strength (about 4% of messages were still being lost, but that seems to be a separate issue).

    The behavior was as you described: the BLE stack indeed cannot drop messages, regardless of its connection settings (supervision timeout, ACL buffer size, etc.). So this was definitely the cause of the issue I was seeing: messages were not being queued correctly in the message queue, so they never arrived in the ble_write_thread and were never passed to bt_nus_send.

    Sorry for spamming with a double post; I posted it just before trying the modification mentioned above, so it was premature.

    Now I am facing a different issue, but since it might not be entirely relevant I will open a separate ticket for it.

    Meanwhile, I am attaching some code snippets of the producer/consumer threads I have implemented and how they are used in case you can reproduce or spot any issues with my implementation.

    Producer thread:

    File: producer.c
    
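    #include <string.h>
    #include <zephyr/kernel.h> /* <zephyr.h> on older SDK versions */

    /* Data are already stored in a buffer */
    static uint8_t data[25][30];
    /* Assumed: per-message lengths, not shown in the original snippet */
    static uint16_t len[25];

    void send_data(void)
    {
        ble_data_t ble_data;

        for (int i = 0; i < 25; i++) {
            /* Load data to ble_data */
            memcpy(ble_data.data, data[i], len[i]);
            ble_data.len = len[i];

            ble_transmit_data(ble_data);
        }
    }

    void producer_thread(void)
    {
        for (;;) {
            send_data();
            k_msleep(1000);
        }
    }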

    Consumer thread:

    File: consumer.c
    
    typedef struct ble_data_t {
        uint8_t data[30];
        uint16_t len;
    } ble_data_t;
    
    void ble_transmit_data( ble_data_t ble_data )
    {
    	while (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) != 0) {
    		/* message queue is full: purge old data & try again */
    		k_msgq_purge(&ble_msgq);
    	}
    }
    
    void ble_write_thread(void) /* The consumer thread */
    {
        ble_data_t buf;
        int err;

        for (;;) {
            /* Wait indefinitely for data to be sent over Bluetooth */
            if (k_msgq_get(&ble_msgq, &buf, K_FOREVER)) {
                LOG_ERR("Failed to get data from ble_msgq");
                continue;
            }

            /* bt_nus_send() returns 0 on success, a negative errno otherwise */
            err = bt_nus_send(NULL, buf.data, buf.len);
            if (err) {
                LOG_WRN("Failed to send data over BLE connection (err %d)", err);
            }
        }
    }

    Thank you again for your thorough feedback it is very helpful!

    Best regards,

    Stavros

  • Purging a message queue might cause data loss in your case when the message queue buffer is not empty. Why do you want to purge the message queue? I would suggest something like the code below:

    #define MAX_PUT_ATTEMPTS 10

    void ble_transmit_data(ble_data_t ble_data)
    {
        for (uint8_t attempt = 0; attempt < MAX_PUT_ATTEMPTS; attempt++) {
            if (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) == 0) {
                return; /* queued successfully */
            }
            /* Queue is full: sleep for some time. How long is application specific */
            k_msleep(10);
        }

        /* Reached only if every attempt failed */
        LOG_WRN("Failed to add data to message queue");
    }
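
    To illustrate the data-loss concern with purging, here is a toy model in plain C. It is only a sketch: toy_queue_t, toy_put, toy_purge, and lost_with_purge are hypothetical names that mimic a bounded queue with a non-blocking put, not the Zephyr k_msgq API, and the consumer is assumed to be stalled for the whole run (as it might be while the send call is blocking). Every purge throws away whatever was still waiting in the queue:

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 4 /* hypothetical queue depth */

typedef struct {
    int items[QUEUE_CAP];
    size_t count;
} toy_queue_t;

/* Non-blocking put: returns false when the queue is full. */
static bool toy_put(toy_queue_t *q, int item)
{
    if (q->count == QUEUE_CAP) {
        return false;
    }
    q->items[q->count++] = item;
    return true;
}

/* Discard everything in the queue; returns how many items were discarded. */
static size_t toy_purge(toy_queue_t *q)
{
    size_t discarded = q->count;

    q->count = 0;
    return discarded;
}

/* Enqueue n messages with the "purge on full, then retry" pattern while
 * nothing is consuming. Returns the number of messages that were lost. */
static size_t lost_with_purge(toy_queue_t *q, int n)
{
    size_t lost = 0;

    for (int i = 0; i < n; i++) {
        while (!toy_put(q, i)) {
            lost += toy_purge(q);
        }
    }
    return lost;
}
```

    In this model, pushing 10 messages through a 4-slot queue with a stalled consumer discards 8 of them, which is why a bounded retry (or a blocking put with a timeout) is the safer choice when no message may be lost.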

  • Because the Zephyr documentation suggests this implementation, and also because I do not want to make the producer threads wait.
    I will try this, though.

    Wouldn't this result in an endless loop if the device disconnects while the producer thread is stuck in this loop? Something like this would also cause the watchdog of the producer thread to trigger.

    I just tried it, and it makes the device unresponsive at the edge of the Bluetooth range: it freezes and stops transmitting, and after disconnecting, the ble_write_thread and the producer thread report 60 to 120 seconds of execution (I am profiling the execution of bt_nus_send and the producer thread).

  • Actually, after a few modifications I have implemented this in my application and it seems to eliminate any further dropped packets.

    Thank you very much for the support. If you have any other comments or suggestions for improvement, they would be greatly appreciated; otherwise I will close this ticket on Monday, once I have finalized my implementation.

  • Yes, you are right, it can end in an endless loop. I will edit the code snippet to exit after 10 attempts.
