Dropped packets with bt_nus_send

Greetings,


I am using the peripheral_uart Bluetooth nrf sample as a base for my custom application.

I have a ble_write_thread that just receives data on a message queue and calls the bt_nus_send function the same as the sample. And I also have a thread that produces the data and loads it to the ble message queue for transmission.

Recently, because of a new requirement, we need to send a lot more data via BLE than previously (to the central, which is a mobile phone with the nRF Connect app). This has created issues with the integrity of the data packets we are sending: when the device is close to the edge of the Bluetooth range, it starts dropping packets.

As other members previously informed me on the Zephyr Discord, this could be because, while the stack is trying to resend data (because of a poor connection at the edge of the range), new data are being queued in the BLE stack's buffers and overwrite older data that have not yet been sent, so the older data are never sent (and appear as if they were dropped). This only happens when the device is at the edge of the Bluetooth range, while the BLE stack is still trying to resend older packets.

I have set CONFIG_BT_BUF_ACL_TX_COUNT=200 and CONFIG_BT_BUF_ACL_TX_SIZE=251 (which is the maximum, if I am not mistaken). I have also decreased the supervision timeout to the minimum (100 ms) so that the device disconnects if any transmission takes more than 100 ms, but this does not seem to solve the problem either. I have also decreased the rate at which we queue new data to the ble_write_thread.

Is there something I am missing on the BLE operation or anything else I could do to eliminate this issue?

Thank you in advance!

Best regards,

Stavros

  • Hi Stavros,

    Karl asked me to take over. I am looking at the information you gave, quoted below:

    clockis said:
    But when it is on the edge of the BLE range and the connection is poor, it keeps retrying to send the same message for a long time while we are still feeding/queueing new data to it (which it queues in its internal BLE stack buffers for transmission). At some point this overwrites older data that were queued (in the internal BLE stack buffers) but not sent, resulting in dropped packets (packets that were fed to bt_nus_send successfully but never sent to the BLE central because they got overwritten by newer data).

    I do not think it should be possible for the BLE stack's internal buffers to be overwritten like this. At least, I have not heard of this happening.

    I am assuming that you have double-checked that your application is actually queuing the full data using bt_nus_send (maybe print all data queued?), and you have confirmed here that you do not receive any error while sending notifications. You also confirmed that the peer sees dropped packets. So the only thing remaining is to look at the over-the-air traffic. Can you please check with a sniffer whether the notification data sent to the peer has missing data (dropped packets or overwritten data within the BLE stack)? If you queued the data correctly and some of it does not show up in the BLE sniffer trace, then something is happening within the BLE stack, and we can focus our debugging on the BLE stack.

    But I am thinking this could be a producer/consumer problem in the application itself. Before we dive deeper into two different directions, please provide the data sent and a BLE sniffer trace so that we can narrow the problem down to either the application or the BLE stack.

  • Hello Susheel,

    I have already performed the tests from the side of the application by checking the queued data sent to the bt_nus_send function and they are complete as expected and we are not receiving any errors from bt_nus_send or other messages.

    Besides, this dropping of packets does not happen when the device is close to the mobile phone (BLE central using the nRF Connect app for Android as well as the nRF Connect app for Windows); it only happens when approaching the edge of the Bluetooth range.

    When the device is close to the BLE central the transmission is complete, and successful, and all data arrive as expected, so to me, this shows that they're queued correctly as this mechanism is not related to the BLE range/signal strength, etc.

    I am using a Zephyr MSGQ to wait for data in the ble_write_thread and this works perfectly. Queueing the data to the ble_write_thread with the MSGQ is irrelevant to the BLE stack. I don't see how this mechanism could be affected by the Bluetooth range or closeness of our device and the BLE central.

    I will however try to perform the BLE sniffing and I will inform you asap but right now I am very much constrained in time and I am not sure when I will have the chance to do this.

    clockis said:
    But when it is on the edge of the BLE range and the connection is poor, it keeps retrying to send the same message for a long time while we are still feeding/queueing new data to it (which it queues in its internal BLE stack buffers for transmission). At some point this overwrites older data that were queued (in the internal BLE stack buffers) but not sent, resulting in dropped packets (packets that were fed to bt_nus_send successfully but never sent to the BLE central because they got overwritten by newer data).

    I do not think it should be possible for the BLE stack's internal buffers to be overwritten like this. At least, I have not heard of this happening.

    Also, the description above regarding the internal buffers being overwritten was explained to me in detail by a Zephyr community member on the Zephyr Discord, so I am just quoting their words and their explanation for it. I have also seen it mentioned in other DevZone tickets:

     bt_nus_send takes time to execute 

     Execution time of bt_nus_send 

    Thank you very much for your support! If you require any more info please let me know ( I will be doing the sniffing tests but if you could provide some feedback based on this it would be very helpful and greatly appreciated )

    Best regards,

    Stavros

  • Hello Susheel,

    Please excuse me for double posting but if it is possible for you I would like an answer to a more general question first before you respond to my previous comment because my issue is more general as well.

    How should someone send data using bt_nus_send, assuming a producer thread that periodically generates data and a consumer thread that waits for that data and calls bt_nus_send (just like the ble_write_thread in the peripheral_uart sample)? I am assuming that a consumer thread that waits on the data and feeds it to bt_nus_send is the ideal setup for an application with multiple threads that want to send data over Bluetooth (if not, please suggest another way).

    Q 1. Given this what is the best way for a producer thread to pass data to this consumer thread that runs bt_nus_send? Zephyr FIFO? Zephyr Message Queue? and how should they be written as code if you could provide a very simple generic sample code?

    I am asking because I have tried using both FIFOs and message queues, and I still get dropped messages.

    Q 2. Also when the device reaches close to the edge of the range and transmission slows down and ultimately disconnects because it's out of range, when I get back close and connect it again, it still transmits at a very slow speed like it's still far away, what could be causing this?

    Thank you very much for your patience and support and I look forward to hearing from you!

    Best regards,

    Stavros

  • clockis said:
    Q 1. Given this what is the best way for a producer thread to pass data to this consumer thread that runs bt_nus_send? Zephyr FIFO? Zephyr Message Queue? and how should they be written as code if you could provide a very simple generic sample code?

    I think before we dive deeper into which data passing mechanism is best suited here, we first need to confirm that this is in fact a producer/consumer problem that we are seeing. There is absolutely nothing wrong with using message queues, but it can matter how you are using them and how you are handling any errors you get while passing messages (due to the queue being full or the call timing out).
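
    As a minimal sketch of what handling those errors could look like: per the Zephyr documentation, k_msgq_put returns 0 on success, -ENOMSG when it returns without waiting (or the queue was purged), and -EAGAIN when the waiting period timed out. The enum and the helper name below are hypothetical, introduced only for illustration.

```c
#include <errno.h>

/* Hypothetical outcome labels, introduced only for this sketch. */
enum put_outcome {
    PUT_QUEUED,      /* message was copied into the queue */
    PUT_QUEUE_FULL,  /* returned without waiting (or queue was purged) */
    PUT_TIMED_OUT,   /* waited for free space, but the timeout expired */
    PUT_OTHER_ERROR  /* anything else */
};

/* Classify the value returned by k_msgq_put(). */
enum put_outcome classify_put_result(int rc)
{
    switch (rc) {
    case 0:
        return PUT_QUEUED;
    case -ENOMSG:
        return PUT_QUEUE_FULL;
    case -EAGAIN:
        return PUT_TIMED_OUT;
    default:
        return PUT_OTHER_ERROR;
    }
}
```

    A producer could pass the return value of k_msgq_put(&ble_msgq, &ble_data, K_MSEC(10)) to such a helper and log or count every outcome other than PUT_QUEUED, so that lost messages become visible instead of disappearing silently.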

    Since you are doing this in the ble_write_thread and not in the bluetooth callbacks, I do not think this is caused by the possible blocking nature of bt_gatt_notify_cb.

    Can you help me reproduce this or give me enough code snippets both in the producer and consumer so that I can attempt to make something similar to your use case?

    What you are seeing could also be caused by the application not handling errors from the message queue, or by some other race condition in the context where you get the raw data and in how you queue it and send it in the notification.

    clockis said:
    Also the description above regarding the internal buffers being overwritten was explained to me by a Zephyr community member on the Zephyr discord in detail, so I am just quoting their words and their explanation for it

    Could you please provide a link to this? I am surprised that I have not heard about it, but if you provide a link to the discussion and it is still relevant to the SDK version you are using, then I will also keep looking into a possible bug in the BLE stack. For now, my focus is mostly on the application.

  • Hello Susheel,

    Thank you very much for the thorough and immediate response.

    Your previous message pointed me to look at the error handling when using k_msgq_put to queue my data to the Message Queue by adding the code below:

    while (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) != 0) {
    	/* message queue is full: purge old data & try again */
    	k_msgq_purge(&ble_msgq);
    }

    Instead of just k_msgq_put on its own

    k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT);

    After changing this, the packet drops I had been seeing no longer occurred, even when I was close to the edge of the BLE range with poor signal strength (about 4% of messages were still getting dropped, but that seems like a separate issue).

    The behavior was as you described, and the BLE stack indeed does not drop messages regardless of its connection settings (supervision timeout, ACL buffer size, etc.). The unhandled k_msgq_put failure was definitely causing the issue I was seeing: messages were not being queued correctly in the message queue, did not arrive in the ble_write_thread, and thus were never fed to bt_nus_send.
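
    For readers following along, a toy model in plain C (not Zephyr code) can illustrate where messages are lost when a bounded queue fills up. It compares ignoring the failed put, purging and retrying as in the snippet above, and, as an alternative not tried in this thread, waiting for free space (roughly what k_msgq_put with a timeout instead of K_NO_WAIT would do). The queue depth, the function names, and the "consumer sends one message per drain_every puts" link model are all assumptions made for this sketch.

```c
#include <assert.h>

#define QUEUE_CAP 10 /* hypothetical queue depth */

typedef struct { int buf[QUEUE_CAP]; int head; int count; } queue_t;

enum policy { IGNORE_ERROR, PURGE_AND_RETRY, WAIT_FOR_SPACE };

/* Returns 0 on success, -1 if the queue is full (like a K_NO_WAIT put). */
static int queue_put(queue_t *q, int msg)
{
    if (q->count == QUEUE_CAP) {
        return -1;
    }
    q->buf[(q->head + q->count) % QUEUE_CAP] = msg;
    q->count++;
    return 0;
}

/* Returns 0 and stores the oldest message, or -1 if the queue is empty. */
static int queue_get(queue_t *q, int *msg)
{
    if (q->count == 0) {
        return -1;
    }
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Discards every queued message, as a purge does. */
static void queue_purge(queue_t *q)
{
    q->head = 0;
    q->count = 0;
}

/* Produces `total` messages; the consumer sends one message after every
 * `drain_every` puts (a larger value models a slower link).
 * Returns how many messages were delivered in the end.
 */
int simulate(enum policy p, int total, int drain_every)
{
    queue_t q = {0};
    int delivered = 0;
    int msg;

    assert(drain_every > 0);

    for (int n = 0; n < total; n++) {
        if (queue_put(&q, n) != 0) {
            if (p == PURGE_AND_RETRY) {
                queue_purge(&q);        /* all queued messages are lost */
                queue_put(&q, n);
            } else if (p == WAIT_FOR_SPACE) {
                queue_get(&q, &msg);    /* producer waits until one is sent */
                delivered++;
                queue_put(&q, n);
            }
            /* IGNORE_ERROR: message n is silently lost */
        }
        if ((n + 1) % drain_every == 0 && queue_get(&q, &msg) == 0) {
            delivered++;
        }
    }
    while (queue_get(&q, &msg) == 0) {  /* link recovers: drain the rest */
        delivered++;
    }
    return delivered;
}
```

    In this model, with 100 messages and a consumer that keeps up (drain_every = 1), every policy delivers all 100. With a slower link (drain_every = 2), ignoring the error delivers 59, purge-and-retry delivers 50, and waiting for space delivers all 100 at the cost of slowing the producer down. Real numbers depend on thread priorities, queue depth, and the link, so this only shows the mechanism, not what any particular device will do.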

    Sorry for spamming with a double post; I wrote it just before trying the modification mentioned above, so it was premature.

    Now I am facing a different issue but since this might not be entirely relevant I will open a separate ticket for that.

    Meanwhile, I am attaching some code snippets of the producer/consumer threads I have implemented and how they are used in case you can reproduce or spot any issues with my implementation.

    Producer thread:

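    File: producer.c
    
    #include <stdint.h>
    #include <string.h>
    #include <zephyr/kernel.h> /* <zephyr.h> on older SDK versions */
    
    /* ble_data_t and ble_transmit_data() are defined in consumer.c (below);
     * a shared header declaring them is assumed here.
     */
    
    uint8_t data[25][30]; /* Data are already stored in a buffer */
    uint16_t len[25];     /* Length of each entry (declaration assumed, not in the original post) */
    
    void send_data(void)
    {
        ble_data_t ble_data;
        
        for (int i = 0; i < 25; i++) {
            /* Load data to ble_data */
            memcpy(ble_data.data, data[i], len[i]);
            ble_data.len = len[i];
            
            ble_transmit_data(ble_data);
        }
    }
    
    void producer_thread(void)
    {
        for (;;) {
            send_data();      /* queue a burst of 25 messages */
            k_msleep(1000);   /* once per second */
        }
    }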

    Consumer thread:

    File: consumer.c
    
    typedef struct ble_data_t {
        uint8_t data[30];
        uint16_t len;
    } ble_data_t;
    
    /* ble_msgq is defined elsewhere (not shown in the original post) */
    
    void ble_transmit_data(ble_data_t ble_data)
    {
        while (k_msgq_put(&ble_msgq, &ble_data, K_NO_WAIT) != 0) {
            /* message queue is full: purge old data & try again.
             * Note: k_msgq_purge() discards every message still in the queue.
             */
            k_msgq_purge(&ble_msgq);
        }
    }
    
    void ble_write_thread(void) /* The consumer thread */
    {
        ble_data_t buf;
        int err_code; /* bt_nus_send() returns an int */
        
        for (;;) {
            /* Wait indefinitely for data to be sent over Bluetooth */
            if (k_msgq_get(&ble_msgq, &buf, K_FOREVER)) {
                LOG_ERR("Failed to get data from ble_msgq");
                continue;
            }
            
            err_code = bt_nus_send(NULL, buf.data, buf.len);
            if (err_code) {
                LOG_WRN("Failed to send data over BLE connection (err %d)", err_code);
            }
        }
    }

    Thank you again for your thorough feedback; it is very helpful!

    Best regards,

    Stavros
