Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

Notification count in BLE_GATTS_EVT_HVN_TX_COMPLETE event is greater than expected

Hello,

We are facing an issue where the sum of count in BLE_GATTS_EVT_HVN_TX_COMPLETE events (member of the ble_gatts_evt_hvn_tx_complete_t structure) doesn't match with the number of BLE notifications submitted to the SoftDevice.
Here's some info on what we're using:

NRF5 SDK: v17.0.2
Softdevice: S140
MCU: NRF52840

We implemented a custom BLE submission queue similar to the BLE GATT Queue of the SDK but with some tweeks.

Here's how we are submitting notifications to the SoftDevice:

/**
 * @brief Submits a single `hvx_request` from the queue to the SoftDevice.
 *
 * @param[in] req Pointer to a `hvx_request`.
 *
 * @return @ref error_ok if the request has been successfully submitted.
 * @return @ref error_busy if the SoftDevice can't currently handle the request.
 * @return @ref error_internal otherwise.
 */
static error submit_request(const struct hvx_request *req);

static void process_queue() {
    size_t request_count = nrf_queue_utilization_get(p_queue);

    while (!is_process_ending) {
        struct hvx_request req = {0};

        nrf_queue_peek(p_queue, &req);

        error submit_status = ok;
        CRITICAL_REGION_ENTER();
        submit_status = submit_request(&req);
        request_count--;

        if (ok == submit_status) {
            nrf_atomic_u32_add(&op_count, 1);
        }

        if ((0 == request_count) || (error_busy == submit_status)) {
            SEGGER_RTT_printf(0, "Ending process with op_count: %d\n", op_count);
            nrf_atomic_flag_set(&is_process_ending);
        }
        CRITICAL_REGION_EXIT();

        if (error_busy != submit_status) {
            nrf_queue_pop(p_queue, &req);
            notify_request_processed(&req, submit_status);
        }
    }
}

Here's how we decrease the op_count variable in the context of the BLE event callback (hvx_queue_on_ble_evt is a NRF_SDH_BLE_OBSERVER callback):

void hvx_queue_on_ble_evt(const ble_evt_t *ble_evt, void *context) {
	...

    uint8_t tx_count = ble_evt->evt.gatts_evt.params.hvn_tx_complete.count;

    uint32_t old_count = nrf_atomic_u32_fetch_sub_hs(op_count, tx_count);
    SEGGER_RTT_printf(0, "Update op_count with old_count: %d, tx_count: %d\n", old_count, tx_count);
    if (old_count < tx_count) {
    	// We shouldn't end up in this condition
        nrf_atomic_u32_store(&op_count, 0);
    }

    if (is_process_ending && (0 == op_count)) {
        end_processing();
    }

	...
}

We have as a requirement, to keep track of the number of submitted notifications in order to know when they are actually sent. The is_process_ending flag is used to indicate whether all notifications have been submitted. If it's set and the op_count fall to 0, then we consider that all notifications were transmitted.

The submission of notifications to the SoftDevice are inside a critical section because we need to prevent the hvx_queue_on_ble_evt() callback to be invoked before is_process_ending is set.
To optimize the throughput, we submit notifications using sd_ble_gatts_hvx() until it returns NRF_ERROR_RESOURCES.
Here's an abstract of logs which shows the issue we're facing and it's related Wireshark capture:

Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1 // Start of event counter N°5476
Update op_count with old_count: 2, tx_count: 1
Update op_count with old_count: 1, tx_count: 1
Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1
Update op_count with old_count: 2, tx_count: 3 // This is the problematic line

We can see that there's 6 transmissions happening in the connection event N°5476. However, the last transmission is only ACKed by the LL in the connection event N°5477.
There's one BLE_GATTS_EVT_HVN_TX_COMPLETE event per notification except for the last 2. The last BLE_GATTS_EVT_HVN_TX_COMPLETE event contains a tx_count set to 3. This is where we face our issue since we expected tx_count = 2. Indeed, at that moment, we submitted 3 notifications and already received a BLE_GATTS_EVT_HVN_TX_COMPLETE event for the first one.

According to the Wireshark capture, there are no other notifications made on any other characteristic to explain the missmatch on tx_count.
What could be the cause of this missmatch ?

Thanks for your help.

Parents
  • Hi Joakim, 
    Have you taken into account that there can be a chance that other characteristic may send notification when you queuing your notification ? 
    From what I can see in the trace, there is one notification from handle 0x002e is sending in addition to the one at handle 0x001f which is the main one. 
    Have you count the notification on 0x002e into old_count ? Is there any other characteristic that may send notification, for example battery level ? 

  • Hello,

    Thank you for your response.

    I did take this possibility into account and I am sure that this was not caused by another transmission happening on another characteristic.

    The logs I shared in my previous post were starting at the event N°5476 whereas the Wireshark capture started at N°5475. Moreover, I intentionally removed part of them to make it more accessible. Here are the original lines:

    Ending process with self: 0x20006200, op_count: 1 // Start of event N°5475
    Update op_count with self: 0x20006200, old_count: 1, tx_count: 1
    
    Ending process with self: 0x20006310, op_count: 3 // Start of event N°5476
    Update op_count with self: 0x20006310, old_count: 3, tx_count: 1
    Update op_count with self: 0x20006310, old_count: 2, tx_count: 1
    Update op_count with self: 0x20006310, old_count: 1, tx_count: 1
    Ending process with self: 0x20006310, op_count: 3
    Update op_count with self: 0x20006310, old_count: 3, tx_count: 1
    Update op_count with self: 0x20006310, old_count: 2, tx_count: 3 // This is the problematic line

    You can see that the additional printed info are addresses in RAM. Indeed, we actually use multiple instances of BLE queues, each handling different characteristics. The instance at address 0x20006310 is exclusively sending notifications on characteristic handle 0x001f whereas the instance at 0x20006200 is responsible for the notification on characteristic handle 0x002e.

    I'm sure to have correspondence between the logs and the Wireshark capture because I attached a debugger and put a break-point at line 10 of the hvx_queue_on_ble_evt() function. Therefore no other logs or BLE transmissions were made past the issue.

  • Hi Joakim, 
    Just a quick note: the count value in BLE_GATTS_EVT_HVN_TX_COMPLETE count the number of notification packets has been Acked in the last connection event. 
    So the BLE_GATTS_EVT_HVN_TX_COMPLETE event at N°5476 will count the notification at 5475 as at that moment it's Acked. 

    I haven't looked deep into your code yet, but have you tried to have a longer test (say queuing 100 packets) and sum the totalt number of counted packets to see if they match ? 

  • Hello,

    For the sake of brevity, I only included a small portion of the logs/capture but it was actually much longer and a lot more notifications were sent.


    From the logs, I summed the submitted notifications (op_count) and the count of every BLE_GATTS_EVT_HVN_TX_COMPLETE events (tx_count). In Wireshark I applied the following filter to count the number of notifications without retransmission: (btatt.opcode == 0x1b) && !(btle.retransmit). I also did another test to have more data. Here's the results.

    op_count tx_count Wireshark
    First test 4235 4236 4235
    Second test 1001 1002 1001

    We can see that the number of notifications submitted corresponds to what is actually transmitted. Tx_count however, is wrong. I'm attaching the Wireshark capture and the logs so you can have a proper look.

    Log_hvx_queue_test_5.txt

    ble_sniff_5.pcapng

    Could there be a bug in the SoftDevice which is linked to the fact we're submitting notifications in a critical section ?

    Just a quick note: the count value in BLE_GATTS_EVT_HVN_TX_COMPLETE count the number of notification packets has been Acked in the last connection event. 
    So the BLE_GATTS_EVT_HVN_TX_COMPLETE event at N°5476 will count the notification at 5475 as at that moment it's Acked.

    That's what I had in mind. If we analyse the first Wireshark capture I shared we can see that:

    • The notification No. 17092 was sent during connection event 5476 and ACKed with No. 17093 during connection event 5476. That's what you correctly pointed out.
    • There are 6 notifications (N°17094, N°17098, N°17100, N°17102, N°17104, N°17108) sent during connection event 5476 but only the first 5 are ACKed in that connection event.
    • Connection event N°5477 starts by acknowledging notification N°17108. I assume that at that moment, the problematic BLE_GATTS_EVT_HVN_TX_COMPLETE event containing a count equal to 3 was generated since we ended up on the break-point at line 10 of the hvx_queue_on_ble_evt() function.

    Have you taken into account that there can be a chance that other characteristic may send notification when you queuing your notification ?

    There's one more thing I didn't mention before. In our firmware, we guarantee that this can't happen because our BLE queue instances are synchronized.

Reply
  • Hello,

    For the sake of brevity, I only included a small portion of the logs/capture but it was actually much longer and a lot more notifications were sent.


    From the logs, I summed the submitted notifications (op_count) and the count of every BLE_GATTS_EVT_HVN_TX_COMPLETE events (tx_count). In Wireshark I applied the following filter to count the number of notifications without retransmission: (btatt.opcode == 0x1b) && !(btle.retransmit). I also did another test to have more data. Here's the results.

    op_count tx_count Wireshark
    First test 4235 4236 4235
    Second test 1001 1002 1001

    We can see that the number of notifications submitted corresponds to what is actually transmitted. Tx_count however, is wrong. I'm attaching the Wireshark capture and the logs so you can have a proper look.

    Log_hvx_queue_test_5.txt

    ble_sniff_5.pcapng

    Could there be a bug in the SoftDevice which is linked to the fact we're submitting notifications in a critical section ?

    Just a quick note: the count value in BLE_GATTS_EVT_HVN_TX_COMPLETE count the number of notification packets has been Acked in the last connection event. 
    So the BLE_GATTS_EVT_HVN_TX_COMPLETE event at N°5476 will count the notification at 5475 as at that moment it's Acked.

    That's what I had in mind. If we analyse the first Wireshark capture I shared we can see that:

    • The notification No. 17092 was sent during connection event 5476 and ACKed with No. 17093 during connection event 5476. That's what you correctly pointed out.
    • There are 6 notifications (N°17094, N°17098, N°17100, N°17102, N°17104, N°17108) sent during connection event 5476 but only the first 5 are ACKed in that connection event.
    • Connection event N°5477 starts by acknowledging notification N°17108. I assume that at that moment, the problematic BLE_GATTS_EVT_HVN_TX_COMPLETE event containing a count equal to 3 was generated since we ended up on the break-point at line 10 of the hvx_queue_on_ble_evt() function.

    Have you taken into account that there can be a chance that other characteristic may send notification when you queuing your notification ?

    There's one more thing I didn't mention before. In our firmware, we guarantee that this can't happen because our BLE queue instances are synchronized.

Children
No Data
Related