Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

Notification count in BLE_GATTS_EVT_HVN_TX_COMPLETE event is greater than expected

Hello,

We are facing an issue where the sum of count in BLE_GATTS_EVT_HVN_TX_COMPLETE events (member of the ble_gatts_evt_hvn_tx_complete_t structure) doesn't match with the number of BLE notifications submitted to the SoftDevice.
Here's some info on what we're using:

NRF5 SDK: v17.0.2
Softdevice: S140
MCU: NRF52840

We implemented a custom BLE submission queue similar to the BLE GATT Queue of the SDK but with some tweeks.

Here's how we are submitting notifications to the SoftDevice:

/**
 * @brief Submits a single `hvx_request` from the queue to the SoftDevice.
 *
 * @param[in] req Pointer to a `hvx_request`.
 *
 * @return @ref error_ok if the request has been successfully submitted.
 * @return @ref error_busy if the SoftDevice can't currently handle the request.
 * @return @ref error_internal otherwise.
 */
static error submit_request(const struct hvx_request *req);

static void process_queue() {
    size_t request_count = nrf_queue_utilization_get(p_queue);

    while (!is_process_ending) {
        struct hvx_request req = {0};

        nrf_queue_peek(p_queue, &req);

        error submit_status = ok;
        CRITICAL_REGION_ENTER();
        submit_status = submit_request(&req);
        request_count--;

        if (ok == submit_status) {
            nrf_atomic_u32_add(&op_count, 1);
        }

        if ((0 == request_count) || (error_busy == submit_status)) {
            SEGGER_RTT_printf(0, "Ending process with op_count: %d\n", op_count);
            nrf_atomic_flag_set(&is_process_ending);
        }
        CRITICAL_REGION_EXIT();

        if (error_busy != submit_status) {
            nrf_queue_pop(p_queue, &req);
            notify_request_processed(&req, submit_status);
        }
    }
}

Here's how we decrease the op_count variable in the context of the BLE event callback (hvx_queue_on_ble_evt is a NRF_SDH_BLE_OBSERVER callback):

void hvx_queue_on_ble_evt(const ble_evt_t *ble_evt, void *context) {
	...

    uint8_t tx_count = ble_evt->evt.gatts_evt.params.hvn_tx_complete.count;

    uint32_t old_count = nrf_atomic_u32_fetch_sub_hs(op_count, tx_count);
    SEGGER_RTT_printf(0, "Update op_count with old_count: %d, tx_count: %d\n", old_count, tx_count);
    if (old_count < tx_count) {
    	// We shouldn't end up in this condition
        nrf_atomic_u32_store(&op_count, 0);
    }

    if (is_process_ending && (0 == op_count)) {
        end_processing();
    }

	...
}

We have as a requirement, to keep track of the number of submitted notifications in order to know when they are actually sent. The is_process_ending flag is used to indicate whether all notifications have been submitted. If it's set and the op_count fall to 0, then we consider that all notifications were transmitted.

The submission of notifications to the SoftDevice are inside a critical section because we need to prevent the hvx_queue_on_ble_evt() callback to be invoked before is_process_ending is set.
To optimize the throughput, we submit notifications using sd_ble_gatts_hvx() until it returns NRF_ERROR_RESOURCES.
Here's an abstract of logs which shows the issue we're facing and it's related Wireshark capture:

Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1 // Start of event counter N°5476
Update op_count with old_count: 2, tx_count: 1
Update op_count with old_count: 1, tx_count: 1
Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1
Update op_count with old_count: 2, tx_count: 3 // This is the problematic line

We can see that there's 6 transmissions happening in the connection event N°5476. However, the last transmission is only ACKed by the LL in the connection event N°5477.
There's one BLE_GATTS_EVT_HVN_TX_COMPLETE event per notification except for the last 2. The last BLE_GATTS_EVT_HVN_TX_COMPLETE event contains a tx_count set to 3. This is where we face our issue since we expected tx_count = 2. Indeed, at that moment, we submitted 3 notifications and already received a BLE_GATTS_EVT_HVN_TX_COMPLETE event for the first one.

According to the Wireshark capture, there are no other notifications made on any other characteristic to explain the missmatch on tx_count.
What could be the cause of this missmatch ?

Thanks for your help.

Related