Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

Notification count in BLE_GATTS_EVT_HVN_TX_COMPLETE event is greater than expected

Hello,

We are facing an issue where the sum of count in BLE_GATTS_EVT_HVN_TX_COMPLETE events (member of the ble_gatts_evt_hvn_tx_complete_t structure) doesn't match with the number of BLE notifications submitted to the SoftDevice.
Here's some info on what we're using:

NRF5 SDK: v17.0.2
Softdevice: S140
MCU: NRF52840

We implemented a custom BLE submission queue similar to the BLE GATT Queue of the SDK but with some tweeks.

Here's how we are submitting notifications to the SoftDevice:

/**
 * @brief Submits a single `hvx_request` from the queue to the SoftDevice.
 *
 * @param[in] req Pointer to a `hvx_request`.
 *
 * @return @ref error_ok if the request has been successfully submitted.
 * @return @ref error_busy if the SoftDevice can't currently handle the request.
 * @return @ref error_internal otherwise.
 */
static error submit_request(const struct hvx_request *req);

static void process_queue() {
    size_t request_count = nrf_queue_utilization_get(p_queue);

    while (!is_process_ending) {
        struct hvx_request req = {0};

        nrf_queue_peek(p_queue, &req);

        error submit_status = ok;
        CRITICAL_REGION_ENTER();
        submit_status = submit_request(&req);
        request_count--;

        if (ok == submit_status) {
            nrf_atomic_u32_add(&op_count, 1);
        }

        if ((0 == request_count) || (error_busy == submit_status)) {
            SEGGER_RTT_printf(0, "Ending process with op_count: %d\n", op_count);
            nrf_atomic_flag_set(&is_process_ending);
        }
        CRITICAL_REGION_EXIT();

        if (error_busy != submit_status) {
            nrf_queue_pop(p_queue, &req);
            notify_request_processed(&req, submit_status);
        }
    }
}

Here's how we decrease the op_count variable in the context of the BLE event callback (hvx_queue_on_ble_evt is a NRF_SDH_BLE_OBSERVER callback):

void hvx_queue_on_ble_evt(const ble_evt_t *ble_evt, void *context) {
	...

    uint8_t tx_count = ble_evt->evt.gatts_evt.params.hvn_tx_complete.count;

    uint32_t old_count = nrf_atomic_u32_fetch_sub_hs(op_count, tx_count);
    SEGGER_RTT_printf(0, "Update op_count with old_count: %d, tx_count: %d\n", old_count, tx_count);
    if (old_count < tx_count) {
    	// We shouldn't end up in this condition
        nrf_atomic_u32_store(&op_count, 0);
    }

    if (is_process_ending && (0 == op_count)) {
        end_processing();
    }

	...
}

We have as a requirement, to keep track of the number of submitted notifications in order to know when they are actually sent. The is_process_ending flag is used to indicate whether all notifications have been submitted. If it's set and the op_count fall to 0, then we consider that all notifications were transmitted.

The submission of notifications to the SoftDevice are inside a critical section because we need to prevent the hvx_queue_on_ble_evt() callback to be invoked before is_process_ending is set.
To optimize the throughput, we submit notifications using sd_ble_gatts_hvx() until it returns NRF_ERROR_RESOURCES.
Here's an abstract of logs which shows the issue we're facing and it's related Wireshark capture:

Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1 // Start of event counter N°5476
Update op_count with old_count: 2, tx_count: 1
Update op_count with old_count: 1, tx_count: 1
Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1
Update op_count with old_count: 2, tx_count: 3 // This is the problematic line

We can see that there's 6 transmissions happening in the connection event N°5476. However, the last transmission is only ACKed by the LL in the connection event N°5477.
There's one BLE_GATTS_EVT_HVN_TX_COMPLETE event per notification except for the last 2. The last BLE_GATTS_EVT_HVN_TX_COMPLETE event contains a tx_count set to 3. This is where we face our issue since we expected tx_count = 2. Indeed, at that moment, we submitted 3 notifications and already received a BLE_GATTS_EVT_HVN_TX_COMPLETE event for the first one.

According to the Wireshark capture, there are no other notifications made on any other characteristic to explain the missmatch on tx_count.
What could be the cause of this missmatch ?

Thanks for your help.

Parents
  • Hello,

    Thanks for you help.

    Could you please try to reproduce the issue with a simple application ? Maybe a strip down version of your application or a modification of our example ? 

    After discussing about this issue with our team, we decided that solving this issue is not a priority at the moment. Indeed we do not consider this as a critical bug as it doesn't seem to occur if we throttle down the notification submission speed. We will unfortunately not be able to spend more time on this.

    Why you do request_count-- without checking the status of  submit_status = submit_request(&req); ? 

    The submission loop is done at most request_count times but may be stopped depending on submit_status.

     if ((0 == request_count) || (error_busy == submit_status)) {
        SEGGER_RTT_printf(0, "Ending process with op_count: %d\n", op_count);
        nrf_atomic_flag_set(&is_process_ending);
    }

Reply
  • Hello,

    Thanks for you help.

    Could you please try to reproduce the issue with a simple application ? Maybe a strip down version of your application or a modification of our example ? 

    After discussing about this issue with our team, we decided that solving this issue is not a priority at the moment. Indeed we do not consider this as a critical bug as it doesn't seem to occur if we throttle down the notification submission speed. We will unfortunately not be able to spend more time on this.

    Why you do request_count-- without checking the status of  submit_status = submit_request(&req); ? 

    The submission loop is done at most request_count times but may be stopped depending on submit_status.

     if ((0 == request_count) || (error_busy == submit_status)) {
        SEGGER_RTT_printf(0, "Ending process with op_count: %d\n", op_count);
        nrf_atomic_flag_set(&is_process_ending);
    }

Children
Related