Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

Notification count in BLE_GATTS_EVT_HVN_TX_COMPLETE event is greater than expected

Hello,

We are facing an issue where the sum of count in BLE_GATTS_EVT_HVN_TX_COMPLETE events (member of the ble_gatts_evt_hvn_tx_complete_t structure) doesn't match with the number of BLE notifications submitted to the SoftDevice.
Here's some info on what we're using:

NRF5 SDK: v17.0.2
Softdevice: S140
MCU: NRF52840

We implemented a custom BLE submission queue similar to the BLE GATT Queue of the SDK but with some tweeks.

Here's how we are submitting notifications to the SoftDevice:

/**
 * @brief Submits a single `hvx_request` from the queue to the SoftDevice.
 *
 * @param[in] req Pointer to a `hvx_request`.
 *
 * @return @ref error_ok if the request has been successfully submitted.
 * @return @ref error_busy if the SoftDevice can't currently handle the request.
 * @return @ref error_internal otherwise.
 */
static error submit_request(const struct hvx_request *req);

static void process_queue() {
    size_t request_count = nrf_queue_utilization_get(p_queue);

    while (!is_process_ending) {
        struct hvx_request req = {0};

        nrf_queue_peek(p_queue, &req);

        error submit_status = ok;
        CRITICAL_REGION_ENTER();
        submit_status = submit_request(&req);
        request_count--;

        if (ok == submit_status) {
            nrf_atomic_u32_add(&op_count, 1);
        }

        if ((0 == request_count) || (error_busy == submit_status)) {
            SEGGER_RTT_printf(0, "Ending process with op_count: %d\n", op_count);
            nrf_atomic_flag_set(&is_process_ending);
        }
        CRITICAL_REGION_EXIT();

        if (error_busy != submit_status) {
            nrf_queue_pop(p_queue, &req);
            notify_request_processed(&req, submit_status);
        }
    }
}

Here's how we decrease the op_count variable in the context of the BLE event callback (hvx_queue_on_ble_evt is a NRF_SDH_BLE_OBSERVER callback):

void hvx_queue_on_ble_evt(const ble_evt_t *ble_evt, void *context) {
	...

    uint8_t tx_count = ble_evt->evt.gatts_evt.params.hvn_tx_complete.count;

    uint32_t old_count = nrf_atomic_u32_fetch_sub_hs(op_count, tx_count);
    SEGGER_RTT_printf(0, "Update op_count with old_count: %d, tx_count: %d\n", old_count, tx_count);
    if (old_count < tx_count) {
    	// We shouldn't end up in this condition
        nrf_atomic_u32_store(&op_count, 0);
    }

    if (is_process_ending && (0 == op_count)) {
        end_processing();
    }

	...
}

We have as a requirement, to keep track of the number of submitted notifications in order to know when they are actually sent. The is_process_ending flag is used to indicate whether all notifications have been submitted. If it's set and the op_count fall to 0, then we consider that all notifications were transmitted.

The submission of notifications to the SoftDevice are inside a critical section because we need to prevent the hvx_queue_on_ble_evt() callback to be invoked before is_process_ending is set.
To optimize the throughput, we submit notifications using sd_ble_gatts_hvx() until it returns NRF_ERROR_RESOURCES.
Here's an abstract of logs which shows the issue we're facing and it's related Wireshark capture:

Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1 // Start of event counter N°5476
Update op_count with old_count: 2, tx_count: 1
Update op_count with old_count: 1, tx_count: 1
Ending process with op_count: 3
Update op_count with old_count: 3, tx_count: 1
Update op_count with old_count: 2, tx_count: 3 // This is the problematic line

We can see that there's 6 transmissions happening in the connection event N°5476. However, the last transmission is only ACKed by the LL in the connection event N°5477.
There's one BLE_GATTS_EVT_HVN_TX_COMPLETE event per notification except for the last 2. The last BLE_GATTS_EVT_HVN_TX_COMPLETE event contains a tx_count set to 3. This is where we face our issue since we expected tx_count = 2. Indeed, at that moment, we submitted 3 notifications and already received a BLE_GATTS_EVT_HVN_TX_COMPLETE event for the first one.

According to the Wireshark capture, there are no other notifications made on any other characteristic to explain the missmatch on tx_count.
What could be the cause of this missmatch ?

Thanks for your help.

Parents
  • Hi Joakim, 

    I agree that there could be something wrong here especially when the test run for quite some time with 4000s notifications until the issue occurred. 

    Could you please try to reproduce the issue with a simple application ? Maybe a strip down version of your application or a modification of our example ? 

    If I can reproduce the issue here it would be easier to send to R&D for further investigation. 

    We also need to see how you queue the notification. Why you do request_count-- without checking the status of  submit_status = submit_request(&req); ? 


Reply
  • Hi Joakim, 

    I agree that there could be something wrong here especially when the test run for quite some time with 4000s notifications until the issue occurred. 

    Could you please try to reproduce the issue with a simple application ? Maybe a strip down version of your application or a modification of our example ? 

    If I can reproduce the issue here it would be easier to send to R&D for further investigation. 

    We also need to see how you queue the notification. Why you do request_count-- without checking the status of  submit_status = submit_request(&req); ? 


Children
No Data
Related