Hi,
I'm working on an application where I want to read a couple of characteristics to get information about the connected device before continuing doing useful stuff. I had a working implementation with ncs v2.5.0, which looked something like this (pss is our custom service, based on implementations found in ncs):
static void discovery_complete(struct bt_gatt_dm *dm, void *context)
{
LOG_INF("Discovery completed");
pss_client_handles_assign(dm, &sensor->pss_client);
bt_gatt_dm_data_release(dm);
k_work_submit(&sensor->deferred_work);
}
static void deferred_work_handler(struct k_work *work)
{
struct pss_client *pss_client = &sensor->pss_client;
// Read the UUID of the sensor
struct bt_gatt_read_params *params = &pss_client->uuid_params;
params->func = on_uuid_received;
params->handle_count = 1;
params->single.offset = 0;
params->single.handle = pss_client->handles.uuid_handle;
int err = bt_gatt_read(pss_client->conn, params);
if (err)
{
LOG_ERR("Failed to read UUID (err %d)", err);
}
err = k_sem_take(&sem_rsp, K_MSEC(3000));
if (err == -EAGAIN)
{
LOG_ERR("Failed to read UUID of the sensor (err %d)", err);
return;
}
// Start doing something useful
}
static void on_uuid_received(struct pss_client *pss_client, uint64_t uuid)
{
// Simplified
LOG_INF("Sensor UUID: %016llx", uuid)
k_sem_give(&sem_rsp);
}
However, after updating our project to ncs v3.2.2, this seems to break. I still get the "Discovery completed" log, but then I see error messages indicating that the uuid read failed due to the semaphore timing out. Interestingly, I do still see the UUID being printed immediately after that. So it seems that there is some kind of lock-up happening due to my waiting on the semaphore in the workqueue. I was under the impression that the bt_gatt_read_func_t callback function would not be called in the context of the systemworkq, and would therefore not be blocked by waiting in the workq, but it seems that that is the case now as I only get the uuid printed after the semaphore expired. I noticed in my .config file that CONFIG_SYSTEM_WORKQUEUE_PRIORITY is now part of the "Bluetooth Services", whereas for my build with ncs v2.5.0 this was part of the "System Work Queue Options". So it seems something changed to the way how the bluetooth services use the system workq.
I did some further digging, and I found I get things to work again by setting up a dedicated workq with a priority below that of the systemworkq (I took a priority of 3 for testing), which seems to reinforce that the issue is due to a lock-up in the system workq. Weirdly, I also tried doing the deferred_work directly in the discovery_complete function (instead of submitting work) after I learned that the BT_GATT_DM has its own workq since ncs v2.8.0, but that didn't work.
While I did find a workaround, I'm also keen on understanding what changed exactly in the way characteristics are read, to break my original implementation. That way, I can anticipate if I need to change other parts of my code as well.
Best,
Wout