bt_conn_disconnect() - ASSERTION FAILED [err == 0] k_sem_take return err -11

Hi,

I just started developing a BLE Central device on a nRF52840 DK.
After successfully connect my DevKit with a phone acting as a peripheral (using nRF Connect Mobile), The DK was able to subscribe to some services I configured on the phone and receive the corresponding notifications.

Right now I'm trying to disconnect every BLE connections the DK has when a specific button is pressed.
I managed to init the button and to add a callback function on it. The function works great but when I'm trying to disconnect the peripheral, I got an error:

ASSERTION FAIL [err == 0] @ ZEPHYR_BASE/subsys/bluetooth/host/hci_core.c:306
k_sem_take failed with err -11

This is the part of my code that should disconnect every BLE connection that the central has:

static void disconnect(struct bt_conn *conn, void *data)
{
 char addr[BT_ADDR_LE_STR_LEN];
 
 bt_addr_le_to_str(bt_conn_get_dst(conn), addr, sizeof(addr));
 printk("Disconnecting %s...\n", addr);

 int err = bt_conn_disconnect(conn, BT_HCI_ERR_REMOTE_USER_TERM_CONN);
 if (err)
 {
   printk("Failed disconnection %s\n", addr);
 }
 else
 {
   printk("Success\n");
 }
}

void ble_disconnect_all(void)
{ 
 bt_conn_foreach(BT_CONN_TYPE_LE, disconnect, NULL);
}

The function ble_disconnect_all(void) is called when the button1 is pressed, the printk("Disconnecting %s...\n", addr) is actually printed but when the program called bt_conn_disconnect(conn, BT_HCI_REMOTE_USER_TERM_CONN), then I get the error above.

I look up to the hci_core.c at the line 306. It's in the function called bt_hci_cmd_send_sync(). In this function, k_sem_take(&sync_sem, HCI_CMD_TIMEOUT) tried to take a semaphore, failed and then the BT_ASSERT_MSG() triggered. The message appears immediately when I pressed the button, it doesn't wait the 10 seconds that is configure by default in HCI_CMD_TIMEOUT.

Since this error appears, the program seems to be frozen. Nothing happen when I manually disconnect my phone (either by disabling the bluetooth or just by pressing the Disconnect option in nRF Connect Mobile). The only way I found to unfroze the device is by turning it off/on.

My setup

  • Ubuntu 22.04 in WSL2 (Host OS is Windows 10)
  • PlatformIO CLI, version 6.1.15
  • Zephyr framework, version 2.20701.220422

Prj.conf:

CONFIG_BT=y
CONFIG_BT_DEVICE_NAME="BLE_test"
CONFIG_BT_CENTRAL=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_CONSOLE=y
CONFIG_GPIO=y
Parents
  • Hi Victor,

    I suspect that the button was bouncing, causing the handler to be called multiple times.

    Could you please update the logic so that the button handler simply raises a flag, and the disconnection code happens elsewhere when that flag is raised?

    Hieu

  • Hi Hieu,

    I thought about this problem when I wrote the code. I made a protection with a timer to be sure that the handler isn't called multiple times. This is my callback and init functions concerning that button:

    #define BUTTON_TIMEOUT_MS 250
    
    static uint64_t last_press_time = 0;
    
    static const struct gpio_dt_spec button = GPIO_DT_SPEC_GET(DT_NODELABEL(button0), gpios);
    static struct gpio_callback button_cb_data;
    
    static void button_disconnect_cb(const struct device *dev, struct gpio_callback *cb, uint32_t pins)
    {
      uint64_t now = k_uptime_get();
      if ((now - last_press_time) > BUTTON_TIMEOUT_MS)
      {
        printk("Disconnect every devices...\n");
        ble_disconnect_all();
        last_press_time = now;
      }
      else
      {
        printk("Pls wait a bit\n");
      }
    }
    
    int button_init(void)
    {
      if (!device_is_ready(button.port))
      {
        printk("Error: button device %s not ready\n", button.port->name);
        return 1;
      }
      int ret = gpio_pin_configure_dt(&button, GPIO_INPUT);
      if (ret)
      {
        printk("Error: %d: failed to configure %s pin %d\n", ret, button.port->name, button.pin);
        return 1;
      }
      ret = gpio_pin_interrupt_configure_dt(&button, GPIO_INT_EDGE_TO_ACTIVE);
      if (ret)
      {
        printk("Error: %d: failed to configure interrupt on %s pin %d\n", ret, button.port->name, button.pin);
        return 1;
      }
    
      // Setup button callback
      gpio_init_callback(&button_cb_data, button_disconnect_cb, BIT(button.pin));
      gpio_add_callback(button.port, &button_cb_data);
      return 0;
    }

    Do you think that's a good way to resolve button bounce ? Is it enough ?

    I'm going to try with the flag logic.

    Thanks,
    Victor

  • Hi Victor,

    In your code, last_press_time could be volatile, which helps ensuring that the value is up to date. It could also be local to the callback, but that should not affect performance/behavior.

    printk() by default is in-place and blocking, so the printk() calls will prolong the execution of the callback. It might be a good idea to replace them with Logging APIs or CONFIG_LOG_PRINTK, both together with CONFIG_LOG_MODE_DEFERRED, to make printk() non-blocking.

    Otherwise, I think the logic is reasonable. I also discuss a bit further on debouncing here:  Zephyr debounce support 

    Please let me know if the flag handling improves the situation. If that works, you might also want to consider upgrading it to using workqueue or other Zephyr kernel services.

    Regards,

    Hieu

Reply Children
  • Hi Hieu,

    Thanks for the answer, I couldn't respond earlier due to vacation.

    As you advised, I replace every printk() call by the logging API (LOG_INF(), sometimes LOG_WRN(), etc...).

    In the button's callback function, I replace the ble_disconnect_all() call to set a flag as true and at the end of my main() function, I put a infinite loop that check the value of that flag and if it's true, it will call ble_disconnect_all() and then set the flag to false.

    It works great but now, the logging API's calls doesn't print anything. It seems that the while(1) block the logging API but I don't really understand why.
    When I put a break in my if statement inside the loop, every LOG_XXX() print is suddenly flush to the serial terminal only when the break triggered. Using the blocking printk() instead of the Logging API doesn't made this problem appears.

    int main(void)
    {
      LOG_INIT();
    
      // Init Bluetooth subsystem
      int err = bt_enable(NULL);
      if (err)
      {
        LOG_ERR("Bluetooth init failed (err %d)", err);
        return  1;
      }
      LOG_INF("Bluetooth initialized");
      flag_ble_disconnect = false;
    
      // Init Button
      err = button_init();
      if (err)
      {
        LOG_ERR("Button init failed (err %d)", err);
        return err;
      }
      
      start_scan();
    
      while (1)
      {
        if (flag_ble_disconnect)
        {
          ble_disconnect_all();
          flag_ble_disconnect = false;
          
          // Without this break, every LOG_XXX() seems to be ignored.
          // With this break, the LOG_XXX() prints ONLY after the break's trigger...
          break;
          // It is a problem because if this break is here, the disconnect function can't be called multiple time.
        }
      }
      return 0;
    }

    Can the workqueue resolve this problem ? If yes, why ?

    Isn't this flag logic come with some concurrence problems like data races ?

    And I have one more question. In the bt_parse_data callback, I check if the data type is BT_DATA_NAME_COMPLETE and if it's the case, I'm logging it. However the name that is printed contains some artifacts that aren't the same every time it is printed. Why is it printed like that and how can I resolve this print ?

    static bool parse_data_cb(struct bt_data *data, void *user_data)
    {
      bt_addr_le_t *addr = user_data;
    
      switch (data->type)
      {
        case BT_DATA_NAME_SHORTENED:
        case BT_DATA_NAME_COMPLETE:
          LOG_INF("Device name: %s", log_strdup(data->data));
          
          // Connecting to my phone. This condition will change in the future
          if (strncmp(data->data, "Zenfone 9", data->data_len) == 0)
          {
            int err = bt_le_scan_stop();
            if (err)
            {
              LOG_ERR("Stop LE scan failed (err %d)", err);
              return false;
            }
    
            struct bt_conn_le_create_param *create_param = BT_CONN_LE_CREATE_CONN;
            struct bt_le_conn_param *param = BT_LE_CONN_PARAM_DEFAULT;
            
            err = bt_conn_le_create(addr, create_param, param, &default_conn);
            if (err)
            {
              LOG_WRN("Connection failed (err %d)", err);
              start_scan();
            }
          }
          break;
        default:
          break;
      }
    
      return true;
    }

    The print:

    [00:00:00.462,860] <inf> main: Device name: Zenfone 9��(�<�
    ...
    [00:00:02.539,001] <inf> main: Device name: Zenfone 9��89����

    Anyway, it's just a visual problem in the terminal, the connection works perfectly. This issue appears with all devices that are advertising their name (there are about 10 different peripherals around me that are advertising their name).

    It is my first embedded application so I don't really have a good understanding of what is going under the hood and how it's working.

    Kind regards,
    Victor

  • Update:

    I just remove the while loop in my main function and use the system workqueue to call my disconnect function. It works exactly as I want and I don't have pending print anymore.

    [00:00:00.462,860] <inf> main: Device name: Zenfone 9��(�<�
    ...
    [00:00:02.539,001] <inf> main: Device name: Zenfone 9��89����

    However, I still have this artifacts. The only peripheral's device name that is print normally is a Sony WH-1000XM4. Around me, there are some Jabra headsets, a nRF52805, a WH-1000XM3 and some connected watches too that don't print well either.

Related