When calling bt_le_adv_update_data random assert in bt_hci_cmd_send_sync

I have code that updates the advertising data once per second using bt_le_adv_update_data. This works fine but every 30 minutes or so I get an assert (see below) in bt_hci_cmd_send_sync waiting to take a k_sem. I can only assume the second core with the Bluetooth subsystem failed because the k_sem_take has a 10 second timeout which seems to be plenty of time to handle the hci command. I'm using nRF Connect v2.3.0. I'm new to multi-core embedded apps. Is there a way to monitor the second cores behavior, or debug it separately? I believe I am using the provided SoftDevice controller on the net core which is handling the bluetooth communications.

int bt_hci_cmd_send_sync(uint16_t opcode, struct net_buf *buf,
             struct net_buf **rsp)
{
    struct k_sem sync_sem;
    uint8_t status;
    int err;

    if (!buf) {
        buf = bt_hci_cmd_create(opcode, 0);
        if (!buf) {
            return -ENOBUFS;
        }
    }

    LOG_DBG("buf %p opcode 0x%04x len %u", buf, opcode, buf->len);

    k_sem_init(&sync_sem, 0, 1);
    cmd(buf)->sync = &sync_sem;

    net_buf_put(&bt_dev.cmd_tx_queue, net_buf_ref(buf));

    err = k_sem_take(&sync_sem, HCI_CMD_TIMEOUT);  // <<<<<<<<<<<<< this times out
    BT_ASSERT_MSG(err == 0, "k_sem_take failed with err %d", err);

    status = cmd(buf)->status;
    if (status) {
        LOG_WRN("opcode 0x%04x status 0x%02x", opcode, status);
        net_buf_unref(buf);

        switch (status) {
        case BT_HCI_ERR_CONN_LIMIT_EXCEEDED:
            return -ECONNREFUSED;
        default:
            return -EIO;
        }
    }

    LOG_DBG("rsp %p opcode 0x%04x len %u", buf, opcode, buf->len);

    if (rsp) {
        *rsp = buf;
    } else {
        net_buf_unref(buf);
    }

    return 0;
}

Parents Reply
  • I did not manage to reproduce the assert in our latest (v2.3.0) trying to update advertising data once a second. Maybe you have a same issue like the one mentioned here and the main thing that might concern you from that github issue comments is 

    bt_le_adv_update_data is called from timer ISR context, which is not supported, use delayed work instead

    Check if you are calling the update from the timer ISR and if so delay the execution into a lower priority work queue.

Children
  • Thanks, I've tried putting all my BT calls in the main thread and I still have the problem. I was never running in a IRS context though, just in a separate thread. I also now have a test app that runs on a dev kit but it does not have the problem so I must have an issue but I'm not sure what to try next to track it down. 

  • Then I am not able to reproduce it calling it in the thread context and not the ISR context. Can you please attach a simplistic project for me to reproduce and debug this at my desk?

  • Thanks, It doesn't look like the same assert location but I'll look more closely at this issue. It seems to happen every 33 minutes no matter what we do to make it break. We've changed heap sizes, advertising update times, advertising interval times. Still 33 minutes. 

  • I found the problem.

    We had started with the nrf5340 dev kit board files as the template for our board files. We do not have an External crystal oscillator but use one of the Internal clocks. We had started out using the Internal calibrated RC oscillator but had switched to the Internal synthesized clock because the Internal calibrated RC oscillator conflicts with port0, pin0 and we need that. Because we used the dev kit board files this is specified in two places: one place for the app core and one place for the net core. We should be sharing those configs but instead ended up only changing it in once place. Hence, our project files had the below differences and it causes the crash described above. I was able to reproduce this bug on the dev kit as well.

    pcb_102_nrf5340_cpuapp_defconfig

    # Select the low frequency internal clock source
    CONFIG_NRFX_CLOCK_LFXO_TWO_STAGE_ENABLED=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=n
    CONFIG_SOC_ENABLE_LFXO=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_SYNTH=y

    and our net core looked like this:

    pcb_102_nrf5340_cpunet_defconfig

    # Select the low frequency internal clock source

    CONFIG_NRFX_CLOCK_LFXO_TWO_STAGE_ENABLED=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC_CALIBRATION=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_50PPM=y

    Rob

Related