When calling bt_le_adv_update_data random assert in bt_hci_cmd_send_sync

I have code that updates the advertising data once per second using bt_le_adv_update_data. This works fine but every 30 minutes or so I get an assert (see below) in bt_hci_cmd_send_sync waiting to take a k_sem. I can only assume the second core with the Bluetooth subsystem failed because the k_sem_take has a 10 second timeout which seems to be plenty of time to handle the hci command. I'm using nRF Connect v2.3.0. I'm new to multi-core embedded apps. Is there a way to monitor the second cores behavior, or debug it separately? I believe I am using the provided SoftDevice controller on the net core which is handling the bluetooth communications.

int bt_hci_cmd_send_sync(uint16_t opcode, struct net_buf *buf,
             struct net_buf **rsp)
{
    struct k_sem sync_sem;
    uint8_t status;
    int err;

    if (!buf) {
        buf = bt_hci_cmd_create(opcode, 0);
        if (!buf) {
            return -ENOBUFS;
        }
    }

    LOG_DBG("buf %p opcode 0x%04x len %u", buf, opcode, buf->len);

    k_sem_init(&sync_sem, 0, 1);
    cmd(buf)->sync = &sync_sem;

    net_buf_put(&bt_dev.cmd_tx_queue, net_buf_ref(buf));

    err = k_sem_take(&sync_sem, HCI_CMD_TIMEOUT);  // <<<<<<<<<<<<< this times out
    BT_ASSERT_MSG(err == 0, "k_sem_take failed with err %d", err);

    status = cmd(buf)->status;
    if (status) {
        LOG_WRN("opcode 0x%04x status 0x%02x", opcode, status);
        net_buf_unref(buf);

        switch (status) {
        case BT_HCI_ERR_CONN_LIMIT_EXCEEDED:
            return -ECONNREFUSED;
        default:
            return -EIO;
        }
    }

    LOG_DBG("rsp %p opcode 0x%04x len %u", buf, opcode, buf->len);

    if (rsp) {
        *rsp = buf;
    } else {
        net_buf_unref(buf);
    }

    return 0;
}

Parents Reply Children
  • Thanks, It doesn't look like the same assert location but I'll look more closely at this issue. It seems to happen every 33 minutes no matter what we do to make it break. We've changed heap sizes, advertising update times, advertising interval times. Still 33 minutes. 

  • I found the problem.

    We had started with the nrf5340 dev kit board files as the template for our board files. We do not have an External crystal oscillator but use one of the Internal clocks. We had started out using the Internal calibrated RC oscillator but had switched to the Internal synthesized clock because the Internal calibrated RC oscillator conflicts with port0, pin0 and we need that. Because we used the dev kit board files this is specified in two places: one place for the app core and one place for the net core. We should be sharing those configs but instead ended up only changing it in once place. Hence, our project files had the below differences and it causes the crash described above. I was able to reproduce this bug on the dev kit as well.

    pcb_102_nrf5340_cpuapp_defconfig

    # Select the low frequency internal clock source
    CONFIG_NRFX_CLOCK_LFXO_TWO_STAGE_ENABLED=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=n
    CONFIG_SOC_ENABLE_LFXO=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_SYNTH=y

    and our net core looked like this:

    pcb_102_nrf5340_cpunet_defconfig

    # Select the low frequency internal clock source

    CONFIG_NRFX_CLOCK_LFXO_TWO_STAGE_ENABLED=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC_CALIBRATION=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_50PPM=y

    Rob

Related