NCS SDK 2.8.0 bt_sdc_hci_driver: SoftDevice Controller ASSERT

Hi,

We've been developing a product based on the nRF52840 that does a lot of advertising and scanning.

We've migrated from SDK v2.5.2 to v2.8.0, and recently discovered we have a fairly rare intermittent fault. We've captured some information about the fault and it is consistently happening in `(ncs/v2.8.0)/nrf/subsys/bluetooth/controller/hci_driver.c`, where the sdc assertion handler is called. The ASSERT messages we've caught are:

- SoftDevice Controller ASSERT: 33, 421
- SoftDevice Controller ASSERT: 33, 923
- SoftDevice Controller ASSERT: 48, 203

 

Any advice at all on how we can narrow down the cause of this issue and stop it from happening? Our product use case can't easily tolerate these unexpected fault and reset events, and the assert appears to be raised within the closed-source SoftDevice part of the SDK.

Things that might help:

- known issues that cause this, with known workarounds

- ways to get extra information in the logs

  • Hello,

    These asserts may happen if SDC/MPSL is delayed by other activities. Can you check that you are not interrupting or delaying the high priority handlers?
    https://docs.nordicsemi.com/bundle/ncs-latest/page/nrfxlib/mpsl/doc/mpsl.html#high_priority 

    Kenneth

  • Hello,

    I'm looking into whether we might be somehow delaying or blocking the SDC from doing what it needs to do.

    Do you know what sort of things I should look for?

    I don't think we are using high priority interrupts, but I could be mistaken.

    My thinking at this stage is that we might be doing too much work in an interrupt context somewhere, or blocking interrupts from occurring. I haven't found any noteworthy cases of using irq_lock() / irq_unlock(). We have many areas making use of k_mutex_lock() / k_mutex_unlock(). I'm not sure whether those can impact the SDC.

    We are using multithreading, but only single core (on nRF52840).
    We are using Zephyr for the RTOS.
    Our custom thread priorities are generally 6, 7, or 8 (positive values, so preemptible and relatively low priority).
    We haven't changed the default rules for thread pre-emption etc.
    We are using interrupt based drivers for two UART interfaces.

    We have recently migrated to nRF Connect SDK v3.0.2 and still see the SDC ASSERT occur.

  • As far as I can tell:

    • MPSL work thread handles the SDC activities, and in our nRF52840 project the MPSL work thread is being given the cooperative priority 6.
    • Our modem driver (based on the ones in zephyr/drivers/modem) is using cooperative priority 7 for the work queue and for the rx processing thread.

    That seems important. My understanding is that the work queue thread and the rx thread of our modem driver may block the MPSL thread from running as promptly as it needs to. Since they are all at 'cooperative' priority, none of them would preempt another. The actual numerical priority values are -(16-x), so -(16-7)=-9 and -(16-6)=-10. Lower values mean higher priority, so the MPSL thread at coop priority 6 still outranks the modem threads; the only issue is the blocking, because MPSL has to wait until a running modem thread yields or blocks.
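
To double-check that arithmetic, here is a minimal standalone sketch in plain C (not Zephyr code). It assumes 16 cooperative levels, which is the value implied by the -(16-x) formula above; `coop_prio`, `preempt_prio` and `would_preempt` are hypothetical helper names that only mirror the arithmetic of Zephyr's `K_PRIO_COOP()` and `K_PRIO_PREEMPT()` macros. The preemption check is a deliberately simplified model that ignores meta-IRQ threads, scheduler locking and time slicing.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 16 cooperative levels, as implied by the -(16-x) formula. */
#define NUM_COOP_PRIORITIES 16

/* Same arithmetic as K_PRIO_COOP(x): the result is always negative. */
int coop_prio(int x)
{
    assert(x >= 0 && x < NUM_COOP_PRIORITIES);
    return -(NUM_COOP_PRIORITIES - x);
}

/* Same arithmetic as K_PRIO_PREEMPT(x): the value is used unchanged. */
int preempt_prio(int x)
{
    assert(x >= 0);
    return x;
}

/*
 * Simplified model: a thread that becomes ready takes the CPU from the
 * running thread only if it has a numerically lower (i.e. higher) priority
 * AND the running thread is preemptible (priority >= 0).
 */
bool would_preempt(int ready_prio, int running_prio)
{
    return running_prio >= 0 && ready_prio < running_prio;
}
```

Under this model, `would_preempt(coop_prio(6), coop_prio(7))` is false (MPSL has to wait for the modem thread to yield), while `would_preempt(coop_prio(6), preempt_prio(7))` is true, which is the motivation for the question below about moving the modem driver to a preemptible priority.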

    Should we try making our modem driver run with preemptible priority?

    Do you have any advice about the maximum time we can delay the MPSL thread without risking these ASSERT-based crashes? Is it on the order of 1 ms, 10 ms, 50 ms, or 100 ms?
