SoftDevice with Zephyr: Scanning extended advertising packets crashes the application

Context

We are working on a project that involves advertising and scanning extended advertising packets.

Details:

  • NRF52840
  • Zephyr SDK 17.0
  • Zephyr 4.0.99
  • NRF SDK v3.0.0-preview1

Issue description

Our application freezes after a few seconds of scanning. The logs show that the BT HCI driver generates a large number of BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT (0x0d) events (the number of events before the crash is not constant), and then we get one of two errors: either mpsl_init: MPSL ASSERT: 112, 1984, or bt_sdc_hci_driver: SoftDevice Controller ASSERT: 50. In both cases this leads to an OS hard fault. We are working in an office with a lot of advertising devices, so we tried putting the device in a Faraday box and observed that the bug happens immediately when we open the box, and not before.

Theory

We think this might be a configuration issue. We spent some time experimenting with different buffer sizes, but saw no meaningful change.

More context:

BLE-related configuration:

# Bluetooth Configuration
CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_CTLR=n
CONFIG_BT_EXT_ADV=y
CONFIG_BT_EXT_ADV_MAX_ADV_SET=2
CONFIG_BT_BROADCASTER=y
CONFIG_BT_CTLR=y
CONFIG_BT_CTLR_ADV_EXT=y
CONFIG_BT_HCI=y
CONFIG_BT_LL_SOFTDEVICE=y
CONFIG_BT_CTLR_ADV_DATA_CHAIN=n
CONFIG_BT_CTLR_ADV_DATA_LEN_MAX=1500
CONFIG_BT_OBSERVER=y

Scan enable function (it returns 0, and bt_enable(NULL) returned 0 before it was called):

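static int ble_scan_start(void)
{
    /* Passive scan; interval and window are given in 0.625 ms units. */
    struct bt_le_scan_param scan_param = {
        .type = BT_HCI_LE_SCAN_PASSIVE,
        .options = BT_LE_SCAN_OPT_NONE,
        .interval = 37 / 0.625, /* 37 ms, truncated to 59 units */
        .window = 25 / 0.625,   /* 25 ms = 40 units */
    };

    int err = bt_le_scan_start(&scan_param, scan_cb);
    return err;
}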
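
As a side note on the interval and window values in the snippet above: both fields are expressed in 0.625 ms units, and dividing by 0.625 in floating point silently truncates when stored in an integer field. A minimal sketch of an integer-only conversion is shown below. The helper names ms_to_scan_units and scan_units_to_us are hypothetical and are not part of the Zephyr API; they only illustrate the arithmetic.

```c
#include <stdint.h>

/* Bluetooth LE scan interval and window are expressed in 0.625 ms units. */
#define SCAN_UNIT_US 625u

/* Hypothetical helper (not a Zephyr function): convert milliseconds to
 * 0.625 ms units using integer math. The result is truncated, which matches
 * what the float-to-integer conversion in the original snippet does. */
static inline uint16_t ms_to_scan_units(uint32_t ms)
{
    return (uint16_t)((ms * 1000u) / SCAN_UNIT_US);
}

/* Hypothetical helper: convert 0.625 ms units back to microseconds, to see
 * the timing that is actually requested from the controller. */
static inline uint32_t scan_units_to_us(uint16_t units)
{
    return (uint32_t)units * SCAN_UNIT_US;
}
```

With these helpers, 37 ms maps to 59 units (36.875 ms effective) and 25 ms maps to 40 units (exactly 25 ms), so the scanner listens for roughly two thirds of each interval.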

 

Our scan callback is empty, which indicates that the issue does not come directly from the application but from one of the lower layers.

Captured logs:

Parents
  • Hi!

    I'm checking with our SoftDevice Controller team what the reason for these asserts could be.

  • Hi,
    Thank you, but we may have found the solution: adding CONFIG_MPSL_USE_EXTERNAL_CLOCK_CONTROL=y to prj.conf.

  • The feedback from the SoftDevice Controller team is that CONFIG_MPSL_USE_EXTERNAL_CLOCK_CONTROL=y should not really affect this bug, but the fact that it helps hints that you may be managing clocks in an unusual way.

    1) Could you provide a minimal sample which can reproduce the issue?

    For example, can you reproduce the bug with https://github.com/nrfconnect/sdk-zephyr/blob/main/samples/bluetooth/central/src/main.c with CONFIG_BT_EXT_ADV=y?

    2) Are you doing anything other than scanning? E.g. are you using RTC0, using the radio directly (apart from through the Bluetooth API), interacting with clocks, or disabling interrupts for a long period of time?

    3) If you are doing something with clocks in your app, could you post what you are doing?

  • Hi Sigurd,

    Thanks for the feedback, and sorry for the late reply from our side. We also want to understand this issue better: setting CONFIG_MPSL_USE_EXTERNAL_CLOCK_CONTROL=y seems to be a bit of a magic fix, and we are afraid that something on our side is not correctly configured. I have therefore tried what you suggested:

    1) Flashed the Bluetooth central sample that you suggested and enabled CONFIG_BT_EXT_ADV=y together with BLE logging over RTT. This sample works; in other words, I don't see any crash. For reference, running our application with CONFIG_MPSL_USE_EXTERNAL_CLOCK_CONTROL=n makes it crash in the same fashion as described above.

    2) We are not using the radio directly, only through the Bluetooth API. Nor are we using RTC0 for anything (it is only set to okay in our DTS), and we are not knowingly interacting with clocks or disabling interrupts for a long period of time.

    3) We are not doing anything with the clocks.

    Do you have any more suggestions for us to try?

    Best regards,
    Erlend
