Debugging of BLE applications based on Connect SDK 2.2.0 not possible -> Faulting instruction address (r15/pc)

I have the problem that I am currently unable to debug any of my Bluetooth applications.
I'm sure that I was able to do this without any problems during the original development with SDK v1.9.1.

The firmware works fine without a debugger.
However, when run with a debugger attached, the firmware crashes shortly after bt_le_adv_start with the following output:

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:473
lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 2021270
[00:00:04.303,894] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000002 r2/a3: 0x00000001
[00:00:04.303,894] <err> os: r3/a4: 0x00000000 r12/ip: 0x0000d796 r14/lr: 0x00000fb9
[00:00:04.303,924] <err> os: xpsr: 0x41000028
[00:00:04.303,924] <err> os: Faulting instruction address (r15/pc): 0x00000fc4
[00:00:04.303,955] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:04.303,985] <err> os: Fault during interrupt handling
[00:00:04.304,016] <err> os: Current thread: 0x20002150 (main)
[00:00:04.556,945] <err> fatal_error: Resetting system

After various investigations in my own projects, I tested older (v2.0.0) and newer (v2.2.99-dev3) SDKs. Unfortunately without success!

VSCode was also uninstalled (+ .vscode deleted in the user folder) and completely reinstalled. Also without success!

I then tried to debug the sample samples/bluetooth/peripheral (built with Connect SDK v2.2.0).
But here the firmware also crashes with the following output:

[00:00:03.346,435] <inf> fs_nvs: nvs_mount: alloc wra: 0, fb8
[00:00:03.346,435] <inf> fs_nvs: nvs_mount: data wra: 0, 4c
[00:00:03.346,588] <inf> sdc_hci_driver: hci_driver_open: SoftDevice Controller build revision:
6d 90 41 2a 38 e8 ad 17 29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..
1f 85 d8 e1 |....
[00:00:03.349,853] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:03.349,884] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:03.349,914] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 109.16784 Build 2917677098
[00:00:03.350,402] <inf> bt_hci_core: bt_init: No ID address. App must call settings_load()
[00:00:03.353,149] <inf> bt_hci_core: bt_dev_show_info: Identity: E7:BF:32:EE:30:C4 (random)
[00:00:03.353,179] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x11fa, manufacturer 0x0059
[00:00:03.353,210] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0x11fa
[00:00:12.012,115] <err> mpsl_init: m_assert_handler: MPSL ASSERT: 112, 2195
[00:00:12.012,115] <err> os: hard_fault: HARD FAULT
[00:00:12.012,145] <err> os: hard_fault: Fault escalation (see below)
[00:00:12.012,145] <err> os: hard_fault: ARCH_EXCEPT with reason 3
[00:00:12.012,176] <err> os: esf_dump: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000018
[00:00:12.012,176] <err> os: esf_dump: r3/a4: 0x0002d80f r12/ip: 0x00000000 r14/lr: 0x0002aad9
[00:00:12.012,207] <err> os: esf_dump: xpsr: 0x61000018
[00:00:12.012,207] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00029088
[00:00:12.012,237] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:12.012,268] <err> os: z_fatal_error: Fault during interrupt handling
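
Both register dumps report "Fault during interrupt handling". As a minimal sketch (not part of the original logs), the xpsr values can be decoded to see which exception was active: on ARMv7-M cores the low 9 bits of xPSR hold the active exception number, and exception numbers from 16 upward correspond to external interrupts (IRQ = exception number - 16). The helper names below are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* ARMv7-M: bits [8:0] of xPSR (the IPSR field) hold the active exception number. */
#define XPSR_EXCEPTION_MASK      0x1FFu
/* Exception numbers 16 and above are external interrupts: IRQ = exception - 16. */
#define FIRST_EXTERNAL_EXCEPTION 16u

/* Hypothetical helper: extract the active exception number from an xPSR value. */
static unsigned int xpsr_exception_number(uint32_t xpsr)
{
    return (unsigned int)(xpsr & XPSR_EXCEPTION_MASK);
}

/* Hypothetical helper: external IRQ number, or -1 for thread mode / system exceptions. */
static int xpsr_irq_number(uint32_t xpsr)
{
    unsigned int exc = xpsr_exception_number(xpsr);

    return (exc >= FIRST_EXTERNAL_EXCEPTION) ? (int)(exc - FIRST_EXTERNAL_EXCEPTION) : -1;
}

/* Print the decoded fields for one xPSR value taken from a fault dump. */
static void print_xpsr(uint32_t xpsr)
{
    printf("xpsr 0x%08x -> exception %u, IRQ %d\n",
           (unsigned int)xpsr, xpsr_exception_number(xpsr), xpsr_irq_number(xpsr));
}
```

Applied to the dumps above, 0x41000028 decodes to exception 40 (IRQ 24) and 0x61000018 to exception 24 (IRQ 8), so in both cases the fault was raised from an interrupt handler rather than from the main thread.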

The crash occurs in bt_ready at line:

printk("Advertising successfully started\n");

If the line with bt_le_adv_start is commented out, you can debug without any problems!

Information about the environment:
  • Windows 10 22H2
  • nRF Connect SDK: v2.2.0 (v2.0.0, v2.2.99-dev3)
  • Visual Studio Code: Version: 1.75.0 (system setup)
  • nRF Connect for VS Code: v2023.1.44
  • Board: nRF52 DK NRF52832

  • Hi Marko,

    I just wanted to say that I've shared your feedback with our NCS team. I know that BLE can be tightly integrated with business logic and so being able to set breakpoints is a huge benefit.

    The strict timing requirements for BLE operation mean that ordinary halting can never be expected to work. But I assume you are asking for something like the Monitor Mode debugging I mentioned earlier. I've asked the team about support for this, and if I hear anything useful I'll share it with you.

    Meanwhile, for your very specific use case (continuing to do what you were able to do in v2.1.3 and v1.9.1), I want to reiterate that the only way this is possible at the moment is to use the Zephyr BLE controller and then comment out the newly added asserts that I linked to in a previous reply.

    However, please note that the Nordic SoftDevice controller is the only BLE stack that we officially support, and the only stack that gets QDIDs, which help with final certification. So it might be better to find a way around the issue for the moment, until hopefully Monitor Mode debugging support arrives.

    Best regards,

    Raoul

  • Hello Raoul,
    I'm coming back to this topic.

    In the most recent NCS versions, 2.4.0 to 2.4.2, I changed the following code in the lll.c file:

    #if !IS_ENABLED(CONFIG_DEBUG)
        LL_ASSERT_MSG(false, "%s: Actual EVENT_OVERHEAD_START_US = %u", __func__,
            HAL_TICKER_TICKS_TO_US(diff));
    #endif // CONFIG_DEBUG

    This has worked well for me so far, but it is time-consuming to repeat with every new NCS version, and not just on my own computer but also on my colleagues' machines.

    Maybe you could add a dedicated config option to disable the strict timing requirements during development?

  • Hi Marko,

    I'm glad that you're able to debug your BLE application sufficiently with this "one extra step" into the connection!

    If you want to get an option included for this, I think the best way to proceed is to raise an issue or make a pull request on GitHub, and explain your reasoning. However, I don't think the option you suggest is likely to be added.

    I think your case is a rare one, and that most people will expect to debug further than this. But the assert was added for a reason: halting a BLE-enabled application won't work without the connection breaking down.

    Besides this, we primarily develop with the SoftDevice Controller in mind, and the assert is firmly in place there.

    I haven't tried this out myself yet, but please check out this documentation on the "monitor mode" debugging that I mentioned earlier:

    https://docs.zephyrproject.org/latest/services/debugging/debugmon.html#cortex-m-debug-monitor

    If you need to debug further than you're currently doing, look into this. And if it ends up working for you, I recommend that you move over to the SoftDevice Controller.

    In the future I hope we can offer monitor mode debugging in some convenient way.

    Best regards,

    Raoul

  • A note about SDK 2.5.2:
    The patch I mentioned no longer works here!
    The call to LL_ASSERT_MSG was removed in Git on 2023-02-13 and is therefore no longer included in SDK 2.5.2 (and possibly not in some earlier SDK versions either).

    Interestingly, on 2023-06-28 the define LL_ASSERT_OVERHEAD was added in zephyr\subsys\bluetooth\controller\hal\debug.h. Here you can prevent LL_ASSERT_MSG from being triggered via "!CONFIG_BT_CTLR_ASSERT_OVERHEAD_START".

    See also https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/kconfig/index.html#CONFIG_BT_CTLR_ASSERT_OVERHEAD_START

    The background seems a bit different, but it could also be a solution to my debug problem.
    I still have to check whether this is the "built-in" solution for my problem.
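
The compile-time switch described in the note above can be modelled with a small, self-contained sketch. It is not the Zephyr source: LL_ASSERT_MSG here is a stand-in that only counts and prints, and check_prepare_latency and EXAMPLE_OVERHEAD_BUDGET_US are hypothetical names and values chosen for illustration. Only the LL_ASSERT_OVERHEAD name and the CONFIG_BT_CTLR_ASSERT_OVERHEAD_START symbol are taken from the thread.

```c
#include <stdio.h>

/* Stand-in for the controller's LL_ASSERT_MSG: counts hits instead of halting. */
static int assert_hits;
#define LL_ASSERT_MSG(cond, fmt, ...)                         \
    do {                                                      \
        if (!(cond)) {                                        \
            assert_hits++;                                    \
            printf("ASSERTION FAIL: " fmt "\n", __VA_ARGS__); \
        }                                                     \
    } while (0)

/* Model of the switch: the assert is only compiled in when the option is defined. */
#if defined(CONFIG_BT_CTLR_ASSERT_OVERHEAD_START)
#define LL_ASSERT_OVERHEAD(overhead_us)                                \
    LL_ASSERT_MSG(0, "%s: Actual EVENT_OVERHEAD_START_US = %u",        \
                  __func__, (unsigned int)(overhead_us))
#else
#define LL_ASSERT_OVERHEAD(overhead_us) ((void)(overhead_us))
#endif

/* Hypothetical latency budget in microseconds; the real value is platform specific. */
#define EXAMPLE_OVERHEAD_BUDGET_US 300u

/* Hypothetical check: returns -1 if the measured latency exceeds the budget, else 0. */
static int check_prepare_latency(unsigned int actual_us)
{
    if (actual_us > EXAMPLE_OVERHEAD_BUDGET_US) {
        LL_ASSERT_OVERHEAD(actual_us); /* expands to nothing without the option */
        return -1;
    }
    return 0;
}
```

Built as-is, the option is undefined, so a latency such as the 2021270 us from the first log is still detected but no assert fires. Building with -DCONFIG_BT_CTLR_ASSERT_OVERHEAD_START enables the assert path. In an actual project the equivalent would presumably be CONFIG_BT_CTLR_ASSERT_OVERHEAD_START=n in prj.conf with the Zephyr controller (CONFIG_BT_LL_SW_SPLIT) selected; whether that restores debugging is not confirmed in this thread.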

  • Note on SDK 2.6.1:

    No patch works here anymore!
    I get the following error when compiling the project:

    gen_isr_tables.py: error: multiple registrations at table_index 24 for irq 24 (0x18)
    Existing handler 0x1a001, new handler 0x28e2f
    Has IRQ_CONNECT or IRQ_DIRECT_CONNECT accidentally been invoked on the same irq multiple times?

    The reason is the use of CONFIG_BT_LL_SW_SPLIT!

    As soon as I remove CONFIG_BT_LL_SW_SPLIT and use the Nordic stack, I can compile successfully again.

    Unfortunately, debugging then no longer works!
