Debugging of BLE applications based on Connect SDK 2.2.0 not possible -> Faulting instruction address (r15/pc)

I have the problem that I am currently unable to debug any of my Bluetooth applications.
I'm sure that I was able to do this without any problems during the original development with SDK v1.9.1.

The firmware works fine without a debugger.
However, when run with a debugger attached, the firmware crashes shortly after bt_le_adv_start with the following output:

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:473
lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 2021270
[00:00:04.303,894] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000002 r2/a3: 0x00000001
[00:00:04.303,894] <err> os: r3/a4: 0x00000000 r12/ip: 0x0000d796 r14/lr: 0x00000fb9
[00:00:04.303,924] <err> os: xpsr: 0x41000028
[00:00:04.303,924] <err> os: Faulting instruction address (r15/pc): 0x00000fc4
[00:00:04.303,955] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:04.303,985] <err> os: Fault during interrupt handling
[00:00:04.304,016] <err> os: Current thread: 0x20002150 (main)
[00:00:04.556,945] <err> fatal_error: Resetting system

After various investigations in my own projects, I tested older (v2.0.0) and newer (v2.2.99-dev3) SDKs. Unfortunately without success!

I also uninstalled VS Code (and deleted .vscode in the user folder) and reinstalled it from scratch. Also without success!

I then tried to debug the sample samples/bluetooth/peripheral (built with Connect SDK v2.2.0).
But here the firmware also crashes with the following output:

[00:00:03.346,435] <inf> fs_nvs: nvs_mount: alloc wra: 0, fb8
[00:00:03.346,435] <inf> fs_nvs: nvs_mount: data wra: 0, 4c
[00:00:03.346,588] <inf> sdc_hci_driver: hci_driver_open: SoftDevice Controller build revision:
6d 90 41 2a 38 e8 ad 17 29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..
1f 85 d8 e1 |....
[00:00:03.349,853] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:03.349,884] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:03.349,914] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 109.16784 Build 2917677098
[00:00:03.350,402] <inf> bt_hci_core: bt_init: No ID address. App must call settings_load()
[00:00:03.353,149] <inf> bt_hci_core: bt_dev_show_info: Identity: E7:BF:32:EE:30:C4 (random)
[00:00:03.353,179] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x11fa, manufacturer 0x0059
[00:00:03.353,210] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0x11fa
[00:00:12.012,115] <err> mpsl_init: m_assert_handler: MPSL ASSERT: 112, 2195
[00:00:12.012,115] <err> os: hard_fault: HARD FAULT
[00:00:12.012,145] <err> os: hard_fault: Fault escalation (see below)
[00:00:12.012,145] <err> os: hard_fault: ARCH_EXCEPT with reason 3
[00:00:12.012,176] <err> os: esf_dump: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000018
[00:00:12.012,176] <err> os: esf_dump: r3/a4: 0x0002d80f r12/ip: 0x00000000 r14/lr: 0x0002aad9
[00:00:12.012,207] <err> os: esf_dump: xpsr: 0x61000018
[00:00:12.012,207] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00029088
[00:00:12.012,237] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:12.012,268] <err> os: z_fatal_error: Fault during interrupt handling

The crash occurs in bt_ready at line:

printk("Advertising successfully started\n");

If the line with bt_le_adv_start is commented out, you can debug without any problems!

Information about the environment:
  • Windows 10 22H2
  • nRF Connect SDK: v2.2.0 (v2.0.0, v2.2.99-dev3)
  • Visual Studio Code: Version: 1.75.0 (system setup)
  • nRF Connect for VS Code: v2023.1.44
  • Board: nRF52 DK NRF52832
  • Hi Sven,

    Are you setting a breakpoint somewhere? Unfortunately it is simply not possible to do halting debugging of BLE applications. The Bluetooth controller has strict timing requirements and will hardfault when these can't be met. See this post for a more detailed explanation:  RE: Debugging while BT is working 

    If you need to debug your application, it's best to disable BLE temporarily. If you really have to debug with BLE, there is another debugging mode called Monitor Mode Debugging: https://www.segger.com/products/debug-probes/j-link/technology/monitor-mode-debugging/

    It kind of keeps the CPU running in a loop at the breakpoint, allowing other timing critical things to continue running in the background.

    I'm not familiar with Monitor Mode Debugging, and from what I can see, support for monitor mode debugging is still limited in NCS: see "Problem using monitor mode debugging with the nrf connect sdk" and "Debugging with Monitor Mode on NCS".

    So I hope that you are able to debug your application with BLE temporarily disabled.

    Best regards,

    Raoul

  • Hi, I'm working with Sven and can add one more thing to the problem.

    The crash occurs as soon as advertisements are launched when there is no connection at all.

    I have previously observed that a breakpoint in the processing of Bluetooth data (when connected) results in a disconnection on the remote side. However, I think that earlier (it must have been NCS v1.9.x) I was able to debug the handling of at least one received packet to the end without crashing.

    Turning off Bluetooth certainly helps when debugging application logic that has nothing to do with Bluetooth. However, our current development is receiving and sending data via Bluetooth.

    I previously had a crash (Instruction Address Error (r15/pc)) related to binding a UART after advertising had started. At that time I found a hint on the web to use CONFIG_BT_LL_SW_SPLIT=y, which fixed the crash (with "warning: Experimental symbol BT_LL_SW_SPLIT is enabled."). We have already tested this setting here, unfortunately without success.

    Regarding monitor mode debugging, I haven't been able to find anything for NCS/Zephyr so far.
    According to the Nordic recommendation: "SEGGER Embedded Studio Nordic Edition is no longer tested and recommended for new projects." (Release Notes 2.0.0) we have used Visual Studio Code for all new projects and those still under development.

    The question now is whether that was a good decision, and whether NCS is perhaps not yet suitable for production use at all?

    Thanks for the support!

    Marko

  • Hi Marko,

    I just wanted to say that I've shared your feedback with our NCS team. I know that BLE can be tightly integrated with business logic and so being able to set breakpoints is a huge benefit.

    The strict timing requirements for BLE operation means that ordinary halting can never be expected to work. But I assume you are asking for something like the Monitor Mode debugging I mentioned earlier. I've asked the team about support for this, and if I hear anything useful I'll share it with you.

    Meanwhile, for your very specific use case (continuing to do what you were able to do in v2.1.3 and v1.9.1), I want to reiterate that the only way this is possible at the moment is to use the Zephyr BLE controller and then comment out the newly added asserts that I linked to in a previous reply.

    However, please note that the Nordic SoftDevice controller is the only BLE stack that we officially support, and the only stack that gets QDIDs, which help with final certification. So it might be better to find a way around the issue for the moment, until hopefully Monitor Mode debugging support arrives.

    Best regards,

    Raoul

  • Hello Raoul,
    I'm coming back to this topic.

    In the most recent NCS versions, 2.4.0 to 2.4.2, I changed the following code in the lll.c file:

    #if !IS_ENABLED(CONFIG_DEBUG)
        LL_ASSERT_MSG(false, "%s: Actual EVENT_OVERHEAD_START_US = %u", __func__,
            HAL_TICKER_TICKS_TO_US(diff));
    #endif // CONFIG_DEBUG

    This has served me well so far, but it is time-consuming to repeat it with every new NCS version, and not just on my computer but also on my colleagues' machines.

    Maybe a dedicated config option could be added to disable the strict timing checks during development?

  • Hi Marko,

    I'm glad that you're able to debug your BLE application sufficiently with this "one extra step" into the connection!

    If you want to get an option included for this, I think the best way to proceed is to raise an issue or make a pull request on GitHub, and explain your reasoning. However, I don't think the option you suggest is likely to be added.

    I think your case is a rare one, and that most people will be expecting to debug further than this. But the assert was added for a reason - halting a BLE enabled application won't work without the connection breaking down.

    Besides this, we primarily develop with the SoftDevice Controller in mind, and the assert is firmly in place there.

    I haven't tried this out myself yet, but please check out this documentation on the "monitor mode" debugging that I mentioned earlier:

    https://docs.zephyrproject.org/latest/services/debugging/debugmon.html#cortex-m-debug-monitor

    If you need to debug further than you're currently doing, look into this. And if it ends up working for you, I recommend that you move over to the SoftDevice Controller.
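
    As a starting point, a minimal prj.conf sketch for the debug monitor described in the linked Zephyr documentation might look like the fragment below. The two option names are taken from that documentation as I understand it, and I have not verified them against a specific NCS release, so treat this as an assumption and check the Kconfig reference for your version.

    ```conf
    # Assumption: option names as listed in the Zephyr "Cortex-M Debug Monitor" docs
    # Enable the Cortex-M debug monitor exception hook
    CONFIG_CORTEX_M_DEBUG_MONITOR_HOOK=y
    # Use SEGGER's debug monitor implementation (J-Link probes only)
    CONFIG_SEGGER_DEBUGMON=y
    ```

    The J-Link side also has to be switched into monitor mode, which SEGGER documents as the command string exec SetMonModeDebug=1. How that command is issued depends on the debug front end in use.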

    In the future I hope we can offer monitor mode debugging in some convenient way.

    Best regards,

    Raoul

  • A note about SDK 2.5.2:
    The patch I mentioned no longer works here!
    The call to LL_ASSERT_MSG was removed in Git on 2023-02-13 and is no longer present in SDK 2.5.2 (possibly also not in earlier SDK versions).

    Interestingly, on 2023-06-28 the define LL_ASSERT_OVERHEAD was added in zephyr\subsys\bluetooth\controller\hal\debug.h. Here you can prevent LL_ASSERT_MSG from being triggered via "!CONFIG_BT_CTLR_ASSERT_OVERHEAD_START".

    See also https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/kconfig/index.html#CONFIG_BT_CTLR_ASSERT_OVERHEAD_START

    The background seems a bit different, but it could also be a solution to my debug problem.
    I still have to check whether this is the "built-in" solution for my problem.
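
    If the option behaves as its name and the debug.h guard suggest, the prj.conf change would be roughly the following. This is an untested sketch: it assumes the Zephyr controller is selected via CONFIG_BT_LL_SW_SPLIT as earlier in this thread, and that the option has no further dependencies that need enabling.

    ```conf
    # Use the Zephyr (open source) BLE controller instead of the SoftDevice Controller
    CONFIG_BT_LL_SW_SPLIT=y
    # Assumption: disabling this compiles out the EVENT_OVERHEAD_START_US assert
    CONFIG_BT_CTLR_ASSERT_OVERHEAD_START=n
    ```

    This would only silence the overhead assert in the Zephyr controller. It does not relax any timing requirement, and it has no effect on the SoftDevice Controller.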

  • Note on SDK 2.6.1:

    No patch works here anymore!
    I get the following error when compiling the project:

    gen_isr_tables.py: error: multiple registrations at table_index 24 for irq 24 (0x18)
    Existing handler 0x1a001, new handler 0x28e2f
    Has IRQ_CONNECT or IRQ_DIRECT_CONNECT accidentally been invoked on the same irq multiple times?

    The reason is the use of CONFIG_BT_LL_SW_SPLIT!

    As soon as I remove CONFIG_BT_LL_SW_SPLIT and use the Nordic stack, I can compile successfully again.

    Unfortunately, debugging then no longer works!

  • I have now found a useful guide for monitor mode.
    It allows me to debug my BLE project with SDK 2.6.1.

    Here I found a guide that worked for me:

    Section "7.1 Add the following Kconfig flags to prj.conf"

    Add:

    and section "7.4 Run the debug action"

    Command in the debug console of nRF Connect for VS Code:
