Debugging BLE applications based on nRF Connect SDK 2.2.0 not possible -> Faulting instruction address (r15/pc)

I am currently unable to debug any of my Bluetooth applications.
I'm sure this worked without any problems during the original development with SDK v1.9.1.

The firmware works fine without a debugger.
However, when run under a debugger, the firmware crashes shortly after bt_le_adv_start with the following output:

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:473
lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 2021270
[00:00:04.303,894] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000002 r2/a3: 0x00000001
[00:00:04.303,894] <err> os: r3/a4: 0x00000000 r12/ip: 0x0000d796 r14/lr: 0x00000fb9
[00:00:04.303,924] <err> os: xpsr: 0x41000028
[00:00:04.303,924] <err> os: Faulting instruction address (r15/pc): 0x00000fc4
[00:00:04.303,955] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:04.303,985] <err> os: Fault during interrupt handling
[00:00:04.304,016] <err> os: Current thread: 0x20002150 (main)
[00:00:04.556,945] <err> fatal_error: Resetting system

After various investigations in my own projects, I tested older (v2.0.0) and newer (v2.2.99-dev3) SDKs. Unfortunately without success!

I also uninstalled VS Code (and deleted the .vscode folder in my user directory) and reinstalled it from scratch. Also without success!

I then tried to debug the sample samples/bluetooth/peripheral (built with Connect SDK v2.2.0).
But here the firmware also crashes with the following output:

[00:00:03.346,435] <inf> fs_nvs: nvs_mount: alloc wra: 0, fb8
[00:00:03.346,435] <inf> fs_nvs: nvs_mount: data wra: 0, 4c
[00:00:03.346,588] <inf> sdc_hci_driver: hci_driver_open: SoftDevice Controller build revision:
6d 90 41 2a 38 e8 ad 17 29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..
1f 85 d8 e1 |....
[00:00:03.349,853] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:03.349,884] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:03.349,914] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 109.16784 Build 2917677098
[00:00:03.350,402] <inf> bt_hci_core: bt_init: No ID address. App must call settings_load()
[00:00:03.353,149] <inf> bt_hci_core: bt_dev_show_info: Identity: E7:BF:32:EE:30:C4 (random)
[00:00:03.353,179] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x11fa, manufacturer 0x0059
[00:00:03.353,210] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0x11fa
[00:00:12.012,115] <err> mpsl_init: m_assert_handler: MPSL ASSERT: 112, 2195
[00:00:12.012,115] <err> os: hard_fault: HARD FAULT
[00:00:12.012,145] <err> os: hard_fault: Fault escalation (see below)
[00:00:12.012,145] <err> os: hard_fault: ARCH_EXCEPT with reason 3
[00:00:12.012,176] <err> os: esf_dump: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000018
[00:00:12.012,176] <err> os: esf_dump: r3/a4: 0x0002d80f r12/ip: 0x00000000 r14/lr: 0x0002aad9
[00:00:12.012,207] <err> os: esf_dump: xpsr: 0x61000018
[00:00:12.012,207] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00029088
[00:00:12.012,237] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:12.012,268] <err> os: z_fatal_error: Fault during interrupt handling

The crash occurs in bt_ready at the line:

printk("Advertising successfully started\n");

If the line with bt_le_adv_start is commented out, you can debug without any problems!

Information about the environment:
  • Windows 10 22H2
  • nRF Connect SDK: v2.2.0 (v2.0.0, v2.2.99-dev3)
  • Visual Studio Code: Version: 1.75.0 (system setup)
  • nRF Connect for VS Code: v2023.1.44
  • Board: nRF52 DK NRF52832
  • Hi Marko,

    Could I ask you to try switching to the Zephyr BLE controller? This can be enabled through CONFIG_BT_LL_SW_SPLIT.

    I'm curious if it works. Meanwhile, the original issue might be the result of a bug related to stack space, I'm not certain yet.

    If you are able to share your project with us, that would be helpful.

    Best regards,

    Raoul
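
For reference, the controller switch suggested above would typically be made in the application's prj.conf. This is a minimal sketch, assuming the default nRF Connect SDK setup in which the SoftDevice Controller is selected unless the Zephyr controller is chosen explicitly. The CONFIG_BT line is an assumption about the project and is usually already present in a Bluetooth sample.

```conf
# prj.conf (sketch; surrounding options are assumptions)
CONFIG_BT=y

# Select the Zephyr Bluetooth LE controller instead of the SoftDevice Controller
CONFIG_BT_LL_SW_SPLIT=y
```

A pristine build is advisable after changing the controller, so that no stale configuration is carried over.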

  • Hi Raoul,
    We are already using CONFIG_BT_LL_SW_SPLIT in the current projects because we had problems with BLE + UART that could be solved with it.

    For this troubleshooting I will only refer to the example project samples/bluetooth/peripheral (with nRF52 DK NRF52832).

    I added a breakpoint to the line

    error = bt_enable(NULL);

    and

    bas_notify();

    I only use the first breakpoint (bt_enable) to detect a restart (execution otherwise stops at main() only the first time).
    The second breakpoint (bas_notify) is the one I actually want to debug.

    Result of the different (Pristine) builds:
    NCS 2.1.3 without CONFIG_BT_LL_SW_SPLIT: crash while debugging
    NCS 2.1.3 with CONFIG_BT_LL_SW_SPLIT=y: Debugging works fine

    NCS 2.2.0 without CONFIG_BT_LL_SW_SPLIT: crash while debugging
    NCS 2.2.0 with CONFIG_BT_LL_SW_SPLIT=y: crash while debugging

    I hope this information helps to fix the error.

    Best regards
    Mark

  • Hi Mark, thanks a lot for sharing these details.

    I now realise that we have received multiple cases recently related to MPSL ASSERT: 112, 2195 being triggered by a breakpoint. I've informed the developers and shared some of your findings.

    I'll let you know when they find out what to do about it.

    Best regards,

    Raoul

  • Hi again,

    I realised that the MPSL assert was only related to the case where you enabled the Nordic SoftDevice controller, but that wasn't your original issue. Sorry for the confusion.

    Your original issue was related to the Zephyr BLE controller - you were previously able to set a breakpoint at the start of your BLE-enabled app, but no longer can. Regarding that, I've heard back from a developer:

    The Zephyr controller has recently added a strict assertion on delayed events. Strictly for debugging, he can comment out that assertion:

    https://github.com/zephyrproject-rtos/zephyr/commit/ebf723626704aaffa95a2a22fb4415a4ae59bf00

    The Nordic SD Controller has never allowed breakpoints while it is enabled. Now the Zephyr controller doesn't either. So please note that this newly added assert should be seen more as a bugfix than anything else; timing is critical in BLE.

    Best regards,

    Raoul
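
To illustrate why halting at a breakpoint trips an assertion of this kind, here is a minimal standalone sketch in C. It is not the controller's code: the function names and the 300 µs budget are hypothetical and chosen only for illustration. The idea is that the time between an event's scheduled start and the moment it is actually prepared is compared against a small budget, and a CPU halted by the debugger makes that delay enormous.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget, in microseconds, for how late a radio event may be
 * prepared. The real controller derives its limit differently; this value
 * is an assumption for illustration only. */
#define EVENT_START_BUDGET_US 300u

/* Microseconds elapsed between the scheduled start and "now".
 * Unsigned subtraction stays correct across a single counter wraparound. */
static uint32_t event_delay_us(uint32_t scheduled_us, uint32_t now_us)
{
    return now_us - scheduled_us;
}

/* True when the event is later than the budget, i.e. the condition under
 * which a strict controller would assert instead of continuing. */
static bool event_too_late(uint32_t scheduled_us, uint32_t now_us)
{
    return event_delay_us(scheduled_us, now_us) > EVENT_START_BUDGET_US;
}
```

With a delay of 100 µs the check passes. With the 2021270 µs reported in the assertion message at the top of this thread (roughly two seconds, which is plausible for time spent halted in the debugger), it fails by a wide margin.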

  • Hi Raoul,
    Does this mean that with newer SDKs it will never be possible to debug an application using BLE?

    This would mean the end of our projects related to Zephyr and Nordic MCUs!
    It is impossible to develop complex BLE projects without a debugger or to localize reported errors.

    You can't always remove BLE from the projects to be able to debug. This would be a high risk when creating a release candidate, as you might not correctly re-enable what was commented out.
    And I see another problem when you need the BLE communication to start your own processes that you want to debug.
    This is exactly what I need right now and it works with NCS 2.1.3 as long as I don't set breakpoints in the BLE communication but in my own threads.
    With NCS 2.1.3, even breakpoints when receiving BLE packets work, but only once, since the connection then breaks down (this is acceptable). But that's enough to be able to debug your own processing at least once.

    It is also interesting that a breakpoint in the sample project samples/bluetooth/peripheral already causes problems as soon as advertising is active (without a connection).

    The developers should provide a way to debug with BLE enabled, either via CONFIG_DEBUG=y or a new explicit configuration option.

    It would be nice if the developers could find a way.
    Thanks :-)

    Best regards,
    Marko
