Debugging BLE applications based on nRF Connect SDK 2.2.0 not possible -> Faulting instruction address (r15/pc)

I have the problem that I am currently unable to debug any of my Bluetooth applications.
I'm sure that I was able to do this without any problems during the original development with SDK v1.9.1.

The firmware works fine without a debugger.
However, when run with a debugger attached, the firmware crashes shortly after bt_le_adv_start with the following output:

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:473
lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 2021270
[00:00:04.303,894] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000002 r2/a3: 0x00000001
[00:00:04.303,894] <err> os: r3/a4: 0x00000000 r12/ip: 0x0000d796 r14/lr: 0x00000fb9
[00:00:04.303,924] <err> os: xpsr: 0x41000028
[00:00:04.303,924] <err> os: Faulting instruction address (r15/pc): 0x00000fc4
[00:00:04.303,955] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:04.303,985] <err> os: Fault during interrupt handling
[00:00:04.304,016] <err> os: Current thread: 0x20002150 (main)
[00:00:04.556,945] <err> fatal_error: Resetting system

After various investigations in my own projects, I tested older (v2.0.0) and newer (v2.2.99-dev3) SDKs. Unfortunately without success!

I also uninstalled VS Code (and deleted .vscode in the user folder) and reinstalled it completely. Also without success!

I then tried to debug the sample samples/bluetooth/peripheral (built with nRF Connect SDK v2.2.0).
But here the firmware crashes as well, with the following output:

[00:00:03.346,435] <inf> fs_nvs: nvs_mount: alloc wra: 0, fb8
[00:00:03.346,435] <inf> fs_nvs: nvs_mount: data wra: 0, 4c
[00:00:03.346,588] <inf> sdc_hci_driver: hci_driver_open: SoftDevice Controller build revision:
6d 90 41 2a 38 e8 ad 17 29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..
1f 85 d8 e1 |....
[00:00:03.349,853] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:03.349,884] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:03.349,914] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 109.16784 Build 2917677098
[00:00:03.350,402] <inf> bt_hci_core: bt_init: No ID address. App must call settings_load()
[00:00:03.353,149] <inf> bt_hci_core: bt_dev_show_info: Identity: E7:BF:32:EE:30:C4 (random)
[00:00:03.353,179] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x11fa, manufacturer 0x0059
[00:00:03.353,210] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0x11fa
[00:00:12.012,115] <err> mpsl_init: m_assert_handler: MPSL ASSERT: 112, 2195
[00:00:12.012,115] <err> os: hard_fault: HARD FAULT
[00:00:12.012,145] <err> os: hard_fault: Fault escalation (see below)
[00:00:12.012,145] <err> os: hard_fault: ARCH_EXCEPT with reason 3
[00:00:12.012,176] <err> os: esf_dump: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000018
[00:00:12.012,176] <err> os: esf_dump: r3/a4: 0x0002d80f r12/ip: 0x00000000 r14/lr: 0x0002aad9
[00:00:12.012,207] <err> os: esf_dump: xpsr: 0x61000018
[00:00:12.012,207] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00029088
[00:00:12.012,237] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:12.012,268] <err> os: z_fatal_error: Fault during interrupt handling
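
As a side note on reading these dumps: both logs report "Fault during interrupt handling", and the stacked xpsr value indicates which exception was active at that moment. On Armv7-M cores such as the Cortex-M4 in the nRF52832, bits [8:0] of xPSR hold the exception number, and external interrupt n is exception 16 + n. The following is only a minimal sketch for decoding the two xpsr values shown above; the helper names are made up for illustration and are not part of Zephyr or the nRF Connect SDK.

```c
#include <stdint.h>

/* Hypothetical helper (not a Zephyr API): bits [8:0] of a stacked xPSR
 * hold the IPSR, i.e. the number of the exception that was active. */
unsigned int xpsr_exception_number(uint32_t xpsr)
{
    return (unsigned int)(xpsr & 0x1FFu);
}

/* Hypothetical helper (not a Zephyr API): exception numbers 16 and above
 * are external interrupts, where IRQ n is exception 16 + n.
 * Returns -1 for thread mode and for system exceptions (numbers 0..15). */
int xpsr_irq_number(uint32_t xpsr)
{
    unsigned int exc = xpsr_exception_number(xpsr);

    return (exc >= 16u) ? (int)exc - 16 : -1;
}
```

For the first dump (xpsr 0x41000028) this yields exception 40, i.e. IRQ 24, and for the second (xpsr 0x61000018) exception 24, i.e. IRQ 8. Which peripheral each IRQ number belongs to would have to be looked up in the nRF52832 product specification.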

The crash occurs in bt_ready at the line:

printk("Advertising successfully started\n");

If the line with bt_le_adv_start is commented out, you can debug without any problems!

Information about the environment:
  • Windows 10 22H2
  • nRF Connect SDK: v2.2.0 (v2.0.0, v2.2.99-dev3)
  • Visual Studio Code: Version: 1.75.0 (system setup)
  • nRF Connect for VS Code: v2023.1.44
  • Board: nRF52 DK NRF52832

  • Hi Sven,

    Are you setting a breakpoint somewhere? Unfortunately it is simply not possible to do halting debugging of BLE applications. The Bluetooth controller has strict timing requirements and will hardfault when these can't be met. See this post for a more detailed explanation:  RE: Debugging while BT is working 

    If you need to debug your application, it's best to disable BLE temporarily. If you really have to debug with BLE, there is another debugging mode called Monitor Mode Debugging: https://www.segger.com/products/debug-probes/j-link/technology/monitor-mode-debugging/

    Instead of halting the CPU at a breakpoint, it kind of keeps the CPU running in a loop there, allowing other timing-critical things to continue running in the background.

    I'm not familiar with Monitor Mode Debugging, and from what I can see, support for monitor mode debugging is still limited in NCS: "Problem using monitor mode debugging with the nrf connect sdk" and "Debugging with Monitor Mode on NCS"

    So I hope that you are able to debug your application with BLE temporarily disabled.

    Best regards,

    Raoul

  • Hi, I'm working with Sven and can add one more thing to the problem.

    The crash occurs as soon as advertisements are launched when there is no connection at all.

    I have previously observed that a breakpoint in the processing of Bluetooth data (when connected) results in a disconnection on the remote side. However, I think that earlier (must have been NCS v1.9.x) I was able to debug at least one received packet to the end (without crashing).

    Turning off Bluetooth certainly helps when debugging application logic that has nothing to do with Bluetooth. However, our current development is receiving and sending data via Bluetooth.

    I've earlier had a crash (Instruction Address Error (r15/pc)) related to binding a UART after the advertising started. At that time I had found a hint on the web to use CONFIG_BT_LL_SW_SPLIT=y which fixed the crash (with "warning: Experimental symbol BT_LL_SW_SPLIT is enabled."). We have already tested this setting here, unfortunately without success.

    Regarding monitor mode debugging, I haven't been able to find anything for NCS/Zephyr so far.
    According to the Nordic recommendation: "SEGGER Embedded Studio Nordic Edition is no longer tested and recommended for new projects." (Release Notes 2.0.0) we have used Visual Studio Code for all new projects and those still under development.

    The question now is whether that was a good decision, and whether NCS is perhaps not yet suitable for production use at all?

    Thanks for the support!

    Marko

  • I ported one of my own projects back from NCS v2.2.0 to v1.9.1 today.
    Now I can easily debug after bt_le_adv_start!!!
    Pause and resume, display variables... no problem.

    The project has the same settings; only the <zephyr/...> includes (such as <zephyr/kernel.h>) have been ported back to the old syntax, and %s has been removed from the log output.

  • I tested my application code again with different SDK versions:

    2.0.2 Debug OK
    2.1.0 Debug OK
    2.1.2 Debug OK
    2.1.3 Debug OK
    2.2.0 Faulting instruction address (r15/pc): 0x00000fb6

    According to nRF Debug Memory Viewer, the symbol name of address 0x00000fb6 is "lll_preempt_calc".

    In a test with the Bluetooth Peripheral example, however, it still crashes with 2.1.3:

    Faulting instruction address (r15/pc): 0x0001fb78

    According to nRF Debug Memory Viewer, the symbol name of address 0x0001fb78 is "m_assert_handler".

    As another test, I have added the CONFIG_BT_LL_SW_SPLIT option, which I had to use in my project, to the example. The example can now also be debugged (however, only up to SDK 2.1.3).

  • Hi Marko, thanks for sharing details on this! So this sounds like there is a bug on our side. I'll share your findings internally and get back to you when I know more.

    Best regards,

    Raoul
