Debugging of BLE applications based on Connect SDK 2.2.0 not possible -> Faulting instruction address (r15/pc)

I have the problem that I am currently unable to debug any of my Bluetooth applications.
I'm sure that I was able to do this without any problems during the original development with SDK v1.9.1.

The firmware works fine without a debugger.
However, when run with a debugger attached, the firmware crashes shortly after bt_le_adv_start with the following output:

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:473
lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 2021270
[00:00:04.303,894] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000002 r2/a3: 0x00000001
[00:00:04.303,894] <err> os: r3/a4: 0x00000000 r12/ip: 0x0000d796 r14/lr: 0x00000fb9
[00:00:04.303,924] <err> os: xpsr: 0x41000028
[00:00:04.303,924] <err> os: Faulting instruction address (r15/pc): 0x00000fc4
[00:00:04.303,955] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:04.303,985] <err> os: Fault during interrupt handling
[00:00:04.304,016] <err> os: Current thread: 0x20002150 (main)
[00:00:04.556,945] <err> fatal_error: Resetting system

After various investigations in my own projects, I tested older (v2.0.0) and newer (v2.2.99-dev3) SDKs. Unfortunately without success!

VSCode was also uninstalled (+ .vscode deleted in the user folder) and completely reinstalled. Also without success!

I then tried to debug the sample samples/bluetooth/peripheral (built with Connect SDK v2.2.0).
But here the firmware also crashes with the following output:

[00:00:03.346,435] <inf> fs_nvs: nvs_mount: alloc wra: 0, fb8
[00:00:03.346,435] <inf> fs_nvs: nvs_mount: data wra: 0, 4c
[00:00:03.346,588] <inf> sdc_hci_driver: hci_driver_open: SoftDevice Controller build revision:
6d 90 41 2a 38 e8 ad 17 29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..
1f 85 d8 e1 |....
[00:00:03.349,853] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:03.349,884] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:03.349,914] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 109.16784 Build 2917677098
[00:00:03.350,402] <inf> bt_hci_core: bt_init: No ID address. App must call settings_load()
[00:00:03.353,149] <inf> bt_hci_core: bt_dev_show_info: Identity: E7:BF:32:EE:30:C4 (random)
[00:00:03.353,179] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x11fa, manufacturer 0x0059
[00:00:03.353,210] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0x11fa
[00:00:12.012,115] <err> mpsl_init: m_assert_handler: MPSL ASSERT: 112, 2195
[00:00:12.012,115] <err> os: hard_fault: HARD FAULT
[00:00:12.012,145] <err> os: hard_fault: Fault escalation (see below)
[00:00:12.012,145] <err> os: hard_fault: ARCH_EXCEPT with reason 3
[00:00:12.012,176] <err> os: esf_dump: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000018
[00:00:12.012,176] <err> os: esf_dump: r3/a4: 0x0002d80f r12/ip: 0x00000000 r14/lr: 0x0002aad9
[00:00:12.012,207] <err> os: esf_dump: xpsr: 0x61000018
[00:00:12.012,207] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00029088
[00:00:12.012,237] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:12.012,268] <err> os: z_fatal_error: Fault during interrupt handling
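
Both register dumps above report "Fault during interrupt handling". The stacked xpsr value can be used to cross-check that: on the Cortex-M4 its low nine bits (IPSR) hold the number of the exception that was active when the fault hit, and numbers from 16 upwards are external interrupts. The following is a minimal sketch in plain C, not Zephyr code; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPSR occupies bits [8:0] of xPSR on Armv7-M (Cortex-M4). */
#define XPSR_IPSR_MASK           0x1FFu
/* Exception 16 is IRQ0; everything below is a system exception. */
#define FIRST_EXTERNAL_EXCEPTION 16u

/* Number of the exception that was active (0 means thread mode). */
uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & XPSR_IPSR_MASK;
}

/* True if the stacked context was running inside an exception handler. */
bool xpsr_in_handler_mode(uint32_t xpsr)
{
    return xpsr_exception_number(xpsr) != 0u;
}

/* External interrupt number, or -1 for thread mode / system exceptions. */
int xpsr_irq_number(uint32_t xpsr)
{
    uint32_t exc = xpsr_exception_number(xpsr);

    if (exc < FIRST_EXTERNAL_EXCEPTION) {
        return -1;
    }
    return (int)(exc - FIRST_EXTERNAL_EXCEPTION);
}

/* Usage: the two xpsr values taken from the logs above. */
void check_logged_xpsr_values(void)
{
    assert(xpsr_in_handler_mode(0x41000028u));
    assert(xpsr_irq_number(0x41000028u) == 24); /* first log  */
    assert(xpsr_irq_number(0x61000018u) == 8);  /* second log */
}
```

Both values decode to an external interrupt, which matches the "Fault during interrupt handling" lines. Which peripheral sits behind each interrupt number has to be looked up in the device's interrupt table; that mapping is not part of this sketch.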

The crash occurs in bt_ready at line:

printk("Advertising successfully started\n");

If the line with bt_le_adv_start is commented out, you can debug without any problems!

Information about the environment:
  • Windows 10 22H2
  • nRF Connect SDK: v2.2.0 (v2.0.0, v2.2.99-dev3)
  • Visual Studio Code: Version: 1.75.0 (system setup)
  • nRF Connect for VS Code: v2023.1.44
  • Board: nRF52 DK NRF52832
  • Hi Sven,

    Are you setting a breakpoint somewhere? Unfortunately it is simply not possible to do halting debugging of BLE applications. The Bluetooth controller has strict timing requirements and will hardfault when these can't be met. See this post for a more detailed explanation:  RE: Debugging while BT is working 

    If you need to debug your application, it's best to disable BLE temporarily. If you really have to debug with BLE, there is another debugging mode called Monitor Mode Debugging: https://www.segger.com/products/debug-probes/j-link/technology/monitor-mode-debugging/

    It kind of keeps the CPU running in a loop at the breakpoint, allowing other timing critical things to continue running in the background.

    I'm not familiar with Monitor Mode Debugging, and from what I can see, support for monitor mode debugging is still limited in NCS: see "Problem using monitor mode debugging with the nrf connect sdk" and "Debugging with Monitor Mode on NCS".

    So I hope that you are able to debug your application with BLE temporarily disabled.

    Best regards,

    Raoul

  • Hi, I'm working with Sven and can add one more thing to the problem.

    The crash occurs as soon as advertisements are launched when there is no connection at all.

    I have previously observed that a breakpoint in the processing of Bluetooth data (when connected) results in a disconnection at the remote peer. However, I think that earlier (must have been NCS v1.9.x) I was able to debug at least one received packet to the end (without crashing).

    Turning off Bluetooth certainly helps when debugging application logic that has nothing to do with Bluetooth. However, our current development is receiving and sending data via Bluetooth.

    I've earlier had a crash (Instruction Address Error (r15/pc)) related to binding a UART after the advertising started. At that time I had found a hint on the web to use CONFIG_BT_LL_SW_SPLIT=y which fixed the crash (with "warning: Experimental symbol BT_LL_SW_SPLIT is enabled."). We have already tested this setting here, unfortunately without success.

    Regarding monitor mode debugging, I haven't been able to find anything for NCS/Zephyr so far.
    According to the Nordic recommendation: "SEGGER Embedded Studio Nordic Edition is no longer tested and recommended for new projects." (Release Notes 2.0.0) we have used Visual Studio Code for all new projects and those still under development.

    The question now is whether that was a good decision and whether NCS is perhaps not yet suitable for productive use at all?

    Thanks for the support!

    Marko

  • I ported one of my own projects back from NCS v2.2.0 to v1.9.1 today.
    Now I can easily debug after bt_le_adv_start!!!
    Pause and resume, display variables... no problem.

    The project has the same settings; only the includes ("<zephyr/..." and <zephyr/kernel.h>) have been ported back to the old syntax, and %s has been removed from the log output.

  • I tested my application code again with different SDK versions:

    2.0.2 Debug OK
    2.1.0 Debug OK
    2.1.2 Debug OK
    2.1.3 Debug OK
    2.2.0 Faulting instruction address (r15/pc): 0x00000fb6

    According to nRF Debug Memory Viewer, the symbol name of address 0x00000fb6 is "lll_preempt_calc".

    In a test with the Bluetooth Peripheral example, however, it still crashes with 2.1.3:

    Faulting instruction address (r15/pc): 0x0001fb78

    According to nRF Debug Memory Viewer, the symbol name of address 0x0001fb78 is "m_assert_handler".

    As another test, I have added the CONFIG_BT_LL_SW_SPLIT option, which I had to use in my project, to the example. The example can now also be debugged (however, only up to SDK 2.1.3).

  • Hi Marko, thanks for sharing details on this! So this sounds like there is a bug on our side. I'll share your findings internally and get back to you when I know more.

    Best regards,

    Raoul

  • Hi Marko,

    Could I ask you to try switching to the Zephyr BLE controller? This can be enabled through CONFIG_BT_LL_SW_SPLIT.

    I'm curious if it works. Meanwhile, the original issue might be the result of a bug related to stack space, I'm not certain yet.

    If you are able to share your project with us, that would be helpful.

    Best regards,

    Raoul

  • Hi Raoul,
    We are already using CONFIG_BT_LL_SW_SPLIT in the current projects because we had problems with BLE + UART that could be solved with it.

    For this troubleshooting I would only refer to the example project samples/bluetooth/peripheral (with nRF52 DK NRF52832).

    I added a breakpoint to the line

    error = bt_enable(NULL);

    and

    bas_notify();

    I only use the first breakpoint (bt_enable) to detect a restart (execution stops at main() only the first time).
    The second breakpoint (bas_notify) is the one I actually want to debug.

    Result of the different (Pristine) builds:
    NCS 2.1.3 without CONFIG_BT_LL_SW_SPLIT: crash while debugging
    NCS 2.1.3 with CONFIG_BT_LL_SW_SPLIT=y: Debugging works fine

    NCS 2.2.0 without CONFIG_BT_LL_SW_SPLIT: crash while debugging
    NCS 2.2.0 with CONFIG_BT_LL_SW_SPLIT: crash while debugging

    I hope I can help with this information to fix the error.

    Best regards
    Mark

  • Hi Mark, thanks a lot for sharing these details.

    I now realise that we have received multiple cases recently related to MPSL ASSERT: 112, 2195 being triggered by a breakpoint. I've informed the developers and shared some of your findings.

    I'll let you know when they find out what to do about it.

    Best regards,

    Raoul

  • Hi again,

    I realised that the MPSL assert was only related to the case where you enabled the Nordic SoftDevice controller, but that wasn't your original issue. Sorry for the confusion.

    Your original issue was related to the Zephyr BLE controller - you were able to set a breakpoint at the start of your BLE enabled app before, but now no longer. Regarding that, I've heard back from a developer:

    The Zephyr Controller has recently added a strict assertion on delayed events. If it is strictly for debugging, he can comment out said assertion:

    https://github.com/zephyrproject-rtos/zephyr/commit/ebf723626704aaffa95a2a22fb4415a4ae59bf00

    The Nordic SD Controller has never allowed breakpoints while it is enabled. Now the Zephyr controller doesn't either. So please note that this newly added assert should be seen more as a bugfix than anything else; timing is critical in BLE.

    Best regards,

    Raoul
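
To illustrate why halting at a breakpoint trips a check of this kind, here is a simplified sketch in plain C. It is not the actual controller code: the function names and the 300 us budget used in the usage notes are assumptions for illustration. It only relies on the nRF52 RTC being a 24-bit counter running at 32768 Hz.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTC_FREQ_HZ      32768u      /* nRF52 RTC tick rate (prescaler 0) */
#define RTC_COUNTER_MASK 0x00FFFFFFu /* the RTC counter is 24 bits wide   */

/* Ticks elapsed between two counter readings, tolerating one wrap-around. */
uint32_t ticks_elapsed(uint32_t ticks_then, uint32_t ticks_now)
{
    return (ticks_now - ticks_then) & RTC_COUNTER_MASK;
}

/* Convert RTC ticks to microseconds (rounded down). */
uint32_t ticks_to_us(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / RTC_FREQ_HZ);
}

/* True if the event starts later than the allowed overhead budget. */
bool event_start_is_late(uint32_t ticks_scheduled, uint32_t ticks_now,
                         uint32_t max_overhead_us)
{
    uint32_t delay_us = ticks_to_us(ticks_elapsed(ticks_scheduled, ticks_now));

    return delay_us > max_overhead_us;
}

/* Usage: 66233 ticks converts to the 2021270 us printed in the first log. */
void check_logged_overhead(void)
{
    assert(ticks_to_us(66233u) == 2021270u);
    /* 300 us is a hypothetical budget, not the controller's real value. */
    assert(event_start_is_late(0u, 66233u, 300u));
}
```

A radio event that should start within a few hundred microseconds cannot survive a CPU halt of about two seconds, which is the delay shown in the "Actual EVENT_OVERHEAD_START_US = 2021270" message. Any check of this shape fails as soon as the debugger stops the core for longer than the budget.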

  • Hi Raoul,
    does this mean that with newer SDKs it will never be possible to debug an application using BLE?

    This would mean the end of our projects related to Zephyr and Nordic MCUs!
    It is impossible to develop complex BLE projects without a debugger or to localize reported errors.

    You can't always remove BLE from a project in order to debug it. That would be a high risk when creating a release candidate, as you might not correctly re-enable everything that was commented out.
    And I see another problem when you need the BLE communication to start your own processes that you want to debug.
    This is exactly what I need right now and it works with NCS 2.1.3 as long as I don't set breakpoints in the BLE communication but in my own threads.
    With NCS 2.1.3, even breakpoints when receiving BLE packets work, but only once, since the connection then breaks down (this is acceptable). But that's enough to be able to debug your own processing at least once.

    It is also interesting that a breakpoint in the sample project samples/bluetooth/peripheral already causes problems as soon as advertising is active (without a connection).

    Here the developers should create a way to also be able to debug with BLE. Either via CONFIG_DEBUG=y or a new explicit configuration parameter.

    It would be nice if the developers could find a way.
    Thanks :-)

    Best regards,
    Marko
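
One way to lower the risk of forgetting to re-enable commented-out BLE code is to put the radio start behind a single build switch instead of editing the code. The sketch below is plain C under stated assumptions: APP_DEBUG_WITHOUT_BLE is a hypothetical macro (in a Zephyr build this would more typically be an application Kconfig option), and fake_adv_start is only a stand-in for the real call to bt_le_adv_start. It does not make halting debugging with an active controller possible; it only keeps the "BLE off" debug variant reproducible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build switch: pass -DAPP_DEBUG_WITHOUT_BLE=1 for a
 * halting-debug build. Release builds leave it at 0. */
#ifndef APP_DEBUG_WITHOUT_BLE
#define APP_DEBUG_WITHOUT_BLE 0
#endif

/* Signature of the function that actually starts advertising. */
typedef int (*adv_start_fn)(void);

/* Start advertising unless the debug switch is set.
 * Returns 0 on success or the error code of the start function;
 * *started reports whether the radio was really started. */
int app_start_advertising(adv_start_fn start, bool *started)
{
    assert(started != NULL);
    *started = false;

#if APP_DEBUG_WITHOUT_BLE
    (void)start; /* radio intentionally left off for halting debugging */
    return 0;
#else
    if (start == NULL) {
        return -1;
    }

    int err = start();

    *started = (err == 0);
    return err;
#endif
}

/* Stand-in for the real advertising start, used only in this sketch. */
int fake_adv_start(void)
{
    return 0;
}
```

With the switch at its default of 0, app_start_advertising(fake_adv_start, &started) returns 0 and sets started to true. Building with -DAPP_DEBUG_WITHOUT_BLE=1 leaves started at false without touching the application code, so there is nothing to uncomment before a release build.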
