bt_sdc_hci_driver: SoftDevice Controller ASSERT: 33, 531?

Hello all.

I've been writing code for an nRF52832, now using the v2.3.0 SDK but previously on v2.2.0; I upgraded just in case.

I had a set of disparate but working pieces of code that I have put together into one application/device and I'm getting errors - or maybe just one, but it's sufficient - and I'm unable to diagnose. I have had to rearrange code to make better use of memory, so pretty much none of it is exactly as it was before, but really it's failing to debug properly that has me concerned.

Scenario 1:

If I hook up RTT logging and reset the device then it spits out ~30 lines of log and then nothing happens:

[00:00:00.002,838] <inf> BLE_Module: Client starting...

[00:00:00.009,460] <inf> fs_nvs: 6 Sectors of 4096 bytes
[00:00:00.009,460] <inf> fs_nvs: alloc wra: 0, fd0
[00:00:00.009,490] <inf> fs_nvs: data wra: 0, 20
[00:00:00.010,192] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision:
d8 0c 2d 2f 36 ae e2 5c 80 26 80 4c 3f 4d 16 53 |..-/6..\ .&.L?M.S
50 96 c7 73 |P..s
[00:00:00.010,284] <inf> Dispatcher_Module: Dispatcher waiting for semaphore.

[00:00:00.010,345] <inf> UART_SINK: uart_sink_module waiting for semaphore.

[00:00:00.013,305] <inf> bt_hci_core: No ID address. App must call settings_load()
[00:00:00.013,336] <inf> BLE_Module: >bt ready

[00:00:00.013,641] <inf> BLE_Module: settings loaded

[00:00:00.013,671] <inf> BLE_Module: <bt ready

[00:00:00.014,434] <inf> BLE_Module: Scanning successfully started

[00:00:00.014,495] <inf> Program: Bluetooth initialised

[00:00:00.014,526] <inf> TIME: DS3231 on ebyte_e73_tbb_nrf52832 syncclock 1000 Hz


[00:00:00.014,739] <inf> TIME: DS3231 has not experienced an oscillator fault

[00:00:00.014,953] <inf> TIME:
DS3231 ctrl 04 ; ctrl_stat 08

[00:00:00.014,984] <wrn> date_time: Valid time not currently available
[00:00:00.014,984] <err> TIME: Unable to get time: -61

[00:00:00.015,014] <inf> Program: Time initialised

[00:00:01.010,375] <inf> BLE_Module: 0 / sec

[00:00:01.119,567] <inf> BLE_Module: Device found: 30:A9:DE:0B:7D:39 (public) (RSSI -86)

[00:00:02.002,777] <inf> cpu_load: Load:3,323%
[00:00:02.010,498] <inf> BLE_Module: 0 / sec

[00:00:02.018,066] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.018,829] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.019,561] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.020,324] <inf> sd: Legacy card detected, no CMD8 support
(nothing else arrives)

I unfortunately cannot tell whether the device is, or isn't, doing anything at that point. Both the cpu_load and the BLE_Module are supposed to be logging at 1Hz, but nothing arrives Logging is always stopping during the SD card detection (which "fails" as expected because there's no card in the slot).

Scenario 2:

If I run on the debugger... it seems to work better. There are no errors trapped, and I can step past (many) logging lines that never actually appear in the RTT log from the device. But, if I pause-go-pause-go, or breakpoint "too long", sooner or later I get the issue in the title:

(note that I get further logging here than I do when NOT debugging).

... logging more or less as above, then:
[00:00:[00:00:02.021,820] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.022,552] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.023,315] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.024,047] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.024,810] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.025,543] <inf> sd: Legacy card detected, no CMD8 support
[00:00:02.025,573] <inf> sd: Card does not support CMD8, assuming legacy card
[00:00:02.027,130] <err> sd: No usable card type was found
[00:00:02.027,130] <err> sd: Unusable card
[00:00:02.027,160] <err> Log_Module: Storage init ERROR!

[00:00:02.027,160] <wrn> Log_Module: Could not open SD card (-1)

[00:00:02.027,191] <inf> Program: Log NOT initialised

[00:00:02.027,191] <inf> UART_SINK: debug uart is 0x35470

[00:00:02.027,221] <inf> UART_SINK: debug uart is ready and configuring callbacks

[00:00:02.027,221] <inf> Dispatcher_Module: Uart sink initialised

[00:00:02.027,252] <inf> ECUDL_Display: Dispatcher initialised

[00:00:02.027,282] <inf> UART_SINK: uart_sink_module commencing business.

[00:00:02.027,313] <inf> Dispatcher_Module: Dispatcher commencing businesss.

[00:00:02.969,848] <inf> BLE_Module: Device found: 30:A9:DE:0B:7D:39 (public) (RSSI -80)

[00:00:03.010,559] <inf> BLE_Module: 0 / sec
[00:00:06.832,733] <err> bt_sdc_hci_driver: SoftDevice Controller ASSERT: 33, 531
[00:00:06.832,763] <err> os: ***** HARD FAULT *****
[00:00:06.832,763] <err> os: Fault escalation (see below)
[00:00:06.832,794] <err> os: ARCH_EXCEPT with reason 3

[00:00:06.832,794] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000015
[00:00:06.832,824] <err> os: r3/a4: 0x20001db8 r12/ip: 0x00008664 r14/lr: 0x000158e5
[00:00:06.832,824] <err> os: xpsr: 0x6100001b
[00:00:06.832,824] <err> os: Faulting instruction address (r15/pc): 0x000252e0
[00:00:06.832,885] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:06.832,885] <err> os: Fault during interrupt handling

[00:00:06.832,916] <err> os: Current thread: 0x20003708 (main)
[00:00:08.643,615] <err> fatal_error: Resetting system
Now I'm a bit lost. Is the above the actual problem that's just never appearing in the RTT/log from the non-debugged version, or is this a problem that's caused by my debugging (i.e. I stalled everything past an IRQ timeout or similar)? If the latter, how do I find the actual problem? If the former, why isn't RTT able to show the issue (without a debugger attached)?
My log settings are currently:
CONFIG_LOG=y
CONFIG_LOG_BACKEND_RTT=y
CONFIG_LOG_BLOCK_IN_THREAD=y
If it helps on v2.2.0 I also received the ASSERT on file 53, line 826.
If anybody can give me pointers for what to try next, or if other information would help here, please let me know...
Many thanks,
Tim.
  • Right. It was my fault.

    I hooked up an oscilloscope to a spare GPIO pin and toggled it in my 1Hz reporting thread: that was toggling in both my debug and in the non-debug, so nothing wrong with the initialisation.

    So I looked at the code again and realised that one of the threads I was spinning up wasn't sleeping; in fact it consuming 100% CPU and starving the logging. I got further in the debug because I was probably pausing that thread and allowing other stuff - like logging - to run. I now believe that the ASSERT is what I thought: starvation of an IRQ or similar.

    I got logging working in the no-sleep case by adding this config property to force immediate logging instead of waiting for the logging thread to have its turn:

    CONFIG_LOG_MODE_IMMEDIATE=y

    This then showed me that the CPU was pegged at 100% (thank you cpu_load_init!).

    The fix was simple: put a sleep into the thread that was starving the rest, including logging. I have now removed the CONFIG_LOG_MODE_IMMEDIATE and it still all behaves, sitting normally at ~7% CPU instead of 100%.

    My remaining question is: is the ASSERT "expected" while debugging? If so, is there a way to avoid it? Having random kernel oops while debugging anything might get a bit... annoying. Slight smile

  • Hi Tim, I'm glad you found the source of your issue.

    Regarding the asserts, you'll see that many customers on DevZone experience this when trying to set breakpoints in their BLE-enabled applications. (You could search for "assert 2195" in particular). The fact is that with normal "halting debugging", it's simply not possible to set breakpoints on any application that uses the SoftDevice controller or Zephyr BLE controller. BLE is extremely sensitive to timing and so the link layer code has asserts all over the place to ensure that timing requirements are upheld. If not, they will fault.

    One solution is "monitor mode debugging" which is an alternative to halting debugging, where it keeps high priority interrupts such as those from the SoftDevice, running in the background. But we don't currently have a solution for this as far as I know. I could look more into this when I have the time, if you like.

    Best regards,

    Raoul

Related