
Debugging HardFault when calling sd_ble_gap_adv_start (works on one board, crashes on the other)

Hi everyone,

Preface: I'm an educator and my students are familiar with the Arduino IDE, so I'm using that in combination with the sandeepmistry/arduino-nRF5 core.

Our main platform is the BBC micro:bit with the nRF51822 QFAA, and things work just as intended. I have a demo sketch that can reliably do both broadcaster + observer roles with the S130 2.0.1 softdevice. Everything peachy there.

However, I also have a custom development board with the nRF51822 QFAC, and when I flash the same sketch with the same softdevice on that one, I get a HardFault shortly after calling sd_ble_gap_adv_start(). It doesn't happen immediately, but roughly 100 ms after the call, so I assume it's somehow timer-related, as also described in https://devzone.nordicsemi.com/f/nordic-q-a/11809/hardfault-error-after-calling-sd_ble_gap_adv_start.

My question now is: what could cause this to work on one board but not on the other? I can think of the following differences right now:

micro:bit            custom board
QFAA (16 kB RAM)     QFAC (32 kB RAM)
LF RC oscillator     LF 32 kHz crystal
CMSIS-DAP            ST-Link v2

The one thing that is different and is also timing-related is the LF oscillator. If that is incorrectly initialized, could it be the reason for the softdevice faulting?

Thanks & best regards, Florian

  • Hi,

    1. Have you tried using the LF RC Oscillator on your custom board to rule out a crystal issue?
    2. Are you sure it is a hard fault and not an assert?
    3. Can you provide a stack trace showing the origin of the hard fault?
    4. Can you tell us a bit more about the circumstances of the problem? Do you have multiple links or advertisers? Is there a central that tries to connect etc.?
    5. Do you have more than one custom board you can try?
    6. Or even an nRF51 DK that is also fitted with a QFAC rev 3 chip?
  • Hi Martin, thanks for the quick reply. Follow-ups below:

    1. Same thing happens with RC Oscillator or even synthesized clock source.

    2. How can I tell the difference?

    3. Not sure how reliable this is; I'm getting two different stack traces (on S130 2.0.1):

    (gdb) bt
    #0  0xfffffffe in ?? ()
    #1  <signal handler called>
    #2  0x0001190e in ?? ()
    #3  0x000118ce in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb) 
    

    (gdb) bt
    #0  0x0001d1e2 in ?? ()
    #1  <signal handler called>
    #2  0x20007e82 in ?? ()
    #3  0x000012f2 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

    The first one may have been a fluke, though.

    4. The code I'm using is pretty basic. Even if I completely disable the observer role functions and just start a single advertiser, the same thing still happens.

    5. Working on that :-)

    6. Good idea, I might still have a PCA10001 in the lab, I'll check the revision.

    Best, Florian
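
One way to read the raw addresses in backtraces like these is to sort them by memory region. The following is only a minimal sketch: it assumes the S130 2.0.x layout (application code starting at 0x1B000) on an nRF51822 QFAC with 256 kB flash and 32 kB RAM, and the names `classify_addr` and `region_t` are hypothetical helpers, not part of any SDK.

```c
#include <stdint.h>

/* Assumed memory map: S130 2.0.x on an nRF51822 QFAC (256 kB flash, 32 kB RAM).
 * For another SoftDevice, change APP_FLASH_START (e.g. 0x18000 for S110 8.0.0). */
#define APP_FLASH_START 0x0001B000u  /* first flash address owned by the application */
#define FLASH_END       0x00040000u  /* one past the end of 256 kB flash */
#define RAM_START       0x20000000u
#define RAM_END         0x20008000u  /* one past the end of 32 kB RAM */

typedef enum {
    REGION_SOFTDEVICE,  /* MBR + SoftDevice flash */
    REGION_APP,         /* application flash */
    REGION_RAM,
    REGION_OTHER        /* peripherals, system space, or garbage */
} region_t;

/* Hypothetical helper: tell which region a backtrace address falls into. */
static region_t classify_addr(uint32_t addr)
{
    if (addr < APP_FLASH_START) {
        return REGION_SOFTDEVICE;
    }
    if (addr < FLASH_END) {
        return REGION_APP;
    }
    if (addr >= RAM_START && addr < RAM_END) {
        return REGION_RAM;
    }
    return REGION_OTHER;
}
```

Under these assumptions, 0x0001190e and 0x000118ce fall inside the SoftDevice, 0x0001d1e2 is in application flash, and 0x20007e82 is in RAM, which would be an unusual place for code and fits gdb's "corrupt stack?" hint.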

  • Thanks for the answers.

    1. Strange. That points away from the timing issue theory, doesn't it?
    2. By asserts I meant functions returning an error code that is then caught by an error handler, typically via APP_ERROR_CHECK(err_code) in the SDKs. That sends your application into a system reset or an endless while loop, but does not technically "crash" your system. People often mix up the terms, so I just wanted it confirmed.
    3. Could you run a few more tests to check whether one of the traces is more reliable than the other?
    4. Have you tried basic unmodified BLE examples from the SDK?
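
The error-check pattern mentioned in point 2 can be sketched on a host machine. This is not the SDK's implementation: `DEMO_ERROR_CHECK`, `demo_error_handler`, and `DEMO_SUCCESS` are hypothetical stand-ins that only mimic the shape of APP_ERROR_CHECK, and the handler here records the failure and returns instead of resetting or looping.

```c
#include <stdint.h>
#include <stdio.h>

#define DEMO_SUCCESS 0u  /* hypothetical stand-in for a "no error" return code */

/* Last failure seen by the handler, kept so it can be inspected afterwards. */
static uint32_t g_last_error = DEMO_SUCCESS;
static uint32_t g_last_line  = 0u;

/* Hypothetical handler: a real target would typically reset or spin here. */
static void demo_error_handler(uint32_t err_code, uint32_t line, const char *file)
{
    g_last_error = err_code;
    g_last_line  = line;
    fprintf(stderr, "error 0x%08lx at %s:%lu\n",
            (unsigned long)err_code, file, (unsigned long)line);
}

/* Call the handler only when the checked expression is not DEMO_SUCCESS. */
#define DEMO_ERROR_CHECK(err_code)                              \
    do {                                                        \
        const uint32_t local_err = (err_code);                  \
        if (local_err != DEMO_SUCCESS) {                        \
            demo_error_handler(local_err, __LINE__, __FILE__);  \
        }                                                       \
    } while (0)
```

The point of the sketch is that this path is ordinary application code reacting to a return value, whereas a HardFault is an exception raised by the CPU itself.
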
  • "Strange. That points away from the timing issue theory, doesn't it?"

    Indeed, this is getting more and more confusing...

    Regarding the asserts, OpenOCD always shows that I'm in the HardFault handler when I issue a halt request. I'm not 100% sure whether this could be a secondary effect of an assert, though.
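
One way to tell the two cases apart after a halt is to look at the exception number in the low bits of xPSR (as shown by gdb's `info registers`). The sketch below assumes an ARMv6-M core such as the Cortex-M0 in the nRF51, where IPSR occupies bits [5:0] and HardFault is exception number 3; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* On ARMv6-M (Cortex-M0), the IPSR field is bits [5:0] of xPSR. */
#define XPSR_IPSR_MASK    0x3Fu
#define EXC_NUM_THREAD    0u   /* thread mode: no exception active */
#define EXC_NUM_HARDFAULT 3u

/* Word offsets in the frame the core pushes on exception entry:
 * r0, r1, r2, r3, r12, lr, pc, xpsr. */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
       FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR };

/* Hypothetical helper: exception number currently being handled. */
static uint32_t active_exception(uint32_t xpsr)
{
    return xpsr & XPSR_IPSR_MASK;
}

/* Hypothetical helper: true if the core is inside the HardFault handler. */
static bool in_hardfault(uint32_t xpsr)
{
    return active_exception(xpsr) == EXC_NUM_HARDFAULT;
}

/* Hypothetical helper: PC that was interrupted, read from the stacked frame. */
static uint32_t faulting_pc(const uint32_t *stacked_frame)
{
    return stacked_frame[FRAME_PC];
}
```

An error handler spinning in thread mode would report exception number 0, while a core stuck in the HardFault handler reports 3. If the error check fired inside an interrupt handler, the number would be that interrupt's instead (16 + IRQ number).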

    "Have you tried basic unmodified BLE examples from the SDK?"

    OK, so I've taken the ble_app_beacon from SDK 10.0.0 and modified it to use the synthesized clock source, and softdevice S110 8.0.0 (I know that combination was working originally). When I flash the hex files for S110 8.0.0 and the example to the micro:bit, I get a beacon.

    When I flash the exact same hex files to the custom board, I get a HardFault with the following backtrace:

    (gdb) bt
    #0  0x0001af36 in HardFault_Handler ()
    #1  <signal handler called>
    #2  0x00016538 in ?? ()
    #3  <signal handler called>
    #4  0x0001ad72 in sd_nvic_EnableIRQ ()
    #5  0x0001aea0 in softdevice_handler_init ()
    #6  0x0001a02e in main ()
    (gdb)

    Exact sequence to reproduce is as follows:

    reset halt
    nrf51 mass_erase
    program /home/floe/work/nordic/nrf51-sdk-10.0.0/components/softdevice/s110/hex/s110_nrf51_8.0.0_softdevice.hex verify reset
    program /home/floe/work/nordic/nrf51-sdk-10.0.0/examples/ble_peripheral/ble_app_beacon/pca10028/s110/armgcc/_build/nrf51422_xxaa_s110.hex
    reset run
    

  • OK, I have to apologize: I've soldered up the debugger connection for the second custom board, and presto, everything works as expected (Advertiser, peripherals, etc.).

    It seems I've (ab-)used the first board a little too much and fried something (short-circuit, ESD, ...?).

    Many apologies again for wasting everyone's time :-)

  • No problem. Happens to all of us from time to time. Glad you figured it out. 
