Running into a SIGTRAP, backtrace shows only main(), apparently happening in libgloss/arm/crt0.S

I'm using SDK v16 on an nRF52832 and occasionally run into a SIGTRAP with my custom application, which mainly forwards BLE traffic to and from UART.

It works fine at first, but after a number of interactions (anywhere from 3 to 30) initiated by a BLE UART client running on Android, it stops responding.

The attached debugger (a Black Magic Probe) gives me the following output:

Starting program: /data/src/nrf5x-sdk-vanilla/projects/[..]/s132/armgcc/_build/nrf52832_xxaa.out

Program received signal SIGTRAP, Trace/breakpoint trap.
warning: while parsing target memory map (at line 1): Required element <memory> is missing
0x0002be5c in main ()
(gdb) l
1    ../../../../../../../../../libgloss/arm/crt0.S: No such file or directory.
(gdb) bt
#0  0x0002be5c in main ()
(gdb)

I'd be happy for any hint or idea. I could imagine this being some arbitrary memory corruption. However, I'm puzzled by the SIGTRAP (rather than SEGV), by the location in libgloss/arm/crt0.S (no user code), and by the fact that it consistently ends up in this very state.

  • Compiling with -DDEBUG, -g3 and -O0 reveals some more:

    Program received signal SIGTRAP, Trace/breakpoint trap.
    warning: while parsing target memory map (at line 1): Required element <memory> is missing
    0x0002ce36 in app_error_fault_handler (id=16385, pc=225711, info=536936400)
        at ../../../../../../components/libraries/util/app_error_weak.c:100
    100	    NRF_BREAKPOINT_COND;
    (gdb) bt
    #0  0x0002ce36 in app_error_fault_handler (id=16385, pc=225711, info=536936400)
        at ../../../../../../components/libraries/util/app_error_weak.c:100
    #1  0x0002ccc4 in app_error_handler (error_code=16385, line_num=225711, 
        p_file_name=0x4001 "\211\240\201hh\200\211\340\201\233\346\020&O\360#\b")
        at ../../../../../../components/libraries/util/app_error_handler_gcc.c:49
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

  • So error_code=16385 is 0x4001, which according to components/libraries/util/app_error.h is

    (NRF_FAULT_ID_SDK_RANGE_START + 1) /**< An error stemming from a call to @ref APP_ERROR_CHECK or @ref APP_ERROR_CHECK_BOOL. The info parameter is a pointer to an @ref error_info_t variable. */

    That already brings me closer: it tells me the fault is the result of an APP_ERROR_CHECK() call (though it does not yet explain the corrupted stack). Now I'm trying to figure out which APP_ERROR_CHECK() call.

    Unfortunately the info argument appears to be unusable. According to the comment on the define above, info=536936400 is supposed to be a pointer to an instance of struct error_info_t containing the information I'm looking for. Trying to access it via GDB, however, results in:

    (gdb) p *((error_info_t*)(info))
    Cannot access memory at address 0x2000ffd0

    Besides, I wonder about p_file_name=0x4001. How did the error_code end up as the p_file_name argument, which should actually contain a pointer?
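
To make the decimal values that GDB prints easier to interpret, here is a small host-side C sketch (not SDK code). It checks whether id is the SDK error fault id quoted above and whether info falls inside RAM. The RAM window assumes the 64 KB of an nRF52832 xxAA mapped at 0x20000000, and the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: nRF52832 xxAA has 64 KB of RAM mapped at 0x20000000. */
#define RAM_START 0x20000000u
#define RAM_SIZE  0x00010000u

/* Derived from the post: 16385 == 0x4001 == NRF_FAULT_ID_SDK_RANGE_START + 1. */
#define FAULT_ID_SDK_RANGE_START 0x4000u

/* True if addr lies inside the assumed RAM window. */
static bool addr_in_ram(uint32_t addr)
{
    return addr >= RAM_START && (addr - RAM_START) < RAM_SIZE;
}

/* True if the fault id marks an error raised via APP_ERROR_CHECK(). */
static bool is_sdk_error_fault(uint32_t id)
{
    return id == FAULT_ID_SDK_RANGE_START + 1u;
}
```

With the values from the backtrace, is_sdk_error_fault(16385) and addr_in_ram(536936400) both hold: 536936400 is 0x2000ffd0, close to the top of the assumed RAM window. So the pointer itself looks plausible, and the "Cannot access memory" message may instead be related to the memory map warning GDB printed.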

  • Ok, so after having debugged and fixed the firmware of my debugger, back to the actual issue:

    When setting a breakpoint on app_error_fault_handler I get the following:

    Starting program: /data/src/nrf5x-sdk-vanilla/[..]/s132/armgcc/_build/nrf52832_xxaa.out 
    Note: automatically using hardware breakpoints for read-only addresses.
    
    Breakpoint 1, app_error_fault_handler (id=16385, pc=225911, info=536936392) at ../../../../../../components/libraries/util/app_error_weak.c:58
    58	    __disable_irq();

    while the backtrace still looks messy:

    (gdb) bt
    #0  app_error_fault_handler (id=16385, pc=225911, info=536936392) at ../../../../../../components/libraries/util/app_error_weak.c:58
    #1  0x0002cce4 in app_error_handler (error_code=16385, line_num=225911, p_file_name=0x2000ffc8 "\226\006")
        at ../../../../../../components/libraries/util/app_error_handler_gcc.c:49
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

    So I again went for the noop() approach, which results in the following code:

        ret = ble_nus_data_send(&m_nus,
                                m_uart_buf[uart_buf_id],
                                (uint16_t *)(&(m_uart_buf_pos[uart_buf_id])),
                                m_conn_handle);
        if ((ret != NRF_ERROR_INVALID_STATE) &&
            (ret != NRF_ERROR_BUSY) &&
            (ret != NRF_ERROR_NOT_FOUND))
        {
            /* Debug hook: gives GDB a place to break before the error handler runs. */
            if (ret != NRF_SUCCESS)
            {
                noop(ret);
            }
            APP_ERROR_CHECK(ret);

    resulting in:

    Starting program: /data/src/nrf5x-sdk-vanilla/sensorberg/projects/smartspaces/sdg03/s132/armgcc/_build/nrf52832_xxaa.out 
    Note: automatically using hardware breakpoints for read-only addresses.
    
    Breakpoint 1, noop (err_code=1 '\001') at ../../../main.c:1568
    1568	}

    with a stack that is seemingly not (yet) messed up:

    (gdb) bt
    #0  noop (err_code=1 '\001') at ../../../main.c:1568
    #1  0x00037266 in main () at ../../../main.c:1686

    So at that point we know that ble_nus_data_send() sometimes returns 0x01.

    Continuing execution now leads me right into the NRF_BREAKPOINT_COND with the corrupt stack:

    (gdb) c
    Continuing.
    
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x0002ce56 in app_error_fault_handler (id=16385, pc=225917, info=536936392) at ../../../../../../components/libraries/util/app_error_weak.c:100
    100	    NRF_BREAKPOINT_COND;
    (gdb) bt
    #0  0x0002ce56 in app_error_fault_handler (id=16385, pc=225917, info=536936392)
        at ../../../../../../components/libraries/util/app_error_weak.c:100
    #1  0x0002cce4 in app_error_handler (error_code=16385, line_num=225917, 
        p_file_name=0x4001 "\211\240\201hh\200\211\340\201\233\346\020&O\360#\b")
        at ../../../../../../components/libraries/util/app_error_handler_gcc.c:49
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

    So there are two questions now:

    a) Why does ble_nus_data_send() sometimes return 0x01?
    b) Why does APP_ERROR_CHECK() appear* to end up with a corrupted stack?

    * Obviously the actual root cause could be somewhere else, with APP_ERROR_CHECK() only accessing memory that was already corrupted somewhere (and some time) before.
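
One observation on the backtrace taken at the breakpoint: p_file_name=0x2000ffc8 is the same address as info=536936392, so GDB seems to be printing the start of the error_info_t itself as a string. The sketch below (host-side C, not SDK code) decodes those bytes under the assumption that the struct begins with a 32-bit little-endian line_num. That layout should be checked against app_error.h in the SDK in use, and decode_le32() is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value from up to 4 raw bytes.
 * Missing high bytes are treated as zero, which matches a C string
 * that GDB stopped printing at the first 0x00 byte. */
static uint32_t decode_le32(const uint8_t *bytes, size_t len)
{
    uint32_t value = 0;
    for (size_t i = 0; i < len && i < 4; i++) {
        value |= (uint32_t)bytes[i] << (8 * i);
    }
    return value;
}
```

Feeding in the two bytes GDB showed, "\226\006" (0x96, 0x06), gives 1686, which matches the main.c:1686 frame in the noop() backtrace. If the layout assumption holds, info does point at usable error information for the APP_ERROR_CHECK() on that line, even though GDB cannot unwind past app_error_handler.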

  • a) Are you sure ret is actually 0x01 in this case? Here are the error codes that can be returned by sd_ble_gatts_hvx

    b) This will happen because the timing in the SoftDevice gets messed up when you halt at a breakpoint, i.e. the timers will continue to run and the event manager will lose track.

  • Re a) I don't know what else to conclude from the GDB output, so yes, fairly sure.

    Re b) It's the same corrupted stack trace I get /without/ the breakpoint. See the initial post (corrupted stack trace without any breakpoints set) and the run with breakpoints, where it shows up right after the breakpoint in noop(), with APP_ERROR_CHECK() being called only a line later. It's the same corrupted backtrace within APP_ERROR_CHECK(). This doesn't look like a coincidence or a GDB/breakpoint-related (timing) issue to me.
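
On question a): GDB prints the argument as err_code=1 '\001', which is how it formats an 8-bit value, so noop() presumably takes a uint8_t or char parameter. If that is the case, only the low byte of ret is visible in that frame. A minimal host-side sketch of the effect, where noop_like() and noop_wide() are stand-ins for the poster's noop() and 0x3401 is just an illustrative value whose low byte is 0x01:

```c
#include <stdint.h>

/* Stand-in for a helper declared with an 8-bit parameter (assumption). */
static uint8_t noop_like(uint8_t err_code)
{
    return err_code; /* what a debugger would show as the argument */
}

/* Same helper, but with the full 32-bit width of the return code. */
static uint32_t noop_wide(uint32_t err_code)
{
    return err_code;
}
```

Passing a 32-bit ret of 0x3401 to noop_like() yields 0x01, while noop_wide() keeps 0x3401. Declaring noop() with a uint32_t parameter, or printing ret from the main() frame with p/x ret, would show the full code.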

  • Are you able to recreate this issue on a Nordic DK? And what hardware are you currently running your code on?

  • Hardware is an nRF52832.

    I have now ordered a PCA10040 and will try to reproduce the issue on it. Can you elaborate on why you think this might be hardware specific?
