
Running into a SIGTRAP, backtrace shows only main(), apparently happening in libgloss/arm/crt0.S

I'm using SDK v16 on an nRF52832 and occasionally run into a SIGTRAP with my custom application mainly forwarding BLE traffic from/to UART.

It works fine at first, but after a number of interactions (anywhere from 3 to 30), initiated by a BLE UART client running on Android, it stops responding.

Attached debugger (Black Magic Probe) provides me with the following output:

Starting program: /data/src/nrf5x-sdk-vanilla/projects/[..]/s132/armgcc/_build/nrf52832_xxaa.out

Program received signal SIGTRAP, Trace/breakpoint trap.
warning: while parsing target memory map (at line 1): Required element <memory> is missing
0x0002be5c in main ()
(gdb) l
1    ../../../../../../../../../libgloss/arm/crt0.S: No such file or directory.
(gdb) bt
#0  0x0002be5c in main ()
(gdb)

I'd be happy for any hint or idea. I could imagine this being an arbitrary memory corruption. However, I'm wondering about the SIGTRAP (not SEGV), about libgloss/arm/crt0.S (no user code), and about consistently ending up in this very state.

  • Compiling with -DDEBUG, -g3 and -O0 reveals some more:

    Program received signal SIGTRAP, Trace/breakpoint trap.
    warning: while parsing target memory map (at line 1): Required element <memory> is missing
    0x0002ce36 in app_error_fault_handler (id=16385, pc=225711, info=536936400)
        at ../../../../../../components/libraries/util/app_error_weak.c:100
    100	    NRF_BREAKPOINT_COND;
    (gdb) bt
    #0  0x0002ce36 in app_error_fault_handler (id=16385, pc=225711, info=536936400)
        at ../../../../../../components/libraries/util/app_error_weak.c:100
    #1  0x0002ccc4 in app_error_handler (error_code=16385, line_num=225711, 
        p_file_name=0x4001 "\211\240\201hh\200\211\340\201\233\346\020&O\360#\b")
        at ../../../../../../components/libraries/util/app_error_handler_gcc.c:49
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

  • So error_code=16385 is 0x4001 which according to components/libraries/util/app_error.h is

    (NRF_FAULT_ID_SDK_RANGE_START + 1) /**< An error stemming from a call to @ref APP_ERROR_CHECK or @ref APP_ERROR_CHECK_BOOL. The info parameter is a pointer to an @ref error_info_t variable. */

    which is already bringing me closer - telling me it's a result from an APP_ERROR_CHECK() call (not explaining the corrupted stack yet, though). Now trying to figure out which APP_ERROR_CHECK() call.

    Unfortunately the info appears to be screwed. According to the comment on the define above, info=536936400 is supposed to be a pointer to an instance of struct error_info_t, containing the information I'm looking for. Trying to access it via GDB, however, results in:

    (gdb) p *((error_info_t*)(info))
    Cannot access memory at address 0x2000ffd0

    Besides, I wonder about p_file_name=0x4001. How did the error_code value end up in the p_file_name argument, which is supposed to hold a pointer?
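The decimal values GDB prints are easier to reason about in hex. A minimal sketch, assuming NRF_FAULT_ID_SDK_RANGE_START is 0x4000 (implied by 0x4001 being START + 1) and assuming the nRF52832 xxAA variant maps 64 KB of RAM at 0x20000000. The enum and function names are made up for illustration and are not SDK identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: 0x4001 == NRF_FAULT_ID_SDK_RANGE_START + 1, so the range starts at 0x4000. */
#define FAULT_ID_SDK_RANGE_START 0x00004000u

/* Assumed RAM window of the nRF52832 xxAA variant: 64 KB at 0x20000000. */
#define RAM_START 0x20000000u
#define RAM_SIZE  0x00010000u

/* Hypothetical labels, not SDK names. */
typedef enum {
    FAULT_KIND_OTHER,     /* anything this sketch does not decode */
    FAULT_KIND_SDK_ERROR  /* raised via APP_ERROR_CHECK / APP_ERROR_CHECK_BOOL */
} fault_kind_t;

/* Maps the id argument of app_error_fault_handler() to a coarse label. */
fault_kind_t classify_fault_id(uint32_t id)
{
    return (id == FAULT_ID_SDK_RANGE_START + 1u) ? FAULT_KIND_SDK_ERROR
                                                 : FAULT_KIND_OTHER;
}

/* True if a len-byte object starting at addr lies entirely inside RAM. */
bool range_in_ram(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    return addr >= RAM_START && len <= RAM_SIZE &&
           (addr - RAM_START) <= (RAM_SIZE - len);
}
```

With the values from the backtrace, id=16385 classifies as the APP_ERROR_CHECK case, and info=536936400 (0x2000ffd0) lies inside that RAM window with room for 12 bytes (assuming error_info_t consists of three 32-bit fields). So the pointer value itself looks plausible, even though GDB refuses to read it.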

  • Thank you for your reply!

    The pc value translates to line 12 in the following snippet:

            if(m_conn_handle != BLE_CONN_HANDLE_INVALID)
            {
              do
              {
                ret = ble_nus_data_send(&m_nus, m_uart_buf[uart_buf_id], (uint16_t *)(&(m_uart_buf_pos[uart_buf_id])), m_conn_handle);
                if ((ret != NRF_ERROR_INVALID_STATE) &&
                    (ret != NRF_ERROR_BUSY) &&
                    (ret != NRF_ERROR_NOT_FOUND))
                {
                  APP_ERROR_CHECK(ret);
                }
              } while (ret == NRF_ERROR_BUSY); // <<<---------------
            }
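The filter in the snippet above can be restated as a small classification, which makes it easier to see which return codes ever reach APP_ERROR_CHECK(). This is only a sketch: the numeric values are as I recall them from the SDK's nrf_error.h and should be verified against your copy, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Values as recalled from nrf_error.h (assumption, verify against your SDK). */
#define ERR_SUCCESS        0u  /* NRF_SUCCESS */
#define ERR_NOT_FOUND      5u  /* NRF_ERROR_NOT_FOUND */
#define ERR_INVALID_STATE  8u  /* NRF_ERROR_INVALID_STATE */
#define ERR_BUSY          17u  /* NRF_ERROR_BUSY */

/* Hypothetical outcome labels for one ble_nus_data_send() attempt. */
typedef enum {
    SEND_OK,       /* success, APP_ERROR_CHECK() is a no-op */
    SEND_RETRY,    /* BUSY: skipped by the filter, loop runs again */
    SEND_IGNORED,  /* INVALID_STATE / NOT_FOUND: skipped, loop ends */
    SEND_FATAL     /* everything else ends up in the error handler */
} send_outcome_t;

/* Mirrors the if-condition and the while-condition of the snippet. */
send_outcome_t classify_send_result(uint32_t ret)
{
    switch (ret) {
    case ERR_SUCCESS:
        return SEND_OK;
    case ERR_BUSY:
        return SEND_RETRY;
    case ERR_INVALID_STATE:
    case ERR_NOT_FOUND:
        return SEND_IGNORED;
    default:
        return SEND_FATAL;
    }
}
```

Any code outside those four values takes the SEND_FATAL path, i.e. it is handed to APP_ERROR_CHECK() and lands in app_error_fault_handler().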

  • Could you place a breakpoint at APP_ERROR_CHECK(ret); to check that you do not enter that? And if you enter, what is the value of ret?

  • This comment is obsolete (while the actual issue still exists and is further discussed in the reply/replies after this one), but leaving it here for reference if somebody else is having issues with the Black Magic Probe and breakpoints not working:

    =========

    For the following code snippet:

              do
              {
                ret = ble_nus_data_send(&m_nus, m_uart_buf[uart_buf_id], (uint16_t *)(&(m_uart_buf_pos[uart_buf_id])), m_conn_handle);
                if ((ret != NRF_ERROR_INVALID_STATE) &&
                    (ret != NRF_ERROR_BUSY) &&
                    (ret != NRF_ERROR_NOT_FOUND))
                {
                  noop(ret);
                  APP_ERROR_CHECK(ret);
                }
              } while (ret == NRF_ERROR_BUSY);

    I set a breakpoint in noop() via

    (gdb) break noop
    Breakpoint 1 at 0x370d6: file ../../../main.c, line 1568.
    (gdb) r

    but still end up in

    0x0002ce56 in app_error_fault_handler (id=16385, pc=225911, info=536936392) at ../../../../../../components/libraries/util/app_error_weak.c:100
    100	    NRF_BREAKPOINT_COND;
    (gdb) bt
    #0  0x0002ce56 in app_error_fault_handler (id=16385, pc=225911, info=536936392)

    addr2line still points to the "} while (ret == NRF_ERROR_BUSY);" line.

    What I can say, though, is that removing the APP_ERROR_CHECK() call seems to make this issue go away. However, this obviously isn't the solution. I also wonder why the breakpoint doesn't trigger. The code is compiled with -O0 and -DDEBUG, and GDB seems to see the noop symbol, so I'd say it's there.

    I went for the noop() instead of setting a breakpoint on APP_ERROR_CHECK() as GDB doesn't appear to see the macro:

    (gdb) break APP_ERROR_CHECK
    Function "APP_ERROR_CHECK" not defined.
    Make breakpoint pending on future shared library load? (y or [n]) y
    Breakpoint 1 (APP_ERROR_CHECK) pending.

    which makes sense, as it's not a symbol ending up in the binary.

    Setting a breakpoint on app_error_fault_handler doesn't trigger a break either; I still end up in NRF_BREAKPOINT_COND with a corrupt stack.

    For completeness, here is the definition of noop():

    void noop(uint8_t err_code) {
      UNUSED_VARIABLE(err_code);
    }

    UPDATE: Neither does "break main" actually break on main() - so I assume that's a different issue.

    =========================

    UPDATE 2: Ok, this appears to be a problem with the debugger I use - the Black Magic Probe. The issue was fixed in a later version, for reference and the record: https://github.com/blacksphere/blackmagic/issues/230
    Having breakpoints working now! Actual issue further investigated as part of the next comment.

  • Ok, so after having debugged and fixed the firmware of my debugger, back to the actual issue:

    When setting a breakpoint on app_error_fault_handler I get the following:

    Starting program: /data/src/nrf5x-sdk-vanilla/[..]/s132/armgcc/_build/nrf52832_xxaa.out 
    Note: automatically using hardware breakpoints for read-only addresses.
    
    Breakpoint 1, app_error_fault_handler (id=16385, pc=225911, info=536936392) at ../../../../../../components/libraries/util/app_error_weak.c:58
    58	    __disable_irq();

    while the backtrace still looks messy:

    (gdb) bt
    #0  app_error_fault_handler (id=16385, pc=225911, info=536936392) at ../../../../../../components/libraries/util/app_error_weak.c:58
    #1  0x0002cce4 in app_error_handler (error_code=16385, line_num=225911, p_file_name=0x2000ffc8 "\226\006")
        at ../../../../../../components/libraries/util/app_error_handler_gcc.c:49
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

    So I again went for the noop(), which results in the following code:

                ret = ble_nus_data_send(&m_nus, m_uart_buf[uart_buf_id], (uint16_t *)(&(m_uart_buf_pos[uart_buf_id])), m_conn_handle);
                if ((ret != NRF_ERROR_INVALID_STATE) &&
                    (ret != NRF_ERROR_BUSY) &&
                    (ret != NRF_ERROR_NOT_FOUND))
                {
                  if(ret != 0)
                    noop(ret);
                  APP_ERROR_CHECK(ret);

    resulting in:

    Starting program: /data/src/nrf5x-sdk-vanilla/sensorberg/projects/smartspaces/sdg03/s132/armgcc/_build/nrf52832_xxaa.out 
    Note: automatically using hardware breakpoints for read-only addresses.
    
    Breakpoint 1, noop (err_code=1 '\001') at ../../../main.c:1568
    1568	}

    with a stack that does not seem to be messed up (yet):

    (gdb) bt
    #0  noop (err_code=1 '\001') at ../../../main.c:1568
    #1  0x00037266 in main () at ../../../main.c:1686

    So at that point we know that ble_nus_data_send() sometimes returns 0x01.

    Continuing execution now leads me right into the NRF_BREAKPOINT_COND with the corrupt stack:

    (gdb) c
    Continuing.
    
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x0002ce56 in app_error_fault_handler (id=16385, pc=225917, info=536936392) at ../../../../../../components/libraries/util/app_error_weak.c:100
    100	    NRF_BREAKPOINT_COND;
    (gdb) bt
    #0  0x0002ce56 in app_error_fault_handler (id=16385, pc=225917, info=536936392)
        at ../../../../../../components/libraries/util/app_error_weak.c:100
    #1  0x0002cce4 in app_error_handler (error_code=16385, line_num=225917, 
        p_file_name=0x4001 "\211\240\201hh\200\211\340\201\233\346\020&O\360#\b")
        at ../../../../../../components/libraries/util/app_error_handler_gcc.c:49
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

    So there are two questions now:

    a) why does ble_nus_data_send() sometimes return 0x01?
    b) why does APP_ERROR_CHECK() /appear/* to end up with a corrupted stack?

    *obviously the actual root cause can be somewhere else and APP_ERROR_CHECK() only accesses memory already corrupted somewhere/-when before
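When the backtrace cannot be trusted, one option is to copy the fault details into static storage and inspect them from GDB afterwards. The following is a minimal sketch under stated assumptions: fault_info_t is a hypothetical mirror of the SDK's error_info_t (field order assumed, check app_error.h), and record_fault() and g_last_fault are made-up names. On the target, such a helper would be called from your own app_error_fault_handler() override, where info arrives as a uint32_t and is cast to a pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of error_info_t; field order is an assumption. */
typedef struct {
    uint32_t       line_num;
    const uint8_t *p_file_name;
    uint32_t       err_code;
} fault_info_t;

/* Snapshot of the last fault, kept outside the stack. */
typedef struct {
    uint32_t       id;
    uint32_t       pc;
    uint32_t       line_num;
    const uint8_t *p_file_name;
    uint32_t       err_code;
    uint32_t       valid;  /* written last, so a partial record is detectable */
} fault_record_t;

volatile fault_record_t g_last_fault;

/* Copies the handler arguments into g_last_fault; tolerates a NULL info. */
void record_fault(uint32_t id, uint32_t pc, const fault_info_t *info)
{
    g_last_fault.valid = 0u;
    g_last_fault.id = id;
    g_last_fault.pc = pc;
    if (info != NULL) {
        g_last_fault.line_num = info->line_num;
        g_last_fault.p_file_name = info->p_file_name;
        g_last_fault.err_code = info->err_code;
    } else {
        g_last_fault.line_num = 0u;
        g_last_fault.p_file_name = NULL;
        g_last_fault.err_code = 0u;
    }
    g_last_fault.valid = 1u;
}
```

After the trap, `p/x g_last_fault` in GDB would then show the full 32-bit err_code and the line number, independent of whether the stack frames can be unwound.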

  • a) Are you sure ret is actually 0x01 in this case? Here are the error codes that can be returned by sd_ble_gatts_hvx.

    b) This happens because the timing in the SoftDevice (SD) gets messed up when you halt at a breakpoint, i.e. the timers continue to run and the event manager gets lost.
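On a), note that noop() as defined earlier in the thread takes a uint8_t, so the value GDB shows in that frame is only the low byte of ret. A minimal sketch of that narrowing follows. The value 0x3401 is a hypothetical example chosen because its low byte is 0x01 (as far as I recall it corresponds to BLE_ERROR_GATTS_SYS_ATTR_MISSING in ble_err.h, but treat that as an assumption); it was not observed in this thread.

```c
#include <stdint.h>

/*
 * What a uint8_t parameter receives when a uint32_t return code is passed in.
 * Calling noop(ret) performs this conversion implicitly.
 */
uint8_t low_byte_as_noop_sees(uint32_t ret)
{
    return (uint8_t)ret;
}
```

Both 0x00000001 and 0x00003401 show up as err_code=1 in such a frame. Printing ret itself in the main() frame (`p/x ret`) or widening the noop() parameter to uint32_t would tell the two apart.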
