This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Segmentation fault in nrf_log_frontend.c when writing string onto USB CDC-ACM iface

Hello,

when writing content onto the CDC-ACM interface of the nrf52840, I sooner or later always end up in a Segmentation fault.

I'm on nRF52840 and using the SDK v16.

I'm writing the string "ABCDEFGHIJKLMNOPQRSTOVWXYZ" - per byte, using app_usbd_cdc_acm_write(&m_app_cdc_acm, &byte, 1); - looping through the string.

The Segmentation fault happens in total different scenarios with different code and structures.

What's always the same though is, that my debugger shows it happening in:

Program received signal SIGSEGV, Segmentation fault.
0x0002786e in nrf_log_backend_flush (warning: while parsing target memory map (at line 1): Required element <memory> is missing
p_backend=0x1)
at ../../../../../../components/libraries/log/nrf_log_backend_interface.h:235
235 p_backend->p_api->flush(p_backend);

while the backtrace is an infinite scrolling of:

#0 0x0002786e in nrf_log_backend_flush (p_backend=0x1) at ../../../../../../components/libraries/log/nrf_log_backend_interface.h:235
#1 nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:895
#2 0x000277be in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:556
#3 std_n (args=0x0, nargs=0, p_str=0x3606c "Backends flushed", severity_mid=393218) at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:562
#4 nrf_log_frontend_std_0 (severity_mid=393218, p_str=p_str@entry=0x3606c "Backends flushed") at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:573
#5 0x0002788e in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:898
#6 0x000277be in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:556
#7 std_n (args=0x0, nargs=0, p_str=0x3606c "Backends flushed", severity_mid=393218) at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:562
#8 nrf_log_frontend_std_0 (severity_mid=393218, p_str=p_str@entry=0x3606c "Backends flushed") at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:573
#9 0x0002788e in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:898
#10 0x000277be in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:556
#11 std_n (args=0x0, nargs=0, p_str=0x3606c "Backends flushed", severity_mid=393218) at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:562
#12 nrf_log_frontend_std_0 (severity_mid=393218, p_str=p_str@entry=0x3606c "Backends flushed") at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:573s
#13 0x0002788e in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:898
#14 0x000277be in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:556
#15 std_n (args=0x0, nargs=0, p_str=0x3606c "Backends flushed", severity_mid=393218) at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:562
#16 nrf_log_frontend_std_0 (severity_mid=393218, p_str=p_str@entry=0x3606c "Backends flushed") at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:573
#17 0x0002788e in nrf_log_frontend_dequeue () at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:898
[..]

I'm happy to create and provide a minimal test case, however the always same, yet seemingly unrelated backtrace (nrf_log_frontend) as well as the correlation with using app_usbd_cdc_acm_write() makes me wonder whether I should start looking somewhere else for its cause.

I'd be happy for any hint!

Parents
  • It looks very much like it is the log module that is causing some issue. Do you use deferred logging? (search for it in your sdk_config.h file).

    Does it happen if you turn off the logging module?

    #define NRF_LOG_ENABLED 0

    Do you check the return value for app_usbd_cdc_acm_write()? Does it always return 0, or does it return something else?

    What tool do you use when you see the SIGSEGV? Are you using an IDE, or a standalone debugger tool?

    I suspect that this is a hardfault. It may be caused by bad pointer/array handling. Is there something specific you add when this starts to show?

    Does it also happen if you test the usbd_cdc_acm example?

    BR,

    Edvin

  • Hello, thanks for your reply!

    > Does it happen if you turn off the logging module?

    It indeed is related to logging (not just because of the back trace, but disabling logging / removing (most of the) NRF_LOG_* statements helps. I can't pinpoint it to a particular one, but likelihood of this to happen seems to linearly decrease with every NRF_LOG_* statement removed.

    Deferred logging is disabled (0), as I experienced other kinds of trouble with NRF_LOG_DEFERRED enabled. I don't recall exactly what it was, but enabling it now makes things worse. The SoC appears to freeze at first attempt of writing onto the CDC-ACM interface. While it doesn't immediately crash (at least it doesn't throw a segv in GDB), it apparently again is stuck in logging routines:

    (gdb) 
    #0  0x00027474 in serial_tx (p_context=<optimized out>, 
        p_buffer=<optimized out>, len=<optimized out>)
        at ../../../../../../components/libraries/log/src/nrf_log_backend_uart.c:85
    #1  0x0002c138 in nrf_fprintf_buffer_flush (p_ctx=p_ctx@entry=0x2003fde0)
        at ../../../../../../external/fprintf/nrf_fprintf.c:56
    #2  0x000280fa in postfix_process (p_params=0x2003fdc8, p_params=0x2003fdc8, 
        newline=false, p_ctx=0x2003fde0)
        at ../../../../../../components/libraries/log/src/nrf_log_str_formatter.c:150
    #3  nrf_log_std_entry_process (p_str=p_str@entry=0x36e1c "_uart_send_string()", 
        p_args=p_args@entry=0x2003fdf8, nargs=nargs@entry=0, 
        p_params=p_params@entry=0x2003fdc8, p_ctx=p_ctx@entry=0x2003fde0)
        at ../../../../../../components/libraries/log/src/nrf_log_str_formatter.c:192
    #4  0x00027414 in nrf_log_backend_serial_put (p_backend=<optimized out>, 
        p_msg=0x20003d1c <log_mempool_nrf_balloc_pool_mem>, 
        p_buffer=p_buffer@entry=0x20002d10 <m_string_buff> "<debug> app: _uart_send_string()\r\r\ntivated - skipping initialization.\r\r\n_OPEN\r\r\n", 
        length=length@entry=4096, tx_func=tx_func@entry=0x27451 <serial_tx>)
        at ../../../../../../components/libraries/log/src/nrf_log_backend_serial.c:87
    #5  0x00027442 in nrf_log_backend_uart_put (p_backend=<optimized out>, 
        p_msg=<optimized out>)
        at ../../../../../../components/libraries/log/src/nrf_log_backend_uart.c:95
    #6  0x00027962 in nrf_log_backend_put (
        p_msg=0x20003d1c <log_mempool_nrf_balloc_pool_mem>, 
        p_backend=0x38f74 <uart_log_backend>)
        at ../../../../../../components/libraries/log/nrf_log_backend_interface.h:225
    #7  nrf_log_frontend_dequeue ()
        at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:862
    #8  0x00027fe6 in nrf_log_frontend_dequeue ()
        at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:713
    #9  0x00030db0 in on_idle () at ../../../main.c:183
    #10 _uart_send_string (data=0x37bd0 "1099\n", len=5, mutex_uart=true)
        at ../../../main.c:726
    #11 on_keepalive_timer_evt () at ../../../main.c:570
    #12 0x000287e6 in timer_expire (p_timer=0x20009030 <m_keepalive_timer_id_data>)
        at ../../../../../../components/libraries/timer/app_timer2.c:184
    #13 on_compare_evt (p_instance=0x20002c0c <m_rtc_inst>)
        at ../../../../../../components/libraries/timer/app_timer2.c:302
    #14 rtc_irq (p_instance=0x20002c0c <m_rtc_inst>)
        at ../../../../../../components/libraries/timer/app_timer2.c:465
    #15 <signal handler called>
    #16 0x000296bc in app_usbd_event_queue_process ()
        at ../../../../../../components/libraries/usbd/app_usbd.c:1336
    #17 0x00031c0c in on_idle () at ../../../main.c:180
    #18 main () at ../../../main.c:1662

    > Do you check the return value for app_usbd_cdc_acm_write()? Does it always return 0, or does it return something else?

    I check but don't see anything suspicious. Occasional NRF_ERROR_BUSY's, but AFAIK they're normal to happen every now and then.

    static uint8_t uart_put(uint8_t byte) {
      return app_usbd_cdc_acm_write(&m_app_cdc_acm2, &byte, 1);
    }
    [..]
          ret_code = uart_put(data[i]);
          if ((ret_code != NRF_SUCCESS) && (ret_code != NRF_ERROR_BUSY))
          //if (ret_code != NRF_SUCCESS)
          {
            NRF_LOG_WARNING("Write to UART failed for reason %d", ret_code);
          }
        } while (ret_code == NRF_ERROR_BUSY);
    [..]

    > What tool do you use when you see the SIGSEGV? Are you using an IDE, or a standalone debugger tool?

    As debugger I'm using a multiarch GDB (software) and a black magic probe (hardware), attached via SWD.

    > I suspect that this is a hardfault. It may be caused by bad pointer/array handling. Is there something specific you add when this starts to show?

    Nothing I'm aware of.

    I was considering this being a corrupted stack, however - plainly telling from experience - it looks pretty sane to me (apart from the recursion or whatever is causing this seemingly endless call path). Also, I always end up in this, while I'd suspect otherwise if the issue is a corrupted pointer.

    > Does it also happen if you test the usbd_cdc_acm example?

    I'll try to make it a minimal test case then based on above example.

    Please note, that for everything above I'm using the UART backend for NRF_LOG.

    Relevant parts:

    static void log_init(void)
    {
      ret_code_t err_code = NRF_LOG_INIT(NULL);
      APP_ERROR_CHECK(err_code);
    
      NRF_LOG_DEFAULT_BACKENDS_INIT();
    }

    #define NRF_LOG_ENABLED 1
    #define NRF_LOG_BACKEND_UART_ENABLED 1
    #define NRF_LOG_BACKEND_UART_TX_PIN 24
    #define NRF_LOG_BACKEND_UART_BAUDRATE 251658240
    #define NRF_LOG_BACKEND_RTT_ENABLED 0
    #define NRF_LOG_DEFERRED 0
    #define NRF_LOG_USES_TIMESTAMP 0
    #define NRF_LOG_DEFAULT_LEVEL 4

    Is there maybe something obvious I'm doing here prone to trigger such behaviour?

Reply
  • Hello, thanks for your reply!

    > Does it happen if you turn off the logging module?

    It indeed is related to logging (not just because of the back trace, but disabling logging / removing (most of the) NRF_LOG_* statements helps. I can't pinpoint it to a particular one, but likelihood of this to happen seems to linearly decrease with every NRF_LOG_* statement removed.

    Deferred logging is disabled (0), as I experienced other kinds of trouble with NRF_LOG_DEFERRED enabled. I don't recall exactly what it was, but enabling it now makes things worse. The SoC appears to freeze at first attempt of writing onto the CDC-ACM interface. While it doesn't immediately crash (at least it doesn't throw a segv in GDB), it apparently again is stuck in logging routines:

    (gdb) 
    #0  0x00027474 in serial_tx (p_context=<optimized out>, 
        p_buffer=<optimized out>, len=<optimized out>)
        at ../../../../../../components/libraries/log/src/nrf_log_backend_uart.c:85
    #1  0x0002c138 in nrf_fprintf_buffer_flush (p_ctx=p_ctx@entry=0x2003fde0)
        at ../../../../../../external/fprintf/nrf_fprintf.c:56
    #2  0x000280fa in postfix_process (p_params=0x2003fdc8, p_params=0x2003fdc8, 
        newline=false, p_ctx=0x2003fde0)
        at ../../../../../../components/libraries/log/src/nrf_log_str_formatter.c:150
    #3  nrf_log_std_entry_process (p_str=p_str@entry=0x36e1c "_uart_send_string()", 
        p_args=p_args@entry=0x2003fdf8, nargs=nargs@entry=0, 
        p_params=p_params@entry=0x2003fdc8, p_ctx=p_ctx@entry=0x2003fde0)
        at ../../../../../../components/libraries/log/src/nrf_log_str_formatter.c:192
    #4  0x00027414 in nrf_log_backend_serial_put (p_backend=<optimized out>, 
        p_msg=0x20003d1c <log_mempool_nrf_balloc_pool_mem>, 
        p_buffer=p_buffer@entry=0x20002d10 <m_string_buff> "<debug> app: _uart_send_string()\r\r\ntivated - skipping initialization.\r\r\n_OPEN\r\r\n", 
        length=length@entry=4096, tx_func=tx_func@entry=0x27451 <serial_tx>)
        at ../../../../../../components/libraries/log/src/nrf_log_backend_serial.c:87
    #5  0x00027442 in nrf_log_backend_uart_put (p_backend=<optimized out>, 
        p_msg=<optimized out>)
        at ../../../../../../components/libraries/log/src/nrf_log_backend_uart.c:95
    #6  0x00027962 in nrf_log_backend_put (
        p_msg=0x20003d1c <log_mempool_nrf_balloc_pool_mem>, 
        p_backend=0x38f74 <uart_log_backend>)
        at ../../../../../../components/libraries/log/nrf_log_backend_interface.h:225
    #7  nrf_log_frontend_dequeue ()
        at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:862
    #8  0x00027fe6 in nrf_log_frontend_dequeue ()
        at ../../../../../../components/libraries/log/src/nrf_log_frontend.c:713
    #9  0x00030db0 in on_idle () at ../../../main.c:183
    #10 _uart_send_string (data=0x37bd0 "1099\n", len=5, mutex_uart=true)
        at ../../../main.c:726
    #11 on_keepalive_timer_evt () at ../../../main.c:570
    #12 0x000287e6 in timer_expire (p_timer=0x20009030 <m_keepalive_timer_id_data>)
        at ../../../../../../components/libraries/timer/app_timer2.c:184
    #13 on_compare_evt (p_instance=0x20002c0c <m_rtc_inst>)
        at ../../../../../../components/libraries/timer/app_timer2.c:302
    #14 rtc_irq (p_instance=0x20002c0c <m_rtc_inst>)
        at ../../../../../../components/libraries/timer/app_timer2.c:465
    #15 <signal handler called>
    #16 0x000296bc in app_usbd_event_queue_process ()
        at ../../../../../../components/libraries/usbd/app_usbd.c:1336
    #17 0x00031c0c in on_idle () at ../../../main.c:180
    #18 main () at ../../../main.c:1662

    > Do you check the return value for app_usbd_cdc_acm_write()? Does it always return 0, or does it return something else?

    I check but don't see anything suspicious. Occasional NRF_ERROR_BUSY's, but AFAIK they're normal to happen every now and then.

    static uint8_t uart_put(uint8_t byte) {
      return app_usbd_cdc_acm_write(&m_app_cdc_acm2, &byte, 1);
    }
    [..]
          ret_code = uart_put(data[i]);
          if ((ret_code != NRF_SUCCESS) && (ret_code != NRF_ERROR_BUSY))
          //if (ret_code != NRF_SUCCESS)
          {
            NRF_LOG_WARNING("Write to UART failed for reason %d", ret_code);
          }
        } while (ret_code == NRF_ERROR_BUSY);
    [..]

    > What tool do you use when you see the SIGSEGV? Are you using an IDE, or a standalone debugger tool?

    As debugger I'm using a multiarch GDB (software) and a black magic probe (hardware), attached via SWD.

    > I suspect that this is a hardfault. It may be caused by bad pointer/array handling. Is there something specific you add when this starts to show?

    Nothing I'm aware of.

    I was considering this being a corrupted stack, however - plainly telling from experience - it looks pretty sane to me (apart from the recursion or whatever is causing this seemingly endless call path). Also, I always end up in this, while I'd suspect otherwise if the issue is a corrupted pointer.

    > Does it also happen if you test the usbd_cdc_acm example?

    I'll try to make it a minimal test case then based on above example.

    Please note, that for everything above I'm using the UART backend for NRF_LOG.

    Relevant parts:

    static void log_init(void)
    {
      ret_code_t err_code = NRF_LOG_INIT(NULL);
      APP_ERROR_CHECK(err_code);
    
      NRF_LOG_DEFAULT_BACKENDS_INIT();
    }

    #define NRF_LOG_ENABLED 1
    #define NRF_LOG_BACKEND_UART_ENABLED 1
    #define NRF_LOG_BACKEND_UART_TX_PIN 24
    #define NRF_LOG_BACKEND_UART_BAUDRATE 251658240
    #define NRF_LOG_BACKEND_RTT_ENABLED 0
    #define NRF_LOG_DEFERRED 0
    #define NRF_LOG_USES_TIMESTAMP 0
    #define NRF_LOG_DEFAULT_LEVEL 4

    Is there maybe something obvious I'm doing here prone to trigger such behaviour?

Children
  • Do you have several calls to NRF_LOG_PROCESS()? Or just one occurance? If you have more, can you try to use only one? This is known as a possible cause of a hardfault.

    The logger is a bit limited. If the issue is that you log too much. Do you use FreeRTOS as well?

  • > Do you have several calls to NRF_LOG_PROCESS()? Or just one occurance?

    What do you mean by that, "one occurance"? I'm calling NRF_LOG_PROCESS() frequently, as part of a `on_idle()` function which is called within the mainloop and every now and then elsewhere, where I know tasks might take a bit longer to succeed.

    > Do you use FreeRTOS as well?

    I'm using the nRF5 SDK v16 as it is. As far as I know the nRF5 SDK doesn't use FreeRTOS but runs on the bare metal(?)

  • daten said:
    As far as I know the nRF5 SDK doesn't use FreeRTOS but runs on the bare metal(?)

     Yes, sort of. The examples are event driven. There are some freertos examples in the SDK, such as ble_app_uart_freertos, but if you haven't started with any of the freertos examples, or actively added it, then you are not using freertos.

     

    daten said:
    What do you mean by that, "one occurance"? I'm calling NRF_LOG_PROCESS() frequently, as part of a `on_idle()` function which is called within the mainloop and every now and then elsewhere, where I know tasks might take a bit longer to succeed.

     Ok,  so there are several calls to NRF_LOG_PROCESS(), not only in your main-loop, within on_idle().

    You really shouldn't do that. If a call to NRF_LOG_PROCESS() risks being interrupted by another call to NRF_LOG_PROCESS(), you will probably get a hardfault. 

    I guess you added these other NRF_LOG_PROCESS() calls because you were missing some log messages. Perhaps you can try to remove these, to see if the hardfaults stops. And maybe a better approach would be to split up the logging into several small modules,  which you can turn off and on, like it is done in many modules in sdk_config.h, such as:

    #define GPIOTE_CONFIG_LOG_ENABLED 0
    #define LPCOMP_CONFIG_LOG_ENABLED 0
    #define NRFX_USBD_CONFIG_LOG_ENABLED 0
    #define PDM_CONFIG_LOG_ENABLED 0
    ...

    BR,

    Edvin

  • Removing the NRF_LOG_PROCESS() alone from everywhere but the mainloop improved the situation when logging is deferred, however I'm still running this very backtrace.

    When disabling logging completely, though, I haven't seen any crash so far.

    For NRF_LOG_PROCESS() I would expected a warning not using it outside of the mainloop.

    "You can call this macro from the main context or from the error handler to process log entries one by one." doesn't cut it.

    Besides, NRF_LOG_PROCESS() - according to the docs - is a NOOP when deferred logging is disabled - and I was/am having it disabled.

    The overall state here I have to say is.. disappointing. Logging is an essential tool for debugging. If logging of a framework/SDK itself is so fragile and causing these issues, eventually even hiding the issues you're trying to debug, its use is to be discouraged - this is every developers' nightmare.

Related