This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

freeRTOS + nrf_logging module thread-safe?

Hi!

I have an issue with log-entrys disappearing seemingly at random. Not super frequent and possibly (but not confirmed) more often towards the beginning of a freshly started thread.

I'm using SDK 15.2.0 with soft-device s132 v6.0.0

A stripped down version of my code:

void vApplicationIdleHook( void )
{
         vTaskResume(m_logger_task);
}

static void logger_task(void * arg)
{
    UNUSED_PARAMETER(arg);

    while (1)
    {
        NRF_LOG_FLUSH();

        vTaskSuspend(NULL); // Suspend myself
    }
}

static void some_task(void * arg)
{
   NRF_LOG_INFO("some_task 1");
   ...
   do_stuff();
   ...
   NRF_LOG_INFO("some_task 2");
   while(1)
   {
      do_more_task_stuff()
   }
}


int main(void)
{
    uint32_t err_code = NRF_LOG_INIT(NULL);
    NRF_LOG_DEFAULT_BACKENDS_INIT();
    ...
    NRF_LOG_INFO("Main 1");
    ...
    xTaskCreate(logger_task, "LOGGER", 1024, NULL, 1, &m_logger_task)
    xTaskCreate(some_task, "SOME_TASK", 1024, NULL, 1, &m_some_task)
    ...
    ...  //Multiple other tasks are also started and they also log
    xTaskCreate(other_task, "OTHER_TASK", 1024, NULL, 1, &m_other_task)


    vTaskStartScheduler();
    for (;;)
    {
        APP_ERROR_HANDLER(NRF_ERROR_FORBIDDEN);
    }
}


I would expect to get
"Main 1"
"some_task 1"
"some_task 2"

But sometimes I just get
"Main 1"
"some_task 2"
Just like the first prints from "some_tasks" get eaten. However the logs from "Main" never seem to get lost.

So I did some digging into the log-code. It seems as the NRF_LOG_DEBUG() macro trough a series of expansions ends up at
nrf_log_frontend.c:536 std_n() function
static inline void std_n(uint32_t           severity_mid,
                         char const * const p_str,
                         uint32_t const *   args,
                         uint32_t           nargs)
{
    uint32_t mask   = m_log_data.mask;
    uint32_t wr_idx;

    if (buf_prealloc(nargs, &wr_idx, true))
    {
        // Proceed only if buffer was successfully preallocated.

        uint32_t data_idx = wr_idx + HEADER_SIZE;
        uint32_t i;
        for (i = 0; i < nargs; i++)
        {
            m_log_data.buffer[data_idx++ & mask] =args[i];
        }
        std_header_set(severity_mid, p_str, nargs, wr_idx, mask);
    }
    if (m_log_data.autoflush)
    {
        NRF_LOG_FLUSH();
    }

}

That function seems to directly manipulate the global singleton m_log_data object and I cannot se any mutexes or other locking mechanisms.
Cannot se any locking in the 
   nrf_log_frontend_dequeue()
function either (seems to be where the logging threads macro expansion of "NRF_LOG_FLUSH()" ends up).

So my questions are:

1. Am I missing something or are the logging functions simply not thread safe? The documentation specifically says that non-defered mode is not thread-safe. But is says nothing about defered mode (which is what I'm using). 

2. Could it be something else that I'm doing wrong? Am I the only one with this issue?

Parents
  • 1. Am I missing something or are the logging functions simply not thread safe? The documentation specifically says that non-defered mode is not thread-safe. But is says nothing about defered mode (which is what I'm using). 

     The deffered mode should be thread safe. I have been using the logger module on many RTOS much with much more complex inter task communication and dependencies that the one you showed. And I did not see any race conditions like you suspect. By the way, which backend are you using, UART or RTT? Also what kind of buffer sizes are defined in your sdk_config.h regarding the logging?

     

    2. Could it be something else that I'm doing wrong? Am I the only one with this issue?

     You probably are not the only one, if you managed to trigger this behavior with such simple tasks. But I am curious what your do_stuff() does. 

  • Thanks for your reply,

    We use RTT backend.
    #define NRF_LOG_ALLOW_OVERFLOW 1
    #define NRF_LOG_BUFSIZE 1024

    I have tried to reduce the bufzise and then, as expected, i get the overflow makers.
    However when running 1024 I never get overflow markers (expeded as I'm not printing that much), but the messages still are dropped silently *without markers*. So I don't think there is any overflow condirtion thats causing my issues.

     You probably are not the only one, if you managed to trigger this behavior with such simple tasks. But I am curious what your do_stuff() does. 

    Of course. Unfortunately it is a somewhat large ble-app running on a custom board that is inpractical to post here in its entirety.
    I have the configCHECK_FOR_STACK_OVERFLOW enabled as well with no hits so its not any obvious stack corruptioin.
    The app itself also works seemingly stable, its just the prints thats doesnt works

    I was hoping it would be a known issue with at standardized solution :)
    Now that is seems to be complicated I understand that you need to reproduce the issue. If I could reproduce the issue with minor changes to one of the demo-apps running on a demo board, would that be something you could take a look at?

Reply
  • Thanks for your reply,

    We use RTT backend.
    #define NRF_LOG_ALLOW_OVERFLOW 1
    #define NRF_LOG_BUFSIZE 1024

    I have tried to reduce the bufzise and then, as expected, i get the overflow makers.
    However when running 1024 I never get overflow markers (expeded as I'm not printing that much), but the messages still are dropped silently *without markers*. So I don't think there is any overflow condirtion thats causing my issues.

     You probably are not the only one, if you managed to trigger this behavior with such simple tasks. But I am curious what your do_stuff() does. 

    Of course. Unfortunately it is a somewhat large ble-app running on a custom board that is inpractical to post here in its entirety.
    I have the configCHECK_FOR_STACK_OVERFLOW enabled as well with no hits so its not any obvious stack corruptioin.
    The app itself also works seemingly stable, its just the prints thats doesnt works

    I was hoping it would be a known issue with at standardized solution :)
    Now that is seems to be complicated I understand that you need to reproduce the issue. If I could reproduce the issue with minor changes to one of the demo-apps running on a demo board, would that be something you could take a look at?

Children
  • I would definitely look into this if you manage you share a small demo app for me to be able to reproduce the issue.

  • Hello again,
    Sorry for my late reply. I never managed to get time to reproduce the issue in a clean environment, but I think I managed to solve it which probably is better anyway Slight smile

    My problem was two-fold.

    1)
    If the RTT buffer would become full (because RTTClient on my PC does not read it fast enough) the RTT backend would have a default timeout of effectively 3ms with default parameters. Then the particular write would be silently dropped.
    It would not drop entire lines either, just individual writes as by the nrf_log_backend_serial_put().
    These partially missing lines had me thinking about stack-corruption for the longest time, but that was not the issue.

    The silent dropping can be seen in: nrf_log_backend_rtt.c: serial_tx().
    I could verify that this indeed was what was happening by putting a breakpoint at
    nrf_log_backend_rtt.c:93 (sdk 15.2.0): m_host_present = false;

    in sdk_config.h I increased the effective timeout from 3ms to 2seconds which seems to remove the issue completely.

    #define NRF_LOG_BACKEND_RTT_TX_RETRY_DELAY_MS 100 //was 1
    #define NRF_LOG_BACKEND_RTT_TX_RETRY_CNT 20 // was 3 (12 seems ok, use 20 for some margin)


    2)
    There is a race-condition between writes to log_buffer and backend-reads

    • NRF_LOG_XXX() expands to std_n() in the end.
      • nrf_log_frontend.c:536 std_n()
        • nrf_log_frontend.c:446 buf_prealloc()
          • CRITICAL_REGION_ENTER()
          • Alloc header+data
          • Init hdr.base ()
          • wr_idx += req_len
          • hdr->base.std.in_progress = 1
          • CRITICAL_REGION_EXIT()
          • // If NRF_LOG_PROCESS thread wakes up here, while in_progress==1, then the entire line will be dropped se below
          • nrf_log_frontend.c:408: std_header_set()
            hdr->base.std.in_progress = 0
    • NRF_LOG_PROCESS() expands to:
      • nrf_log_frontend_dequeue() (=NRF_LOG_PROCESS())
        • invalid_packets_omit()
          • if( in_progress == 1) skip_over()
        • ... 


    This issue with NRF_LOG_XXX being interrupted (while still in progress) by NRF_LOG_PROCESS cannot happen if the logger-thread executing NRF_LOG_PROCESS runs at a lower priority than all threads issuing NRF_LOG_XXX.

    The documentation is very vague about this issue.
    Almost at the very bottom here: https://infocenter.nordicsemi.com/topic/sdk_nrf5_v16.0.0/lib_nrf_log.html

    the doc says this rather cryptic warning:
    "In the current implementation, each and every logger backend must run on lower or equal priority as the module that is writing logs."

    Which is cryptic in may ways:
    a) The logger backends are not separate threads. There is a single thread issuing NRF_LOG_PROCESS() which in turn callbacks all registred backends.
    b) "module" is used instead of "thread". I dont understand what module has to do with anything here?

    So in the end, if I may make two feature-requests:

    F1: It would be nice if the RTT Backend did not just silently drop writes, and instead inserted some little string like "RTT-dropped N chars" analogous to how the log-frontens inserts "dropped N lines" in case of buffer-overflow.

    F2: I suggest a clarification in the docs

    1. Clarify the cryptic warning at the bottom here: https://infocenter.nordicsemi.com/topic/sdk_nrf5_v16.0.0/lib_nrf_log.html 
    2. Mention this warning in the chapter about defered logprocessing:
      https://infocenter.nordicsemi.com/topic/sdk_nrf5_v16.0.0/lib_nrf_log.html?cp=7_1_3_26#lib_nrf_log_deferred
      As I noted in my first question the doc about defered processing does not mention thread-safety at all. My suggestion would be to add something like:
      "Defered log-procsessing is thread-safe if and only if the threads using NRF_LOG_XXX() have higher priority than the thread executing NRF_LOG_PROCESS()"

    Best Regards
    Axel

Related