Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs
This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

nrf_log_frontend_dequeue must be atomically protected against re-entry from interrupt context

This issue exists in NRF SDK5 15.2.0 and earlier.  NRF log module has entered "production" level as of SDK5 15.2.

While implementing an additional logging backend, I discovered a race condition in the nrf_log_frontend_dequeue function.

The bug can occur when logging is DEFERRED, and logs using nrf_log_push are generated from an interrupt.

The root issue is that nrf_log_frontend_dequeue which processes an internal buffer is normally called from the idle loop, but can also be called from an interrupt after nrf_log_push sets the autoflush flag.

To fix this issue I recommend adding an atomic busy flag to nrf_log_frontend_dequeue to avoid re-entrant calls from proceeding.

  • I have updated the internal jira with your suggestions. Hopefully it the meantime others may find his case and your suggestion.

    Best regards,
    Kenneth

  • Good find.  I have the same issue, and following you suggestion have implemented the following, which if you have time to review I would be most grateful.

    typedef struct
    {
        uint32_t                  wr_idx;          // Current write index (never reset)
        uint32_t                  rd_idx;          // Current read index  (never_reset)
        uint32_t                  mask;            // Size of buffer (must be power of 2) presented as mask
        uint32_t                  buffer[NRF_LOG_BUF_WORDS];
        nrf_log_timestamp_func_t  timestamp_func;  // A pointer to function that returns timestamp
        nrf_log_backend_t const * p_backend_head;
        nrf_atomic_flag_t         log_skipping;
        nrf_atomic_flag_t         log_skipped;
        nrf_atomic_flag_t         log_is_busy;     // This flag replaces the blocking use of critical region
        nrf_atomic_u32_t          log_dropped_cnt;
        bool                      autoflush;
    } log_data_t;

    ret_code_t nrf_log_init(nrf_log_timestamp_func_t timestamp_func, uint32_t timestamp_freq)
    {
        (void)NRF_LOG_ITEM_DATA_CONST(app);
    
        if (NRF_LOG_USES_TIMESTAMP && (timestamp_func == NULL))
        {
            return NRF_ERROR_INVALID_PARAM;
        }
    
        m_log_data.mask         = NRF_LOG_BUF_WORDS - 1;
        m_log_data.wr_idx       = 0;
        m_log_data.rd_idx       = 0;
        m_log_data.log_skipped  = 0;
        m_log_data.log_skipping = 0;
        m_log_data.log_is_busy  = 0;
        m_log_data.autoflush    = NRF_LOG_DEFERRED ? false : true;

    bool nrf_log_frontend_dequeue(void)
    {
        if (buffer_is_empty())
        {
            return false;
        }
        // Note also add atomic flag set before this __DSB() and after this function exits
        // See https://devzone.nordicsemi.com/f/nordic-q-a/39188/nrf_log_frontend_dequeue-must-be-atomically-protected-against-re-entry-from-interrupt-context
        if (nrf_atomic_flag_set_fetch(&m_log_data.log_is_busy))
        {
            return NRF_ERROR_BUSY;
        }
    ...
        // Skip any in progress packets.
        do {
            if (invalid_packets_omit(p_header, &rd_idx) && (m_log_data.log_skipped == 0))
            {
                //Check if end of data is not reached.
                if (rd_idx >= m_log_data.wr_idx)
                {
                    m_log_data.rd_idx     = m_log_data.wr_idx;
                    // Ensure that next invocation occurs after this flag is cleared.
                    m_log_data.log_is_busy = 0;
                    return false;
                }
    ...
        // Ensure that next invocation occurs after this flag is cleared.
        m_log_data.log_is_busy = 0;
        return buffer_is_empty() ? false : true;
    }
    

  • the return type for nrf_log_frontend_dequeue is bool so in your busy check should look like this:

    if (nrf_atomic_flag_set_fetch(&m_log_data.log_is_busy)==1)
    {
    return false;
    }

    also you should use the atomic library to clear the variable:

        UNUSED_RETURN_VALUE(nrf_atomic_flag_clear(&m_log_data.log_is_busy));

Related