Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs
This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

nrf_log_frontend_dequeue must be atomically protected against re-entry from interrupt context

This issue exists in NRF SDK5 15.2.0 and earlier.  NRF log module has entered "production" level as of SDK5 15.2.

While implementing an additional logging backend, I discovered a race condition in the nrf_log_frontend_dequeue function.

The bug can occur when logging is DEFERRED, and logs using nrf_log_push are generated from an interrupt.

The root issue is that nrf_log_frontend_dequeue which processes an internal buffer is normally called from the idle loop, but can also be called from an interrupt after nrf_log_push sets the autoflush flag.

To fix this issue I recommend adding an atomic busy flag to nrf_log_frontend_dequeue to avoid re-entrant calls from proceeding.

Parents
  • I noticed what appears as an attempt at fixing this issue in the nrf_log_frontend.c.  The change in recent SDK adds a CRITICAL_REGION around the NRF_LOG_FLUSH() called by autoflush.  This may address the issue however it will block interrupts for an extended time while waiting for logs to transfer. 

    I recommend not using this workaround.  Instead, I recommend adding an atomic flag set_fetch before the DSB call in nrf_log_frontend_dequeue() and atomic flag clear before returning.  This change would allow ISR's to continue normal operation.

    I have been using this fix for over a year and it has not shown any issues. 

  • Good find.  I have the same issue, and following you suggestion have implemented the following, which if you have time to review I would be most grateful.

    typedef struct
    {
        uint32_t                  wr_idx;          // Current write index (never reset)
        uint32_t                  rd_idx;          // Current read index  (never_reset)
        uint32_t                  mask;            // Size of buffer (must be power of 2) presented as mask
        uint32_t                  buffer[NRF_LOG_BUF_WORDS];
        nrf_log_timestamp_func_t  timestamp_func;  // A pointer to function that returns timestamp
        nrf_log_backend_t const * p_backend_head;
        nrf_atomic_flag_t         log_skipping;
        nrf_atomic_flag_t         log_skipped;
        nrf_atomic_flag_t         log_is_busy;     // This flag replaces the blocking use of critical region
        nrf_atomic_u32_t          log_dropped_cnt;
        bool                      autoflush;
    } log_data_t;

    ret_code_t nrf_log_init(nrf_log_timestamp_func_t timestamp_func, uint32_t timestamp_freq)
    {
        (void)NRF_LOG_ITEM_DATA_CONST(app);
    
        if (NRF_LOG_USES_TIMESTAMP && (timestamp_func == NULL))
        {
            return NRF_ERROR_INVALID_PARAM;
        }
    
        m_log_data.mask         = NRF_LOG_BUF_WORDS - 1;
        m_log_data.wr_idx       = 0;
        m_log_data.rd_idx       = 0;
        m_log_data.log_skipped  = 0;
        m_log_data.log_skipping = 0;
        m_log_data.log_is_busy  = 0;
        m_log_data.autoflush    = NRF_LOG_DEFERRED ? false : true;

    bool nrf_log_frontend_dequeue(void)
    {
        if (buffer_is_empty())
        {
            return false;
        }
        // Note also add atomic flag set before this __DSB() and after this function exits
        // See https://devzone.nordicsemi.com/f/nordic-q-a/39188/nrf_log_frontend_dequeue-must-be-atomically-protected-against-re-entry-from-interrupt-context
        if (nrf_atomic_flag_set_fetch(&m_log_data.log_is_busy))
        {
            return NRF_ERROR_BUSY;
        }
    ...
        // Skip any in progress packets.
        do {
            if (invalid_packets_omit(p_header, &rd_idx) && (m_log_data.log_skipped == 0))
            {
                //Check if end of data is not reached.
                if (rd_idx >= m_log_data.wr_idx)
                {
                    m_log_data.rd_idx     = m_log_data.wr_idx;
                    // Ensure that next invocation occurs after this flag is cleared.
                    m_log_data.log_is_busy = 0;
                    return false;
                }
    ...
        // Ensure that next invocation occurs after this flag is cleared.
        m_log_data.log_is_busy = 0;
        return buffer_is_empty() ? false : true;
    }
    

Reply
  • Good find.  I have the same issue, and following you suggestion have implemented the following, which if you have time to review I would be most grateful.

    typedef struct
    {
        uint32_t                  wr_idx;          // Current write index (never reset)
        uint32_t                  rd_idx;          // Current read index  (never_reset)
        uint32_t                  mask;            // Size of buffer (must be power of 2) presented as mask
        uint32_t                  buffer[NRF_LOG_BUF_WORDS];
        nrf_log_timestamp_func_t  timestamp_func;  // A pointer to function that returns timestamp
        nrf_log_backend_t const * p_backend_head;
        nrf_atomic_flag_t         log_skipping;
        nrf_atomic_flag_t         log_skipped;
        nrf_atomic_flag_t         log_is_busy;     // This flag replaces the blocking use of critical region
        nrf_atomic_u32_t          log_dropped_cnt;
        bool                      autoflush;
    } log_data_t;

    ret_code_t nrf_log_init(nrf_log_timestamp_func_t timestamp_func, uint32_t timestamp_freq)
    {
        (void)NRF_LOG_ITEM_DATA_CONST(app);
    
        if (NRF_LOG_USES_TIMESTAMP && (timestamp_func == NULL))
        {
            return NRF_ERROR_INVALID_PARAM;
        }
    
        m_log_data.mask         = NRF_LOG_BUF_WORDS - 1;
        m_log_data.wr_idx       = 0;
        m_log_data.rd_idx       = 0;
        m_log_data.log_skipped  = 0;
        m_log_data.log_skipping = 0;
        m_log_data.log_is_busy  = 0;
        m_log_data.autoflush    = NRF_LOG_DEFERRED ? false : true;

    bool nrf_log_frontend_dequeue(void)
    {
        if (buffer_is_empty())
        {
            return false;
        }
        // Note also add atomic flag set before this __DSB() and after this function exits
        // See https://devzone.nordicsemi.com/f/nordic-q-a/39188/nrf_log_frontend_dequeue-must-be-atomically-protected-against-re-entry-from-interrupt-context
        if (nrf_atomic_flag_set_fetch(&m_log_data.log_is_busy))
        {
            return NRF_ERROR_BUSY;
        }
    ...
        // Skip any in progress packets.
        do {
            if (invalid_packets_omit(p_header, &rd_idx) && (m_log_data.log_skipped == 0))
            {
                //Check if end of data is not reached.
                if (rd_idx >= m_log_data.wr_idx)
                {
                    m_log_data.rd_idx     = m_log_data.wr_idx;
                    // Ensure that next invocation occurs after this flag is cleared.
                    m_log_data.log_is_busy = 0;
                    return false;
                }
    ...
        // Ensure that next invocation occurs after this flag is cleared.
        m_log_data.log_is_busy = 0;
        return buffer_is_empty() ? false : true;
    }
    

Children
Related