nrf_log_frontend_dequeue must be atomically protected against re-entry from interrupt context

This issue exists in nRF5 SDK 15.2.0 and earlier. The nRF log module reached "production" level as of nRF5 SDK 15.2.

While implementing an additional logging backend, I discovered a race condition in the nrf_log_frontend_dequeue function.

The bug can occur when logging is DEFERRED and log entries that use nrf_log_push are generated from an interrupt.

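The root issue is that nrf_log_frontend_dequeue, which processes an internal buffer, is normally called from the idle loop but can also be called from an interrupt after nrf_log_push sets the autoflush flag. The interrupt can therefore re-enter the function while the idle-loop call is still working on the buffer.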

To fix this issue, I recommend adding an atomic busy flag to nrf_log_frontend_dequeue to prevent re-entrant calls from proceeding.
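
The busy-flag idea can be sketched in portable C11 roughly as follows. This is only an illustration under stated assumptions, not the SDK's code: `m_dequeue_busy`, `guarded_dequeue`, `dequeue_body` and `simulated_isr` are hypothetical names, `dequeue_body` stands in for the existing body of nrf_log_frontend_dequeue, and the interrupt is simulated with a plain function call. On target, the SDK's own atomic flag helpers would presumably replace the `<stdatomic.h>` calls.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical busy flag; not an nRF5 SDK symbol. */
static atomic_flag m_dequeue_busy = ATOMIC_FLAG_INIT;

static int  m_pending  = 3;      /* simulated number of queued log entries */
static int  m_rejected = 0;      /* re-entrant calls turned away by the guard */
static bool m_fire_isr = false;  /* when true, simulate an interrupt mid-dequeue */

bool guarded_dequeue(void);

/* Stand-in for an interrupt whose autoflush path tries to dequeue. */
static void simulated_isr(void)
{
    if (!guarded_dequeue())
    {
        m_rejected++;
    }
}

/* Stand-in for the existing body of nrf_log_frontend_dequeue(). */
static bool dequeue_body(void)
{
    if (m_fire_isr)
    {
        m_fire_isr = false;
        simulated_isr();         /* interrupt arrives while the buffer is in use */
    }
    if (m_pending > 0)
    {
        m_pending--;
    }
    return m_pending > 0;
}

/* Returns true if more entries remain, false if the queue is empty or busy. */
bool guarded_dequeue(void)
{
    /* test_and_set returns the previous value: true means a call is already active. */
    if (atomic_flag_test_and_set(&m_dequeue_busy))
    {
        return false;
    }
    bool more = dequeue_body();
    atomic_flag_clear(&m_dequeue_busy);
    return more;
}
```

In this sketch a rejected call returns false, so a flush loop running in the interrupt would simply stop and leave the remaining entries to the call that already holds the flag. Interrupts are never masked, which is the point of the suggestion.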

  • I noticed what appears to be an attempt at fixing this issue in nrf_log_frontend.c. The change in the recent SDK adds a CRITICAL_REGION around the NRF_LOG_FLUSH() called by autoflush. This may address the issue, but it blocks interrupts for an extended time while waiting for logs to transfer.

    I recommend not using this workaround. Instead, I recommend adding an atomic flag set_fetch before the DSB call in nrf_log_frontend_dequeue() and an atomic flag clear before returning. This change would allow ISRs to continue normal operation.

    I have been using this fix for over a year and it has not shown any issues. 

  • I have updated the internal Jira with your suggestions. Hopefully, in the meantime, others may find this case and your suggestion.

    Best regards,
    Kenneth
