I am developing a fairly complex app that involves a lot of logging using the NRF_LOG macros with the UART backend. My platform is the nRF52 DK with nRF SDK version 17.0.2 and SoftDevice S132, and I use SES for development.
My app also uses a couple of app_timers to schedule certain activities, e.g. once a second or at other intervals.
So far, everything works fine "most of the time", but recently I have been seeing more and more crashes of the app: it simply freezes or hard faults, and on one occasion I was lucky enough to get an NRF_ERROR_BUSY printed as the error cause on the UART (similar to https://devzone.nordicsemi.com/f/nordic-q-a/72055/crash-in-nrf_log-uart-backend-nrf_error_busy). I am NOT using the deferred logging mode, though.
Sometimes the app already crashes in the very first stages after boot-up, when the PM is deleting the bonds of a buggy CSCS sensor. Resetting the nRF52 DK then makes everything work again.
It's hard to say what the root cause of the crash is. It does end in a hard fault after some Pause/Resume clicks in the IDE (when connected locally), but I have not yet been able to figure out which code location hard faults using the known best practices of inspecting $SP+14 etc. Sometimes it does not seem to hard fault at all, or perhaps only as a consequence of halting the SD. It's all very foggy, and my usual approach of implementing a custom app_error handler does not help, because the handler does not seem to get triggered when the problem occurs.
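For reference, this is roughly the stack-frame lookup I mean. It is only a minimal sketch, assuming the basic Cortex-M exception stack frame (eight words pushed by the core on exception entry); `stacked_frame_t` and `frame_from_words` are hypothetical names of my own, not SDK APIs, and the sketch does not cover choosing between MSP and PSP.

```c
#include <stdint.h>

/* Basic Cortex-M exception stack frame, in the order the core stores it.
 * Offsets are relative to the stack pointer value at exception entry. */
typedef struct {
    uint32_t r0;    /* SP + 0x00 */
    uint32_t r1;    /* SP + 0x04 */
    uint32_t r2;    /* SP + 0x08 */
    uint32_t r3;    /* SP + 0x0C */
    uint32_t r12;   /* SP + 0x10 */
    uint32_t lr;    /* SP + 0x14: link register at the time of the exception */
    uint32_t pc;    /* SP + 0x18: stacked return address (code that was running) */
    uint32_t xpsr;  /* SP + 0x1C: program status register */
} stacked_frame_t;

/* Hypothetical helper: interpret eight words read from the stack
 * (e.g. copied out of the debugger's memory view) as a stacked frame. */
static stacked_frame_t frame_from_words(const uint32_t words[8])
{
    stacked_frame_t frame;

    frame.r0   = words[0];
    frame.r1   = words[1];
    frame.r2   = words[2];
    frame.r3   = words[3];
    frame.r12  = words[4];
    frame.lr   = words[5];
    frame.pc   = words[6];
    frame.xpsr = words[7];
    return frame;
}
```

The `pc` and `lr` values would then be looked up in the map file or disassembly to find the interrupted function and its caller.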
While searching for other posts related to this, I read about problems in NRF_LOG in past versions of the SDK (e.g. https://devzone.nordicsemi.com/f/nordic-q-a/29103/nrf_log-fixes-in-sdk14-1-0), and I wonder whether there are still some known issues left in NRF_LOG that might lead to the observed behavior.
Assuming that the root cause lies in the NRF_LOG code is just a wild guess, based on the NRF_ERROR_BUSY error mentioned above and on the fact that the freezes always seem to occur in the middle of printing a log line on the UART (i.e. only part of the log message has been printed when the app freezes).
The freezes occur not only when I call NRF_LOG in my own code but also in the middle of NRF_LOG calls within SDK code.
I already had a quick look at my code to check whether I call NRF_LOG from within an app_timer handler (interrupt context), because I presume this could cause a problem when an NRF_LOG call started outside an IRQ gets interrupted by an NRF_LOG call from within the higher-priority IRQ handler.
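To check this at runtime rather than only by reading the code, I am thinking of something like the following. It is a minimal sketch, assuming the standard Cortex-M IPSR layout (exception number in bits [8:0], 0 meaning Thread mode); `in_interrupt_context` is a hypothetical helper of my own, and on the target its argument would come from the CMSIS `__get_IPSR()` intrinsic.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPSR bits [8:0] hold the number of the currently active exception.
 * A value of 0 means the core is in Thread mode (no handler running). */
#define IPSR_EXCEPTION_MASK 0x1FFu

/* Hypothetical helper: returns true when the given IPSR value indicates
 * that an exception or interrupt handler is currently executing.
 * On target this would be called as in_interrupt_context(__get_IPSR()). */
static bool in_interrupt_context(uint32_t ipsr)
{
    return (ipsr & IPSR_EXCEPTION_MASK) != 0u;
}
```

Setting a breakpoint or a flag wherever this returns true just before a log call should show which NRF_LOG calls actually run from handler context.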
Can you think of any scenarios that could lead to freezes/crashes when using NRF_LOG, which I could check for or should be aware of in order to avoid them?