This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

App crashes under heavy load when using NRF_LOG and UART

I am developing a fairly complex app which involved having a lot of logging using the NRF_LOG macros with the UART backend. I am using the nRF52 DK as my platform with nRF SDK version 17.0.2 and Softdevice S132 with SES for development. 

My app is as well using a couple of app_timers to schedule certain activities e.g. once a second, or at other intervals.

So far, "most of the time" everything works fine but in recent times I experience more and more crashes of the app in so far, that it simply freezes, hard faults and I was also lucky at one point getting an NRF_ERROR_BUSY printed as an error cause on the UART (similar to https://devzone.nordicsemi.com/f/nordic-q-a/72055/crash-in-nrf_log-uart-backend-nrf_error_busy). I am NOT using the deferred logging mode though.

Sometimes, the app is already crashing in the very first stages after bootup when the PM is deleting bonds of a buggy CSCS censor. Resetting the nRF52 DK then makes everything fine again. 

It's hard to say what the root cause for the crash is, as it does result in a hard fault after some Pause/Resume clicks in the IDE (when connected locally) but I was not yet able to actually figure out the code location which hard faults using the known best practices with intercepting $SP+14 etc. Sometimes it does not seem to hard fault at all or maybe just as a consequence from halting the SD. It's all very foggy and my usual stuff of implementing a custom app_error handle do not seem to get triggered when the problem occurs.

Searching for other posts related to this I read about problems in NRF_LOG in past versions of the SDK (e.g. https://devzone.nordicsemi.com/f/nordic-q-a/29103/nrf_log-fixes-in-sdk14-1-0) and wonder if there are still some known issues left with respect to NRF_LOG which might lead to the observed behavior.

Assuming the root cause to lie in NRF_LOG code is just a wild guess, also based on the NRF_ERROR_BUSY error mentioned before and that freezes always seems to occur in-midst of printing out a log line on UART (saying only part of the log message is already printed when the app freezes).

The freezes do not only occur when I call NRF_LOG in my code but are also happening in-midst of NRF_LOG calls within SDK code.

I already had a quick look at the code to check if I do NRF_LOG when within an app_timer handler (interrupt context) because I presume this could cause a problem when a Non-IRQ started NRF_LOG gets interrupted by the higher priority NRF_LOG from within the IRQ handler.

Can you imagine any potential constellations that could lead to freezes/crashes when using NRF_LOG that I could check or should be aware of to avoid them? 

  • Hi

    Is there any particular reason you are not using deferred logging?

    This sounds like the kind of situation that deferred logging is designed to handle, where you have a lot of logging activity at different interrupt levels. 

    Also, what UART baudrate are you using?
    If you increase the baudrate you also increase LOG throughput, and the logging calls are less likely to be slowed down by having to wait for the UART buffers to clear. 

    Best regards
    Torbjørn

  • I had the assumption that when using deferred logging I am not able to use dynamic buffers, e.g. created via char[] buf and then sprintf'ed into them, due to the fact, that NRF_LOG_* only stores a pointer to the buf and that buf might go out of scope until finally it gets written using deferred logging. Is this not true? Doing an explicit flush after the call to log instead would render the deferred logging meaningless.

    Using a higher baudrate sounds sensible from the effects you mention, but isn't that in the end just shifting the problem and "hoping for the best"?

    Are you saying that what I observe is indeed a potentially expected behavior? When I worked last on the project and had logging turned out during dev it happened 4,5 times per hour that the app crashed. I was not always able to pinpoint the instruction where it happened in my app_error_handler, but when I was it was in nrf_log_backend.c or nrf_log_frontend.c so this seems to foster my assumption that the crash is indeed logging related?

  • Hi Markus

    daubsi said:
    I had the assumption that when using deferred logging I am not able to use dynamic buffers, e.g. created via char[] buf and then sprintf'ed into them, due to the fact, that NRF_LOG_* only stores a pointer to the buf and that buf might go out of scope until finally it gets written using deferred logging. Is this not true? Doing an explicit flush after the call to log instead would render the deferred logging meaningless.

    This is correct, but if you make the char[] buffer static it should work. Unless you keep reusing the same buffer over and over, then you would need to split this into separate static buffers. 

    daubsi said:
    Using a higher baudrate sounds sensible from the effects you mention, but isn't that in the end just shifting the problem and "hoping for the best"?

    I agree we should try to find a solution that only leads to loss of logging data when the UART buffers fill up, not system crashes.  

    Using a higher baudrate will reduce the occurrence of lost logging data in this scenario. 

    daubsi said:
    Are you saying that what I observe is indeed a potentially expected behavior? When I worked last on the project and had logging turned out during dev it happened 4,5 times per hour that the app crashed. I was not always able to pinpoint the instruction where it happened in my app_error_handler, but when I was it was in nrf_log_backend.c or nrf_log_frontend.c so this seems to foster my assumption that the crash is indeed logging related?

    If the UART buffers fill up you might be hanging in the interrupt for a long time, and you might even block the UART interrupts which mean you aren't able to process any more UART data while you are stuck in the interrupt. Potentially this could lead to a deadlock where you are stuck in the interrupt forever, unable to process any more Bluetooth events. Even if the SoftDevice itself can use a higher interrupt priority 

    So I would say that errors like this is to be expected if you are using non-deferred logging, and pushing the limits of the UART buffers. 

    Best regards
    Torbjørn

  • Hi Torbjørn,

    thanks for your reply. You suggestion match my expectation about how to cope with this. However I think juggling around with some static allocated buffers is maybe also not optimal. I will see what i can do about this. I think NRF_LOG_PUSH could also be used to work around this? But it would mean that i need to revisit every logline where I use dynamic logging....

    I am glad to hear, that you hint, that overloading the UART might indeed be the root cause for the hangs though tbh I am not sure why it would happen as the sending should be performed in a kind of FIFO style I presume? Anyway, i can ofc live with this answer and can also try to crank up the UART speed. I build a release and a debug variant of my app with 0 logging in release and in that version no crashes occured so far. Thank you so much from saving me from going insane by thinking it must be some kind of bug in my code triggering those faults ;-D

    Not sure if there will be future updates to the SDK as Connect SDK seems to be the future but from my point of view, I'd prefer lost or truncated logs over app freezes ;-)

    Best regards,

    Markus

Related