Using LIBUARTE during flash operations

Hi, 

We are using an nRF52840 (SDK 15.3) in an application that uses the SoftDevice (S140) together with two UARTs: one for debug output (UART0, nrfx_uart) and another for communication with another MCU.

During internal flash writes for bonding or garbage collection, the CPU stalls for quite a long time (~85 ms during a flash erase). To make sure we do not lose data from the other MCU, we have been looking into the 'nrf_libuarte_async' functionality.

It is constructed by using:

NRF_LIBUARTE_ASYNC_DEFINE(libuarte, 1, 1, NRF_LIBUARTE_PERIPHERAL_NOT_USED, 2, 512, 3);

(UARTE1, TIMER1 for byte counting, TIMER2 for timeout, buffer size of 512).

During normal operation it works fine, but after the CPU stall we often end up with an error in 'nrf_libuarte_async_rx_free', stating "Unexpected RX free input parameter.". In this case the 'rx_free_cnt' is bigger than the 'chunck_size'. We are following the example in examples\peripheral\experimental_libuarte and free the buffer in the NRF_LIBUARTE_ASYNC_EVT_RX_DATA event.

Could you give any indication of what the issue might be? We couldn't find much documentation for this driver.
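
The check behind this error can be pictured with a small host-side model. This is only a sketch based on the description above, not the SDK source: the names rx_free_model_t and rx_free_model_free are made up for illustration, and it assumes the driver adds up the freed lengths and expects them to land exactly on a chunk boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of one "free" call in this simplified model. */
typedef enum {
    RX_FREE_PARTIAL,         /* chunk not yet fully freed           */
    RX_FREE_CHUNK_RELEASED,  /* freed bytes add up to one chunk     */
    RX_FREE_ERROR            /* freed more than one chunk can hold  */
} rx_free_result_t;

/* Hypothetical bookkeeping: one counter per receive chunk. */
typedef struct {
    size_t chunk_size;   /* size of one RX chunk, e.g. 512      */
    size_t rx_free_cnt;  /* bytes freed so far in current chunk */
} rx_free_model_t;

rx_free_result_t rx_free_model_free(rx_free_model_t *m, size_t length)
{
    assert(m != NULL && m->chunk_size > 0);

    m->rx_free_cnt += length;

    if (m->rx_free_cnt == m->chunk_size) {
        /* Whole chunk handed back: start counting the next one. */
        m->rx_free_cnt = 0;
        return RX_FREE_CHUNK_RELEASED;
    }
    if (m->rx_free_cnt > m->chunk_size) {
        /* Corresponds to the "Unexpected RX free input parameter." case. */
        return RX_FREE_ERROR;
    }
    return RX_FREE_PARTIAL;
}
```

In this model the error appears whenever the lengths passed in overshoot the chunk boundary, for example if a data event reports bytes that actually span two chunks or if the same bytes are counted twice.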

Thank you.

  • Hi

    What baudrate is the UART1 running at?

    How large is the UART utilization from the connected MCU? Do you think there is a risk that the UART RX buffer could fill up during those ~85 milliseconds?

    Are you using hardware flow control or not?

    Best regards
    Torbjørn

  • Response on behalf of my colleague: The baud rate is 115200. There is no risk that the UART RX buffer could fill up. The connected MCU can only send one single message (which is fewer than 400 bytes), then it waits for a response. Unfortunately, it is not possible to use hardware flow control.
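
To put these numbers next to the ~85 ms stall, here is a rough back-of-the-envelope calculation. It assumes 8N1 framing (10 bits on the wire per byte), which is not stated in the thread, and the function names are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire, in microseconds, for byte_count bytes.
 * bits_per_byte is 10 for the assumed 8N1 framing (start + 8 data + stop). */
uint64_t uart_transfer_time_us(uint32_t byte_count, uint32_t baud, uint32_t bits_per_byte)
{
    assert(baud > 0);
    return ((uint64_t)byte_count * bits_per_byte * 1000000u) / baud;
}

/* Upper bound on the bytes that can arrive within window_us
 * if the sender transmits back to back. */
uint64_t uart_max_bytes_in_window(uint32_t window_us, uint32_t baud, uint32_t bits_per_byte)
{
    assert(bits_per_byte > 0);
    return ((uint64_t)window_us * baud) / ((uint64_t)bits_per_byte * 1000000u);
}
```

Under these assumptions a 400-byte message occupies the line for about 34.7 ms, and a continuous sender could deliver roughly 979 bytes during an 85 ms stall. Since the peer sends at most one message of fewer than 400 bytes before waiting, a single 512-byte buffer should be able to hold everything that arrives while the CPU is stalled.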

  • Let me start with answering some of your previous questions:

    Q: To summarize, you confirm that this can happen with or without flash erase running?
    A: This happens without flash erase.

    Q: You confirm that this will only happen when using the libuarte_async library?
    A: Yes. When using the nrfx_uarte driver directly, we did not have this issue.

    Q: How do you process UART events in the application? Are you running a lot of event processing in the events directly, or do you process it later in main/thread context?
    A: Data is put in a buffer, and processed in the main thread. No processing is handled in IRQ context. Same for BLE events: everything is queued for processing outside of interrupt context.

    Q: Do you know what interrupt priority the UART events are returned in?
    A: The "irq_prio" is filled in with 5.

    The corruption only happens on the receive side. It looks like the problem occurs when I receive new data during the timeout. What I see on the logic analyzer is that we often have a UART sequence looking like the following:

    <~200 bytes on UART Rx>  <no communication for ~5 ms>  <~200 bytes on UART Rx>  <no communication for ~X ms>

    Initially, I had the timeout set to 5 ms, and I noticed the error very often. It looked like the data from the first and second bursts were mixed.

    Once I set the timeout to 15 ms, I see it very rarely: actually only when my "no communication" window is equal to X.

    Now that I have reduced it to 2 ms, it works without issues (in the 15 minutes I tried this), probably because a "no communication" window of that length is very unlikely to happen.
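
The three observations above fit a simple picture in which trouble appears only when the idle gap and the RX timeout are close to each other. The sketch below is a toy classification, not the driver's logic: classify_gap and the margin_us "danger window" are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    GAP_SPLITS_BURSTS,    /* timeout fires well before the next burst starts     */
    GAP_MERGES_BURSTS,    /* next burst starts well before the timeout can fire  */
    GAP_RACES_WITH_DATA   /* timeout and new data arrive at about the same time  */
} gap_outcome_t;

/* margin_us is a hypothetical window around the timeout in which the
 * timeout handling and newly arriving bytes could interleave. */
gap_outcome_t classify_gap(uint32_t gap_us, uint32_t timeout_us, uint32_t margin_us)
{
    assert(timeout_us > 0);

    /* Widen to 64 bits so the additions cannot overflow. */
    uint64_t gap = gap_us, timeout = timeout_us, margin = margin_us;

    if (gap > timeout + margin) {
        return GAP_SPLITS_BURSTS;
    }
    if (gap + margin < timeout) {
        return GAP_MERGES_BURSTS;
    }
    return GAP_RACES_WITH_DATA;
}
```

With a 5 ms gap between bursts and an assumed 0.5 ms margin, a 5 ms timeout falls in the race window, a 15 ms timeout merges both bursts into one event, and a 2 ms timeout cleanly splits them, which matches what was seen on the logic analyzer.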

  • Hi Roy

    RoyCreemers said:
    A: The "irq_prio" is filled in with 5

    That's a bit odd. Legal values when using the SoftDevice are 2, 3, 6 and 7. 

    Defines for setting IRQ priority can be found in app_util_platform.h in the SDK:

    #define _PRIO_SD_HIGH 0
    #define _PRIO_SD_MID 1
    #define _PRIO_APP_HIGH 2
    #define _PRIO_APP_MID 3
    #define _PRIO_SD_LOW 4
    #define _PRIO_SD_LOWEST 5
    #define _PRIO_APP_LOW 6
    #define _PRIO_APP_LOWEST 7
    #define _PRIO_THREAD 15
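
As a quick sanity check, the rule above can be written as a small helper that could run on a host or in a unit test. The helper name irq_prio_is_app_available is made up, and the set of allowed values is simply the one quoted in this reply (2, 3, 6 and 7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if prio is one of the levels left to the application
 * when the SoftDevice is in use, per the list quoted above. */
bool irq_prio_is_app_available(uint8_t prio)
{
    assert(prio <= 15); /* 15 is the thread level in the list above */

    switch (prio) {
    case 2: /* _PRIO_APP_HIGH   */
    case 3: /* _PRIO_APP_MID    */
    case 6: /* _PRIO_APP_LOW    */
    case 7: /* _PRIO_APP_LOWEST */
        return true;
    default:
        return false; /* 0, 1, 4, 5 belong to the SoftDevice */
    }
}
```

Called with 5, the value reported for "irq_prio", this helper returns false, because 5 corresponds to _PRIO_SD_LOWEST.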

    If the problem seems to be solved when the timeout is shorter than the time between bursts, I expect the issue is triggered when the DMA buffer fills up in the middle of a burst. The hardware should handle this, however, if you set up two buffers and configure the DMA to switch to the second buffer automatically once the first fills up.
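
The double-buffer idea can be illustrated with a tiny host-side model. It is a sketch only: it does not use any UARTE registers or SDK calls, the dbuf_* names and the deliberately tiny buffer size are invented, and the "DMA" is just a function that stores one byte at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DBUF_COUNT 2
#define DBUF_SIZE  4   /* tiny on purpose, to keep the example short */

typedef struct {
    size_t fill[DBUF_COUNT];  /* bytes written into each buffer              */
    bool   busy[DBUF_COUNT];  /* full and not yet released by the application */
    size_t active;            /* buffer the modelled DMA writes to            */
    size_t dropped;           /* bytes lost because no buffer was free        */
} dbuf_t;

void dbuf_init(dbuf_t *d)
{
    assert(d != NULL);
    *d = (dbuf_t){0};
}

/* Model one received byte: write to the active buffer and switch
 * to the other buffer automatically when the active one fills up. */
void dbuf_receive_byte(dbuf_t *d)
{
    assert(d != NULL);
    if (d->busy[d->active]) {
        d->dropped++;            /* application has not freed this buffer yet */
        return;
    }
    d->fill[d->active]++;
    if (d->fill[d->active] == DBUF_SIZE) {
        d->busy[d->active] = true;
        d->active = (d->active + 1) % DBUF_COUNT;
    }
}

/* Application hands a full buffer back so it can be reused. */
void dbuf_release(dbuf_t *d, size_t index)
{
    assert(d != NULL && index < DBUF_COUNT);
    d->busy[index] = false;
    d->fill[index] = 0;
}
```

In this model nothing is dropped as long as the application releases a full buffer before the other one fills up, which is why a stalled CPU is tolerable only while the remaining buffer space covers the data arriving during the stall.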

    I can double check with the designers if the driver is set up to do this. 

    Have you done more testing at 2ms to see if this setting seems reliable?

    Best regards
    Torbjørn

  • When setting the priority to _PRIO_APP_LOW, the issue is still there. I did more testing with the 2 ms timeout, and that seems to be reliable in our test setup. The problem is that there might be sequences where the gap before the second burst is equal to that timeout as well.

    Please double check your hypothesis with the designers.

  • Hi

    It seems the developer did not agree with my hypothesis. The driver does use the double buffering technique, and they have tested it with interrupts of up to 100 ms to make sure it can handle long 'stalls' without losing data.

    He asked whether you could enable logging for this module and check if any errors are reported during operation.

    Can you give some more information about the data corruption you see?
    You mentioned the data appears to be mixed, do you mean that every other byte is from the first burst or the second?

    If the problem persists I think I will have to set up a small test application here to try to reproduce the issue.

    Best regards
    Torbjørn

  • Hi again

    The developer discovered a potential race condition in the code that might be related to your issue. 

    He implemented a possible fix and forwarded it to me so you can test it:
    nrf_libuarte_async.c

    Could you please try this implementation out with the timeout set to 5ms and see if the problem is still there?

    Best regards
    Torbjørn
