This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

nRF51422 SDK11 app_uart_fifo race condition and fix

Yet another app_uart_fifo question. I'm using SDK 11 with uVision. It is my expectation that, if using HW flow control and a FIFO to RX, no received characters should be dropped at all, under any conditions.

I've noticed errors thrown (UART driver error, FIFO error) and characters lost in the following scenario:

  • nRF51DK board
  • serial configured in HW flow control mode, 115200 baud, connected to Segger chip (default config) on pins 11 (RX) 8 (RTS) 10 (TX) 9 (CTS). This would be the default setup, except a higher bitrate.
  • no softdevice used
  • using RX buffer of 32 bytes, but larger values only delay the issue.

The app flow is as follows:

  • from uart handler when data is available I'm setting a flag which is atomically checked from main
  • from main I'm reading the UART. When I have exactly 20 bytes, I'm printing them out (just for debugging, issue happens even if I don't print, just sleep from time to time). Then main goes to WFE() for the next 20B packet.

The input data is as follows: 1234567890ABCDEFGHIJ repeated 150 times (20Bx150=3KB), pasted in a terminal all at once as a long line no separators.

I've scoped it and I know bytes are properly being sent from the Segger bridge, RTS works correctly. Further more, if I count the uart driver RX interrupts, I get 3000 so all bytes actually arrive in the uart interrupt handler.

Digging for a few days I found a race condition in app_uart_fifo.c between uart_event_handler(NRF_DRV_UART_EVT_RX_DONE) and app_uart_get: The ISR handler can interrupt the get function which runs from main at a bad time, resulting in lost bytes and potentially double nrf_drv_uart_rx. This in turn can lead to UART HW FIFO overflow and/or software FIFO overflow.

The main issue in SDK11 is that, in app_uart_get we check for length and read a byte in a non-atomic way. It is a pretty obvious race once you see it.

I've checked SDK12 and while I think it improves the reliability I don't think it resolves the issue 100%.And I don't like it calls the uart app handler for every byte received. I think SDK11, which calls the app uart handler only when queue is empty is very elegant and efficient and preferable to SDK12 implementation.

I'm proposing the following patch which introduces two new functions:

  • in app_fifo.c: uint32_t app_fifo_read_is_full(app_fifo_t * p_fifo, uint8_t * p_byte, uint32_t* p_sz, bool* p_full) - this reads an array of bytes and atomically checks for full queue and moves the read pointer. The critical section is about 800ns.
  • in app_uart_fifo.c: uint32_t app_uart_get_array(uint8_t* p_byte, uint32_t* in_out_size) which uses the first function to read and test for full queue; if queue was full at the time of read, it schedules a new hardware rx.

Since I'm doing the memcpy's outside the critical section it is expected that app_fifo_read_is_full called on a fifo of size N may return <N bytes and is_full = true, if more characters arrive after or during the memcopy and before length was checked. This should not be an issue since application can easily compensate by reading again.

The above functions are only safe to be called from main. I've stressed the link using the attached project over a few hours transferring 10MB+ with no issues.

Here is a link with the project (sorry, couldn't figure how to properly embed code in here) www.dropbox.com/.../app_uart_TEST.zip

Thank you and please excuse the long post.

Related