This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

nrfx_uart_tx safe to call from ISR?

SDK16.0.0, S132v7.0.1, nRF52840

We call nrfx_uart_tx() from the main application (non-isr) context to initiate multi-byte data transfer from a queue we manage.

We also call nrfx_uart_tx() from the NRFX_UART_EVT_TX_DONE event handler; this seems safe because we're in an ISR that can't be pre-empted at this point, and just set p_cb->tx_buffer_length to 0, so by the time the interrupt unwinds back to user code, the user code never sees the brief window where p_cb->tx_buffer_length was 0.

Therefore, it seems "safe" to call nrfx_uart_tx() from both the user context and the NRFX_UART_EVT_TX_DONE event that arises from the UART0_IRQn IRQ.

We log via the UART, though, and it's handy to log small amounts of data from other ISRs like the app_timer callback. Unfortunately, this seems like it has the opportunity for a race condition. Imagine the following scenario:

1. There is no pending TX on the UART: p_cb->tx_buffer_length == 0

2. The application, from the main context (outside of an ISR), calls nrfx_uart_tx with some data.

3. nrfx_uart_tx calls nrfx_uart_tx_in_progress (nrfx_uart.c, line 334), which loads tx_buffer_length into a register and compares it to zero.

4. Sometime _after_ tx_buffer_length gets loaded into a register, but _before_ nrfx_uart_tx() has a chance to write the new buffer length to p_cb->tx_buffer_length, an IRQ fires.

5. The handler for this IRQ, say, app_timer, calls some user code that runs from the IRQ context.

6. This user code decides to transmit some data over the UART, and calls nrfx_uart_tx(). p_cb->tx_buffer_length is zero, so it loads up p_cb and starts transmitting.

7. The user code exits, the app_timer ISR unwinds, and control returns back from the IRQ context to the main context.

8. The application picks right back up where it left off in step 3, sees that its pre-IRQ register copy of p_cb->tx_buffer_length is 0, and continues through the nrfx_uart_tx gate and overwrites p_cb with new transmission data.

At this point the nrfx_uart_t structure is corrupt and out-of-sync with the UART hardware. One approach that might mitigate this corruption is disabling UART0_IRQn, is that a reasonable approach? Should we just wrap our calls to nrfx_uart_tx() with sd_nvic_DisableIRQ(UART0_IRQn) and sd_nvic_EnableIRQ(UART0_IRQn)? Is that safe to do inside of an ISR?

My preference is to avoid having a "uart poll" function that I have to call all the time in my main application context, and have that be the only place I call nrfx_uart_tx().

It's nicer if I only call nrfx_uart_tx when I have new data to enqueue and transfer, that minimizes nRF-side latency and keeps the UART as busy as possible.

  • I think I would have used sd_nvic_critical_region_enter() and sd_nvic_critical_region_exit() around nrfx_uart_tx() in this case. I am not sure if temporary disabling UART0_IRQn alone would help here, since the manipulation is done in main and app_timer?

  • Ah, yes, sorry; disabling the uart IRQ only prevents the "safe" nrfx_uart event from happening; that's not the one I care about.

    So I do need to do a full system interrupt disable/enable around nrfx_uart_tx, then? Is this something you guys could consider either documenting or improving somewhere?

  • The sd_nvic_critical_region_enter() and sd_nvic_critical_region_exit() will only temporary prevent application interrupts, the softdevice will still function as normal. I can check with the developers what they think.

    Kenneth

  • Wait a sec: what happens if my main application context is inside of nrfx_uart_tx(), and then right at the race condition instant I described in my original post, softdevice engages and takes control? What if it calls an event handler back in my code that ends up calling nrfx_uart_tx()?

    Isn't that the same race?

  • The critical region will only allow the following softdevice interrupts (copied from nrf_nvic.h):

    /**@brief Interrupts used by the SoftDevice, with IRQn in the range 0-31. */
    #define __NRF_NVIC_SD_IRQS_0 ((uint32_t)( \
          (1U << POWER_CLOCK_IRQn) \
        | (1U << RADIO_IRQn) \
        | (1U << RTC0_IRQn) \
        | (1U << TIMER0_IRQn) \
        | (1U << RNG_IRQn) \
        | (1U << ECB_IRQn) \
        | (1U << CCM_AAR_IRQn) \
        | (1U << TEMP_IRQn) \
        | (1U << __NRF_NVIC_NVMC_IRQn) \
        | (1U << (uint32_t)SWI5_IRQn) \
      ))

    The softdevice callback uses SD_EVT_IRQn (SWI2_IRQn), which is disabled during critical region. So the softdevice callback event will not be a problem (it will be pending if it occurs).

    After discussing with the developers they think you can add some external mechanism like bool uart_acquire() returning true if successfully acquired and uart_release() when done, something like this:

    bool uart_acquire(void) {
    bool ret = false;
    critical_region_enter();
    if (flag == 0) {
    flag = 1; 
    ret = true;
    }
    critical_region_exit();
    return ret;
    }
    
    uart_release(void) {flag = 0); can happen in TX_DONE

Related