
Infinite loop in nrfx_uarte_uninit()

nRF52840, SD140, SDK 17.0.2

We're using UARTE to communicate with a peer MCU, with hardware flow control (CTS and RTS enabled) at 115200 baud. The traces are less than 2 cm long. We do not control any firmware on the peer MCU; it's vendor-supplied. We can tell it to power off, but it is also free to power itself off at other times. When it does, it de-asserts a connected GPIO pin which we have configured to interrupt the nRF52 on any logic change.

Our watchdog is firing in the infinite loop at the bottom of nrfx_uarte_uninit() (line 318 in our nrfx_uarte.c):

void nrfx_uarte_uninit(nrfx_uarte_t const * p_instance)
{
    uarte_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];
    NRF_UARTE_Type * p_reg = p_instance->p_reg;

    if (p_cb->handler)
    {
        interrupts_disable(p_instance);
    }
    // Make sure all transfers are finished before UARTE is disabled
    // to achieve the lowest power consumption.
    nrf_uarte_shorts_disable(p_reg, NRF_UARTE_SHORT_ENDRX_STARTRX);

    // Check if there is any ongoing reception.
    if (p_cb->rx_buffer_length)
    {
        nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_RXTO);
        nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPRX);
    }

    nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_TXSTOPPED);
    nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPTX);

    // Wait for TXSTOPPED event and for RXTO event, provided that there was ongoing reception.
    while (!nrf_uarte_event_check(p_reg, NRF_UARTE_EVENT_TXSTOPPED) ||
           (p_cb->rx_buffer_length && !nrf_uarte_event_check(p_reg, NRF_UARTE_EVENT_RXTO)))
    {}

    nrf_uarte_disable(p_reg);
    pins_to_default(p_instance);

#if NRFX_CHECK(NRFX_PRS_ENABLED)
    nrfx_prs_release(p_reg);
#endif

    p_cb->state   = NRFX_DRV_STATE_UNINITIALIZED;
    p_cb->handler = NULL;
    NRFX_LOG_INFO("Instance uninitialized: %d.", p_instance->drv_inst_idx);
}

The scenario is this:

1. nRF52 is in a WFE state and is woken into our GPIO ISR by a high-to-low change driven by the peer MCU.

2. nRF52 interprets the high-to-low pin change as the peer MCU powering off.

3. nRF52 deinits the UARTE block that connects to the peer MCU.

4. nRF52 spins forever in the loop pasted above, and is eventually "rescued" by the watchdog.

I have a few questions related to this that I'd love clarification on:

1. Will the UARTE ever generate the NRF_UARTE_EVENT_RXTO event if the peer is not asserting RTS? If it does not, that would explain the infinite loop in nrfx_uarte_uninit(), since the peer is long gone and is not driving TX/RX/CTS/RTS at this point.

2. I found nrfx_uarte_rx_abort() as well (line 577 in my nrfx_uarte.c file):

void nrfx_uarte_rx_abort(nrfx_uarte_t const * p_instance)
{
    uarte_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];

    // Short between ENDRX event and STARTRX task must be disabled before
    // aborting transmission.
    if (p_cb->rx_secondary_buffer_length != 0)
    {
        nrf_uarte_shorts_disable(p_instance->p_reg, NRF_UARTE_SHORT_ENDRX_STARTRX);
    }
    nrf_uarte_task_trigger(p_instance->p_reg, NRF_UARTE_TASK_STOPRX);
    NRFX_LOG_INFO("RX transaction aborted.");
}

This code does not clear the internal rx_buffer_length field of the UARTE control block structure; is that a bug? Even if I abort any pending RX, the nrfx_uarte_uninit() code will still spin on the now-useless nonzero rx_buffer_length field.

3. What's the best way to shut down a previously-active UARTE block after the peer has powered off? We don't control its power state; it comes and goes.

4. Do you have any other hypotheses about why nrfx_uarte_uninit() would spin forever?

Thanks,

Bill

  • Update: I added a hacked-up second version of nrfx_uarte_uninit() that looks like this:

    void nrfx_uarte_uninit_2(nrfx_uarte_t const * p_instance)
    {
        uarte_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];
        NRF_UARTE_Type * p_reg = p_instance->p_reg;
    
        if (p_cb->handler)
        {
            interrupts_disable(p_instance);
        }
        // Make sure all transfers are finished before UARTE is disabled
        // to achieve the lowest power consumption.
        nrf_uarte_shorts_disable(p_reg, NRF_UARTE_SHORT_ENDRX_STARTRX);
    
        // Check if there is any ongoing reception.
        if (p_cb->rx_buffer_length)
        {
            nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_RXTO);
            nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPRX);
        }
    
        nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_TXSTOPPED);
        nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPTX);
    
        nrf_delay_ms(10); // CHANGE: enough time for RXTO, don't loop
    
        nrf_uarte_disable(p_reg);
        pins_to_default(p_instance);
    
    #if NRFX_CHECK(NRFX_PRS_ENABLED)
        nrfx_prs_release(p_reg);
    #endif
    
        p_cb->state   = NRFX_DRV_STATE_UNINITIALIZED;
        p_cb->handler = NULL;
        NRFX_LOG_INFO("Instance uninitialized: %d.", p_instance->drv_inst_idx);
    }

    Note that the loop has been replaced with a 10 ms busy wait. This of course stops the watchdog from firing, but I really need guidance from a Nordic support engineer on whether it's safe, in addition to answers to the questions in my original post.
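
A middle ground between the unbounded loop and the fixed delay is a bounded poll, which returns as soon as the events arrive but still cannot hang. The following is only a sketch of that idea, not nrfx code: `poll_until`, `cond_fn_t`, `delay_fn_t`, and the `fake_event_*` helpers are hypothetical names. On target, the predicate would presumably check TXSTOPPED and RXTO, and the delay callback would wrap something like nrf_delay_us().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback types; not part of nrfx. */
typedef bool (*cond_fn_t)(void *ctx);
typedef void (*delay_fn_t)(uint32_t us);

/*
 * Poll cond() up to max_attempts times, calling delay(delay_us) between polls.
 * Returns true as soon as cond() holds, false if the attempts run out.
 */
static bool poll_until(cond_fn_t cond, void *ctx,
                       uint32_t max_attempts, uint32_t delay_us,
                       delay_fn_t delay)
{
    for (uint32_t i = 0; i < max_attempts; i++)
    {
        if (cond(ctx))
        {
            return true;
        }
        delay(delay_us);
    }
    /* One last check after the final delay. */
    return cond(ctx);
}

/* Stand-in for a hardware event flag: reads as set from the Nth check onward. */
typedef struct
{
    uint32_t polls_seen;
    uint32_t polls_needed;
} fake_event_t;

static bool fake_event_check(void *ctx)
{
    fake_event_t *ev = (fake_event_t *)ctx;
    ev->polls_seen++;
    return ev->polls_seen >= ev->polls_needed;
}

/* Delay stub for host-side experiments; does nothing. */
static void no_delay(uint32_t us)
{
    (void)us;
}
```

With a 1 µs delay and 10000 attempts this would give roughly the same 10 ms ceiling as the fixed delay above, while normally returning much sooner.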

    I noted that in the SDK v15 version of this function, there is no loop. If anyone has an explanation for why that approach wasn't sufficient I'd also like to understand that.

    Thanks,

    Bill

  • Update 2: I see that the latest version of NRFX also adds a timeout to nrfx_uarte_uninit():

    https://github.com/NordicSemiconductor/nrfx/blob/master/drivers/src/nrfx_uarte.c#L332

    void nrfx_uarte_uninit(nrfx_uarte_t const * p_instance)
    {
        uarte_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];
        NRF_UARTE_Type * p_reg = p_instance->p_reg;
    
        if (p_cb->handler)
        {
            interrupts_disable(p_instance);
        }
        // Make sure all transfers are finished before UARTE is disabled
        // to achieve the lowest power consumption.
        nrf_uarte_shorts_disable(p_reg, NRF_UARTE_SHORT_ENDRX_STARTRX);
    
        // Check if there is any ongoing reception.
        if (p_cb->rx_buffer_length)
        {
            nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_RXTO);
            nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPRX);
        }
    
        nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_TXSTOPPED);
        nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPTX);
    
        // Wait for TXSTOPPED event and for RXTO event, provided that there was ongoing reception.
        bool stopped;
    
        // The UARTE is able to receive up to four bytes after the STOPRX task has been triggered.
        // On lowest supported baud rate (1200 baud), with parity bit and two stop bits configured
        // (resulting in 12 bits per data byte sent), this may take up to 40 ms.
        NRFX_WAIT_FOR((nrf_uarte_event_check(p_reg, NRF_UARTE_EVENT_TXSTOPPED) &&
                      (!p_cb->rx_buffer_length || nrf_uarte_event_check(p_reg, NRF_UARTE_EVENT_RXTO))),
                      40000, 1, stopped);
        if (!stopped)
        {
            NRFX_LOG_ERROR("Failed to stop instance with base address: %p.", (void *)p_instance->p_reg);
        }
    
        nrf_uarte_disable(p_reg);
        pins_to_default(p_instance);
    
    #if NRFX_CHECK(NRFX_PRS_ENABLED)
        nrfx_prs_release(p_reg);
    #endif
    
        p_cb->state   = NRFX_DRV_STATE_UNINITIALIZED;
        p_cb->handler = NULL;
        NRFX_LOG_INFO("Instance uninitialized: %d.", p_instance->drv_inst_idx);
    }

    Please also provide the details of this discovery: what exact scenarios cause this loop to exit via the timeout?
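
For reference, the 40 ms figure in the comment is plain arithmetic: four bytes of 12 bits each at 1200 baud. The sketch below reproduces that calculation so it can be repeated for other settings. `uart_drain_time_us` is a hypothetical helper, not an nrfx function, and the 10 bits per frame used for the 115200 baud case assumes 8N1 framing, which is not stated anywhere in this thread.

```c
#include <stdint.h>

/*
 * Time in microseconds, rounded up, to receive `bytes` frames of
 * `bits_per_frame` bits each at `baud` bits per second.
 * Hypothetical helper for illustration; `baud` must be nonzero.
 */
static uint32_t uart_drain_time_us(uint32_t bytes, uint32_t bits_per_frame,
                                   uint32_t baud)
{
    uint64_t bits = (uint64_t)bytes * bits_per_frame;
    return (uint32_t)((bits * 1000000u + baud - 1u) / baud);
}
```

uart_drain_time_us(4, 12, 1200) gives 40000, matching the timeout passed to NRFX_WAIT_FOR. Under the 8N1 assumption, uart_drain_time_us(4, 10, 115200) gives 348, so if the four-byte figure from the nrfx comment holds, a 10 ms wait at 115200 baud has a wide margin.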

  • 1. Will the UARTE ever generate the NRF_UARTE_EVENT_RXTO event if the peer is not asserting RTS? That would cause an infinite loop in nrfx_uarte_uninit, since the peer is long-gone and is not asserting TX/RX/CTS/RTS at this point

     I am curious whether p_cb->rx_buffer_length is still nonzero (an ongoing transaction) when the peer signals that it is powering off. In my opinion, after the STOPTX and STOPRX tasks are triggered, the events should arrive regardless, but I might have to dig a bit more into the hardware internals to see whether there are any conditions under which these events do not arrive. At least the product specification does not appear to list anything that would block them.

  • I think these questions could all be answered by the author of the latest change to the NRFX library I linked above, where the infinite loop was changed to have a timeout. Since that code looks like a fix for my exact issue, and it was written by a Nordic engineer, my preference is to hear the answer rather than help explore the problem :)

  • That said, in our specific case, p_cb->rx_buffer_length is always set to 1. We use the receive-done event (in the ISR) to re-arm the receiver with another 1-byte buffer, so once the receiver has been enabled there is no main-context scenario in which rx_buffer_length is 0.
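
The re-arm pattern described above can be illustrated with a toy host-side model. Every name here (`toy_cb_t`, `toy_rx_start`, and so on) is hypothetical, and the model covers only the behaviour quoted in this thread: the body of nrfx_uarte_rx_abort() as pasted, and the RXTO half of the wait condition in nrfx_uarte_uninit(). Whatever the driver's interrupt handler does afterwards is not modelled, and the clearing of the length on receive-done is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the driver's control block; only the relevant field. */
typedef struct
{
    size_t rx_buffer_length;
} toy_cb_t;

/* Models starting a reception: the buffer length is recorded. */
static void toy_rx_start(toy_cb_t *cb, size_t length)
{
    cb->rx_buffer_length = length;
}

/*
 * Models the receive-done path. Assumption: the length is cleared before
 * the application handler runs; the handler then re-arms with 1 byte.
 */
static void toy_on_rx_done(toy_cb_t *cb)
{
    cb->rx_buffer_length = 0;
    toy_rx_start(cb, 1);
}

/* Models the pasted rx_abort body: STOPRX is triggered, length is untouched. */
static void toy_rx_abort(toy_cb_t *cb)
{
    (void)cb;
}

/* RXTO half of the uninit wait condition: true means uninit waits on RXTO. */
static bool toy_uninit_waits_for_rxto(const toy_cb_t *cb)
{
    return cb->rx_buffer_length != 0;
}
```

In this model, once toy_rx_start() has been called, every return to main context sees rx_buffer_length equal to 1, and calling toy_rx_abort() does not change that, so toy_uninit_waits_for_rxto() stays true.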
