This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Infinite loop in nrfx_uarte_uninit()

nRF52840, SD140, SDK 17.0.2

We're using UARTE to communicate with a peer MCU. We're using hardware flow control (CTS and RTS are enabled) and the baudrate is 115200. The traces are less than 2cm. We do not control any firmware on the peer MCU; it's vendor-supplied. We can tell it to power off, but at other times it is free to power itself off. When it does this, it de-asserts a connected GPIO pin which we have configured to interrupt the nRF52 on any logic change.

Our watchdog is firing in the infinite loop at the bottom of nrfx_uarte_uninit() (line 318 in our nrfx_uarte.c):

void nrfx_uarte_uninit(nrfx_uarte_t const * p_instance)
{
    uarte_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];
    NRF_UARTE_Type * p_reg = p_instance->p_reg;

    if (p_cb->handler)
    {
        interrupts_disable(p_instance);
    }
    // Make sure all transfers are finished before UARTE is disabled
    // to achieve the lowest power consumption.
    nrf_uarte_shorts_disable(p_reg, NRF_UARTE_SHORT_ENDRX_STARTRX);

    // Check if there is any ongoing reception.
    if (p_cb->rx_buffer_length)
    {
        nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_RXTO);
        nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPRX);
    }

    nrf_uarte_event_clear(p_reg, NRF_UARTE_EVENT_TXSTOPPED);
    nrf_uarte_task_trigger(p_reg, NRF_UARTE_TASK_STOPTX);

    // Wait for TXSTOPPED event and for RXTO event, provided that there was ongoing reception.
    while (!nrf_uarte_event_check(p_reg, NRF_UARTE_EVENT_TXSTOPPED) ||
           (p_cb->rx_buffer_length && !nrf_uarte_event_check(p_reg, NRF_UARTE_EVENT_RXTO)))
    {}

    nrf_uarte_disable(p_reg);
    pins_to_default(p_instance);

#if NRFX_CHECK(NRFX_PRS_ENABLED)
    nrfx_prs_release(p_reg);
#endif

    p_cb->state   = NRFX_DRV_STATE_UNINITIALIZED;
    p_cb->handler = NULL;
    NRFX_LOG_INFO("Instance uninitialized: %d.", p_instance->drv_inst_idx);
}

The scenario is this:

1. nRF52 is in a WFE state, woken up into our GPIO ISR by a high-to-low change driven by the peer MCU.

2. nRF52 interprets the hi-to-low pin change as the peer MCU powering off.

3. nRF52 deinits the UARTE block that connects to the peer MCU.

4. nRF52 spins forever in the loop pasted above, and is eventually "rescued" by the watchdog.

I have a few questions related to this that I'd love clarification on:

1. Will the UARTE ever generate the NRF_UARTE_EVENT_RXTO event if the peer is not asserting RTS? That would cause an infinite loop in nrfx_uarte_uninit, since the peer is long-gone and is not asserting TX/RX/CTS/RTS at this point.

2. I found nrfx_uarte_rx_abort() as well (line 577 in my nrfx_uarte.c file):

void nrfx_uarte_rx_abort(nrfx_uarte_t const * p_instance)
{
    uarte_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];

    // Short between ENDRX event and STARTRX task must be disabled before
    // aborting transmission.
    if (p_cb->rx_secondary_buffer_length != 0)
    {
        nrf_uarte_shorts_disable(p_instance->p_reg, NRF_UARTE_SHORT_ENDRX_STARTRX);
    }
    nrf_uarte_task_trigger(p_instance->p_reg, NRF_UARTE_TASK_STOPRX);
    NRFX_LOG_INFO("RX transaction aborted.");
}

This code does not clear the internal rx_buffer_length field of the UARTE control block structure; is that a bug? Even if I abort any pending RX, the nrfx_uarte_uninit() code will still spin on the now-useless nonzero rx_buffer_length field.

3. What's the best way to shut down a previously-active UARTE block after the peer has powered off? We don't control its power state; it comes and goes.

4. Do you have any other hypotheses about why nrfx_uarte_uninit() would spin forever?

Thanks,

Bill

Parents
  • 1. Will the UARTE ever generate the NRF_UARTE_EVENT_RXTO event if the peer is not asserting RTS? That would cause an infinite loop in nrfx_uarte_uninit, since the peer is long-gone and is not asserting TX/RX/CTS/RTS at this point

     I am curious to know if you still have p_cb->rx_buffer_length non zero (ongoing transaction) while the peer sets signals of power off? In my opinion after triggering STOPTX and STOPRX tasks, the events should arrive nevertheless, but I might have to dig a more into the hardare internals if there are any conditions for this events not to arrive. Atleast there seems to be no blockers for these events from the product specification.

  • I think these questions could all be answered by the author of the latest change to the NRFX library I linked above, where the infinite loop was changed to have a timeout. Since that code looks like a fix for my exact issue, and it was written by a Nordic engineer, my preference is to hear the answer rather then help explore the problem :) 

  • I have requested the developer to shed some light on this. But my feeling is that we do not need this check or wait state. But lets wait for the answer from the expert. I will comeback to you as soon as I hear something from him.

  • I think the nRF52840 Product Specification v1.2 (https://infocenter.nordicsemi.com/pdf/nRF52840_PS_v1.2.pdf) informs us here:

    6.34.7 Low Power (page 514)

    "The STOPTX and STOPRX tasks may not be always needed (the peripheral might already be stopped), but if STOPTX and/or STOPRX is sent, software shall wait until the TXSTOPPED and/or RXTO event is received in response, before disabling the peripheral through the ENABLE register."

    The specification uses the word "shall". The document doesn't define what "shall" means, which is unfortunate, but it almost always means "if you don't do this, really bad things will happen and it's unclear if the part will work correctly afterwards".

    So, you're suggesting that maybe we don't need to wait for RXTO/TXSTOPPED before disabling via ENABLE, but the part specification clearly says that we shall do that. Then at the same time, we see an infinite loop waiting for this condition, and discover that a not-yet-released newer version of the loop has a timeout.

    I'm happy using the timeout because Nordic built it. I don't even need to understand why, though I'd like to.

    If I had to boil this issue down to one question, it would be: "After timing out waiting for RXTO / TXSTOPPED in nrfx_uarte_uninit(), is it safe to later re-initialize and resume use of that same UARTE block?"

    Do we need to wait for some amount of time before initializing it again, etc?

  • Yes, you are spot on to the text in the Low Power section which the developer also referred in his response. So the reason according to him to have wait states (forever loop was mistake, timeouts suits best) was that it is written in the PS to do so. 

    There could also be a race condition in your code, If you have this NRFX_WAIT_FOR interrupted with some interrupt context where you call nrfx_uarte_tx, then this function will clear the TXSTOPPED event before it triggers TASKS_STARTTX. And after this when the context comes back to the NRFX_WAIT_FOR then it will never have this event coming as it is already cleared by nrfx_uarte_tx.  So I think we need to have a mutual exclusion between uninit and nrfx_uarte_tx so that there are no race conditions involved.

  • Ah, that's very insightful, thanks for the suggestion! I'll audit the uses of nrfx_uarte_tx() in our code and make sure we never send it in an interrupt (other than a UARTE interrupt, since that's disabled before the uninit() loop)

Reply Children
Related