UART TX timing differs between ISR callback and workqueue

Hello Nordic team,

Environment:

  • SDK 3.1.0
  • nRF54L15
  • UART / RS-232

I am seeing different UART TX timing depending on whether I transmit from inside the UART RX callback vs using a k_work handler.

On our nRF‑based device at 9600 baud, each byte correctly takes ~1.04 ms.
However, the gap between bytes (idle time between stop bit of one byte and start bit of the next) changes drastically depending on context:

1. UART RX callback → UART TX

If we call our transmit function directly inside the UART RX callback (or a parser function invoked from it), the inter‑byte gap is only ~53 µs.

2. k_work_delayable handler → UART TX

If we schedule the exact same transmit function using k_work_schedule(), the inter‑byte gap increases to ~313 µs, and the target device (RS‑232 peer) accepts the frame reliably.

Nothing else changes; same bytes, same baud rate, same framing.

This difference in inter‑byte spacing is causing a downstream RS‑232 device (non‑Nordic) to fail to parse frames correctly unless TX happens from a workqueue instead of inside the RX callback.
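For reference, the numbers above can be put in units of bit times. The sketch below is plain host-side C, not driver code. It assumes 8N1 framing (10 bits per frame), and the helper names are made up for illustration. At 9600 baud one bit lasts about 104.17 µs, so the ~53 µs gap is roughly half a bit time and the ~313 µs gap is almost exactly three bit times.

```c
#include <stdint.h>

/* Duration of a single bit in microseconds at the given baud rate. */
double bit_time_us(uint32_t baud)
{
    return 1e6 / (double)baud;
}

/* Duration of one frame in microseconds.
 * bits_per_frame is 10 for 8N1 (1 start + 8 data + 1 stop), which is an
 * assumption about the framing used in this post. */
double frame_time_us(uint32_t baud, uint32_t bits_per_frame)
{
    return (double)bits_per_frame * bit_time_us(baud);
}

/* Express a measured inter-byte gap as a multiple of the bit time. */
double gap_in_bit_times(double gap_us, uint32_t baud)
{
    return gap_us / bit_time_us(baud);
}
```

Calling `frame_time_us(9600, 10)` gives about 1041.67 µs, which matches the ~1.04 ms per byte seen on the analyzer.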

I will attach logic analyzer screenshots showing both cases.

Questions:

  1. Is it expected that UART TX behaves differently when called inside the RX callback vs from a workqueue?
  2. Is there a known restriction that UART TX should not be triggered directly inside the RX callback?

Any guidance or clarification would be appreciated. I will provide LA captures and source snippets in the attachments. Thank you.

// This pseudocode is placed in the same location within a UART RX callback.
// My device receives a UART message, then transmits a UART acknowledgement.

// Non-working code
void ProcessUartRx()
{
    SendAck(message->header.message_id);
}

// Working code
void delayed_ack_handler(struct k_work *work);
static uint8_t g_pending_ack_msg_id;
K_WORK_DELAYABLE_DEFINE(g_delayed_ack_work, delayed_ack_handler);

void delayed_ack_handler(struct k_work *work)
{
    SendAck(g_pending_ack_msg_id);
}

void ProcessUartRx()
{
    g_pending_ack_msg_id = message->header.message_id;
    k_work_schedule(&g_delayed_ack_work, K_MSEC(0));
}


Saleae Logic Analyzer captures

failed-uart.sal
working-uart.sal

Screenshots

  • Hello,

    Not sure which API you are using, but normally you send the entire string or buffer in one call instead of sending it byte by byte. Is there any reason why you need to send individual bytes? That doesn't really explain the difference, but I wanted to mention it.

    Kenneth

  • Hi Kenneth,

    Thanks for the clarification. There wasn’t any specific reason we were using uart_poll_out(). The original implementation just evolved that way. Based on your suggestion, we’ve now switched over to using uart_tx() to send the whole buffer at once, and now I see the correct timing on the logic analyzer capture.
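A minimal sketch of the "whole buffer in one call" approach, written so it can be tried off-target. Everything here is an assumption for illustration: the two-byte ACK layout, the `tx_buf_fn` hook, and the `capture_tx` stand-in are hypothetical and are not taken from our firmware. On the device, the hook would wrap Zephyr's `uart_tx(dev, buf, len, SYS_FOREVER_US)`, which requires the async UART API to be enabled. The buffer is static because the async API needs it to stay valid until the TX done event arrives.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transmit hook: sends len bytes in one call, returns 0 on
 * success. On Zephyr this could wrap uart_tx(). */
typedef int (*tx_buf_fn)(const uint8_t *buf, size_t len);

/* Hypothetical ACK layout, for illustration only: [marker, message_id]. */
#define ACK_MARKER 0x06u
#define ACK_LEN    2u

/* Static so the data outlives the call, as an asynchronous driver needs. */
static uint8_t ack_buf[ACK_LEN];

/* Build the full ACK frame first, then hand it over in a single call
 * instead of pushing one byte at a time. */
int send_ack(tx_buf_fn tx, uint8_t message_id)
{
    ack_buf[0] = ACK_MARKER;
    ack_buf[1] = message_id;
    return tx(ack_buf, ACK_LEN);
}

/* Host-side stand-in transport that records what was sent. */
static uint8_t capture_data[8];
static size_t capture_len;
static unsigned capture_calls;

int capture_tx(const uint8_t *buf, size_t len)
{
    if (len > sizeof(capture_data)) {
        return -1; /* frame does not fit in the capture buffer */
    }
    memcpy(capture_data, buf, len);
    capture_len = len;
    capture_calls++;
    return 0;
}
```

With the frame handed over in one call, the spacing between bytes is set by the UART peripheral rather than by the context the code runs in.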

    I’m still interested in understanding why the timing difference occurs between calling TX inside the RX callback versus calling it from a workqueue/different thread, so any insight from the Nordic side on that behavior would be appreciated.

    Thanks again,
    Kamal

  • Hi Kamal,

    When uart_poll_out() is called from an interrupt context (your RX callback in this case), the driver will enter a spin loop if a transfer is already in progress and busy wait until the UART transmitter is ready again. However, when the same function is called from thread context, it will instead enter the wait_tx_ready() function which will call k_msleep(1) if it has to wait for more than 100 us for the transmitter to become ready, which I expect to happen at this baud rate. I think this call to k_msleep(1) is the likely explanation for why you see the extra delay between each transfer.

    Best regards,

    Vidar

  • Hi Vidar,

    Thanks for your input. I understand what you are saying, but if k_msleep(1) were called in the working case, the gap between bytes would be at least 1 ms, whereas the logic analyzer shows a 313 µs gap.

    Also, in the failing case, if the driver enters a spin loop and busy waits until the transmitter is ready, I would expect it to wait at least one bit time (~104 µs at 9600 baud) before starting the next byte. My theory is that the receiving device fails to process the UART traffic in the failing case because the gap between bytes is shorter than ~104 µs. In other words, the receiving device may not detect the ~53 µs gap, which results in failed UART decoding.

    Best,
    Kamal 

  • Hi Kamal,

    If my math is not completely off today, it should take about 1.04 ms to transmit one byte at 9600 baud: (1 start bit + 8 data bits + 1 stop bit) / 9600. And I expect the UART transmitter to continue transmitting while the thread is sleeping in k_msleep(), since the busy wait is only 100 µs, so this time should not be completely wasted either. I think a simple test you could do is to increase the busy wait from 100 µs to 2000 µs and see how this affects the gap between the bytes.

    Best regards,

    Vidar
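The reasoning above can be illustrated with a small idealized model. This is not the driver's code. It is a hypothetical host-side sketch that assumes the thread polls for `busy_wait_us`, sleeps for exactly `sleep_us`, and repeats, while the transmitter keeps sending in the background. It ignores tick rounding, scheduling latency, and peripheral start-up time, so it is not expected to reproduce the measured 313 µs. It only shows that the gap is the wake-up time minus the end of the previous frame, not the full sleep length.

```c
#include <stdint.h>

/* Idealized model of the thread-context wait described above.
 * frame_us:     time the previous byte needs on the wire
 * busy_wait_us: how long the caller polls before backing off
 * sleep_us:     back-off sleep length (assumed exact in this model)
 * Returns the idle gap in microseconds between the end of the previous
 * frame and the moment the caller notices the transmitter is ready. */
uint32_t model_gap_us(uint32_t frame_us, uint32_t busy_wait_us, uint32_t sleep_us)
{
    uint32_t now = 0; /* time elapsed since the previous byte started */

    if (busy_wait_us + sleep_us == 0) {
        return 0; /* pure spin loop: readiness is seen immediately */
    }

    for (;;) {
        if (frame_us <= now + busy_wait_us) {
            /* The frame ends inside this polling window (gap 0), or it
             * already ended while the caller was asleep. */
            return (frame_us > now) ? 0 : now - frame_us;
        }
        now += busy_wait_us + sleep_us; /* poll timed out, then sleep */
    }
}
```

With a 1042 µs frame, a 100 µs busy wait, and a 1000 µs sleep, the model wakes at 1100 µs and yields a 58 µs gap. With the busy wait raised to 2000 µs, the sleep is never reached and the modeled gap drops to 0, which is the effect the suggested test is meant to expose.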
