Zephyr UART Async receive drops a single character

I'm using the async API for UART and I have noticed that, shortly after initialisation, I receive a single character with the UART_RX_READY event. This character is 0xff - and I suspect that's just the GPIO lines changing state. This character is easy to ignore, however it seems to break the internal state of the RX API. All future calls will have an offset of +1, but the total length (of all API requests in aggregate) will be correct. This causes us to ignore the first character in our buffer, and take one more than we should (not out-of-bounds, but stale/uninitialised memory).

I suspect we're getting an interrupt that we received data during the initialisation process, and that's throwing us off.

To fix this, I wait for the first UART_RX_READY event, call uart_rx_disable(), and then wait for the UART_RX_DISABLED message. Once that's done, I re-enable with uart_rx_enable() and the system seems to work reliably. I've got four of these devices in the field reading a couple thousand octets per day, and so far it seems to be working just fine!

I'm using SDK v2.2.0, but didn't see any relevant changes in the changelog.

Here's some partial code which has the complete TX/RX path:

static int setup(void) {
    int rc;

    LOG_DBG("Initialising serial tunnel...");

    rc = gpio_pin_configure_dt(&tunnel_power_pin_spec, GPIO_OUTPUT_INACTIVE);
    if (rc < 0)
    {
        LOG_ERR("Failed to configure power control pin.");
        return rc;
    }

    rc = gpio_pin_configure_dt(&tunnel_power_led_spec, GPIO_OUTPUT_INACTIVE);
    if (rc < 0)
    {
        LOG_ERR("Failed to configure power LED.");
        return rc;
    }

    if (!device_is_ready(tunnel_port))
    {
        LOG_ERR("Serial tunnel device is not ready!");
        return -ENODEV;
    }

    rc = uart_callback_set(tunnel_port, uart_callback, NULL);
    if (rc < 0)
    {
        LOG_ERR("Failed to set UART callback.");
        return rc;
    }

    /* Set up the initial receive buffer. */
    rx_buf1.is_in_use = true;
    rx_buf2.is_in_use = false;
    memset(rx_buf1.buffer, '?', sizeof(rx_buf1.buffer));
    memset(rx_buf2.buffer, '?', sizeof(rx_buf2.buffer));
    rc = uart_rx_enable(tunnel_port, rx_buf1.buffer, sizeof(rx_buf1.buffer), CONFIG_OPITO_SERIAL_TUNNEL_RECEIVE_TIMEOUT_US);
    if (rc < 0) {
        LOG_ERR("Failed to enable UART RX.");
        return rc;
    }

    return 0;
}


static void uart_callback(const struct device *dev, struct uart_event *evt, void *user_data) {
    /* There's a strange bug with the DMA backing this async UART driver.
     * We get a single octet on boot (likely caused by a wiggle on the UART
     * line), and that introduces an off-by-one error that never goes away.
     * The fix seems to be resetting the UART after we get that character.
     */
    static bool destroyed_first_message = false;

    switch (evt->type) {
    case UART_RX_RDY:
    {
        if (!destroyed_first_message) {
            uart_rx_disable(dev);
            return;
        }

        struct opito_serial_tunnel_module_event *msg = new_opito_serial_tunnel_module_event();

        const uint8_t *ptr = &evt->data.rx.buf[evt->data.rx.offset];
        memcpy(msg->data.data_raw.buffer, ptr, evt->data.rx.len);
        msg->data.data_raw.length = evt->data.rx.len;
        msg->data.data_raw.timestamp = k_uptime_get();
        msg->type = OPITO_EVT_SERIAL_TUNNEL_DEVICE_DATA_RX;
        APP_EVENT_SUBMIT(msg);
        break;
    }

    case UART_RX_BUF_REQUEST:
    {
        if (!rx_buf1.is_in_use) {
            rx_buf1.is_in_use = true;
            uart_rx_buf_rsp(dev, rx_buf1.buffer, sizeof(rx_buf1.buffer) - 1);
        } else if (!rx_buf2.is_in_use) {
            rx_buf2.is_in_use = true;
            uart_rx_buf_rsp(dev, rx_buf2.buffer, sizeof(rx_buf2.buffer) - 1);
        }

        /* If we didn't manage to hand over a buffer, we'll get the disabled event. */
        break;
    }

    case UART_RX_BUF_RELEASED:
        struct rx_buf *buf = CONTAINER_OF(evt->data.rx_buf.buf, struct rx_buf, buffer);
        buf->is_in_use = false;
        break;

    case UART_RX_DISABLED:
        rx_buf1.is_in_use = true;
        rx_buf2.is_in_use = false;
        uart_rx_enable(tunnel_port, rx_buf1.buffer, sizeof(rx_buf1.buffer), CONFIG_OPITO_SERIAL_TUNNEL_RECEIVE_TIMEOUT_US);

        if (!destroyed_first_message) {
            destroyed_first_message = true;
        } else {
            /* This is buffer overrun - as we weren't able to pass another buffer to the UART! */
            SEND_ERROR(opito_serial_tunnel, OPITO_EVT_SERIAL_TUNNEL_ERROR, -ENOBUFS);
        }
        break;

    default:
        break;
    }
}
Let me know if I've missed anything! For now, our workaround is fine - but it seems like a bug that will bite again in future :)

Parents
  • Hi  and  ,

    I was recently given some pointers on this topic.

    The reason for the miscounted buffer size is because by default RX bytes are counted by handling interrupts on every RX byte. If the interrupt is not handled on time, the counter will miss a byte.

    There is an option to enable HW byte counting using TIMER and PPI and then every byte is counted correctly. This is enabled by setting these Kconfig:

    CONFIG_UART_1_NRF_HW_ASYNC=y //enabling hw counting
    CONFIG_UART_1_NRF_HW_ASYNC_TIMER=2 //picking timer instance used for that

    Please let me know if this remedies the issue for you.

    Hieu

Reply
  • Hi  and  ,

    I was recently given some pointers on this topic.

    The reason for the miscounted buffer size is because by default RX bytes are counted by handling interrupts on every RX byte. If the interrupt is not handled on time, the counter will miss a byte.

    There is an option to enable HW byte counting using TIMER and PPI and then every byte is counted correctly. This is enabled by setting these Kconfig:

    CONFIG_UART_1_NRF_HW_ASYNC=y //enabling hw counting
    CONFIG_UART_1_NRF_HW_ASYNC_TIMER=2 //picking timer instance used for that

    Please let me know if this remedies the issue for you.

    Hieu

Children
Related