Zephyr UART Async receive drops a single character

I'm using the async API for UART and I have noticed that, shortly after initialisation, I receive a single character with the UART_RX_READY event. This character is 0xff - and I suspect that's just the GPIO lines changing state. This character is easy to ignore, however it seems to break the internal state of the RX API. All future calls will have an offset of +1, but the total length (of all API requests in aggregate) will be correct. This causes us to ignore the first character in our buffer, and take one more than we should (not out-of-bounds, but stale/uninitialised memory).

I suspect we're getting an interrupt that we received data during the initialisation process, and that's throwing us off.

To fix this, I wait for the first UART_RX_READY event, call uart_rx_disable(), and then wait for the UART_RX_DISABLED message. Once that's done, I re-enable with uart_rx_enable() and the system seems to work reliably. I've got four of these devices in the field reading a couple thousand octets per day, and so far it seems to be working just fine!

I'm using SDK v2.2.0, but didn't see any relevant changes in the changelog.

Here's some partial code which has the complete TX/RX path:

static int setup(void) {
    int rc;

    LOG_DBG("Initialising serial tunnel...");

    rc = gpio_pin_configure_dt(&tunnel_power_pin_spec, GPIO_OUTPUT_INACTIVE);
    if (rc < 0)
    {
        LOG_ERR("Failed to configure power control pin.");
        return rc;
    }

    rc = gpio_pin_configure_dt(&tunnel_power_led_spec, GPIO_OUTPUT_INACTIVE);
    if (rc < 0)
    {
        LOG_ERR("Failed to configure power LED.");
        return rc;
    }

    if (!device_is_ready(tunnel_port))
    {
        LOG_ERR("Serial tunnel device is not ready!");
        return -ENODEV;
    }

    rc = uart_callback_set(tunnel_port, uart_callback, NULL);
    if (rc < 0)
    {
        LOG_ERR("Failed to set UART callback.");
        return rc;
    }

    /* Set up the initial receive buffer. */
    rx_buf1.is_in_use = true;
    rx_buf2.is_in_use = false;
    memset(rx_buf1.buffer, '?', sizeof(rx_buf1.buffer));
    memset(rx_buf2.buffer, '?', sizeof(rx_buf2.buffer));
    rc = uart_rx_enable(tunnel_port, rx_buf1.buffer, sizeof(rx_buf1.buffer), CONFIG_OPITO_SERIAL_TUNNEL_RECEIVE_TIMEOUT_US);
    if (rc < 0) {
        LOG_ERR("Failed to enable UART RX.");
        return rc;
    }

    return 0;
}


static void uart_callback(const struct device *dev, struct uart_event *evt, void *user_data) {
    /* There's a strange bug with the DMA backing this async UART driver.
     * We get a single octet on boot (likely caused by a wiggle on the UART
     * line), and that introduces an off-by-one error that never goes away.
     * The fix seems to be resetting the UART after we get that character.
     */
    static bool destroyed_first_message = false;

    switch (evt->type) {
    case UART_RX_RDY:
    {
        if (!destroyed_first_message) {
            uart_rx_disable(dev);
            return;
        }

        struct opito_serial_tunnel_module_event *msg = new_opito_serial_tunnel_module_event();

        const uint8_t *ptr = &evt->data.rx.buf[evt->data.rx.offset];
        memcpy(msg->data.data_raw.buffer, ptr, evt->data.rx.len);
        msg->data.data_raw.length = evt->data.rx.len;
        msg->data.data_raw.timestamp = k_uptime_get();
        msg->type = OPITO_EVT_SERIAL_TUNNEL_DEVICE_DATA_RX;
        APP_EVENT_SUBMIT(msg);
        break;
    }

    case UART_RX_BUF_REQUEST:
    {
        if (!rx_buf1.is_in_use) {
            rx_buf1.is_in_use = true;
            uart_rx_buf_rsp(dev, rx_buf1.buffer, sizeof(rx_buf1.buffer) - 1);
        } else if (!rx_buf2.is_in_use) {
            rx_buf2.is_in_use = true;
            uart_rx_buf_rsp(dev, rx_buf2.buffer, sizeof(rx_buf2.buffer) - 1);
        }

        /* If we didn't manage to hand over a buffer, we'll get the disabled event. */
        break;
    }

    case UART_RX_BUF_RELEASED:
        struct rx_buf *buf = CONTAINER_OF(evt->data.rx_buf.buf, struct rx_buf, buffer);
        buf->is_in_use = false;
        break;

    case UART_RX_DISABLED:
        rx_buf1.is_in_use = true;
        rx_buf2.is_in_use = false;
        uart_rx_enable(tunnel_port, rx_buf1.buffer, sizeof(rx_buf1.buffer), CONFIG_OPITO_SERIAL_TUNNEL_RECEIVE_TIMEOUT_US);

        if (!destroyed_first_message) {
            destroyed_first_message = true;
        } else {
            /* This is buffer overrun - as we weren't able to pass another buffer to the UART! */
            SEND_ERROR(opito_serial_tunnel, OPITO_EVT_SERIAL_TUNNEL_ERROR, -ENOBUFS);
        }
        break;

    default:
        break;
    }
}
Let me know if I've missed anything! For now, our workaround is fine - but it seems like a bug that will bite again in future :)

Parents Reply Children
No Data
Related