This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Libuarte_async bug(s) - missing data / wrong buffer returned.

Hi,

We've been having problems trying to implement a high-speed uart communication between the nRF52832 and an STM32.

We switched to using libuarte so we can use DMA even though our communication protocol has variable-sized packets Unfortunately, after several seconds of working perfectly, we suddenly start seeing errors. In between good new data blocks, we suddenly have one block that has old data. So either the data was never written by DMA, or it was written somewhere other than we are told to read from. It was only for the amount of data that the NRF_LIBUARTE_ASYNC_EVT_RX_DATA reported. It seemed kinda like the timeout triggered before the data was actually copied by the DMA or something along those lines.

We've reduced the transmission speed to 115200, and the error still occurs. We've used a logic analyzer to check if the data is really being transferred - it definitely is, but the nordic is not getting it. We've tried it with SDK 15.0, 15.3 and 16.0, the problem remains.

Since our codebase is way too complex to try and post something helpful here, we tried recreating the issue with two nRF52840 DKs.

Unfortunately, we're not getting the same errors, but are getting another error in the first RX event. First, right after we initialize libuarte_async we get NRF_LIBUARTE_ASYNC_EVT_ERROR errorSrc 0 - this started with SDK 16.0, it wasn't occurring when using SDK 15.x with ported libuarte. Then we get a bunch of skipped bytes reported (check the attached project):

<info> app: Rx 128@x20004B84
<error> app: RX: expected x65, got x39 instead
<error> app: RX: expected x3A, got x3D instead
<error> app: RX: expected x3E, got x41 instead
<error> app: RX: expected x42, got x45 instead
<error> app: RX: expected x46, got x49 instead
<error> app: RX: expected x4A, got x4D instead

Note: We're using arm-none-eabi-gcc  v4.9.3 to build it the test project, targeting nrf52840 chip sitting on development kits. They are connected and both running the same firmware, both doing RX and TX using libuarte.

ble_app_libuarte_test.tar.gz

Parents
  • I am not aware of any issue, but using the latest is in general the recommended.

    My only input is that I would recommend to increase .timeout_us = 100 to 300 (or possible more for testing). Reasoning is that depending on other interrupts that may be occurring (causing the uart to be pending), you want to make sure that you can always handle the uart interrupt in time before the next may occur.

    "errorSrc 0" was very odd, maybe you can set a breakpoint on the error and double check value. Making sure RXD is always pulled up can be something to check.

    Best regards,
    Kenneth

  • Hi, We're using the latest SDK (16.0.0), but we first ported to 15.0 because that's what the project is using. I have also increased timeout_us to 500 for testing, but it didn't change a thing. We're still seeing a bunch of wrong incoming data, usually in the first RX event. I have confirmed that the other side is sending a continuous stream of bytes from 0x00 - 0x9A, then it resets and starts with 0x00 again. This doesn't make sense, since it reports values above 0x9A, which are not being sent over the line at all: 

    Fullscreen
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    <warning> app: Boot
    <error> app: NRF_LIBUARTE_ASYNC_EVT_ERROR errorSrc 4
    <error> app: NRF_LIBUARTE_ASYNC_EVT_ERROR errorSrc 4
    <warning> app: Test TX started.
    <info> app: Debug logging for UART over RTT started.
    <error> app: NRF_LIBUARTE_ASYNC_EVT_ERROR errorSrc 4
    <info> app: Rx 128@x20004B84
    <error> app: RX: expected x65, got xAB instead
    <error> app: RX: expected x11, got xD5 instead
    <error> app: RX: expected x3B, got xE5 instead
    <error> app: RX: expected x4B, got xF5 instead
    <error> app: RX: expected x5B, got x85 instead
    <error> app: RX: expected x86, got x89 instead
    <error> app: RX: expected x8A, got x8D instead
    <error> app: RX: expected x8E, got x91 instead
    <error> app: RX: expected x92, got x95 instead
    <error> app: RX: expected x96, got x99 instead
    <error> app: RX: expected x9A, got x9D instead
    <error> app: RX: expected x03, got xA1 instead
    <error> app: RX: expected x07, got xA5 instead
    <error> app: RX: expected x0B, got xA9 instead
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    As for the errorSrc 0 thing, I found the error, not sure how I've missed it before: in libuarte_async.c in the handler for NRF_LIBUARTE_DRV_EVT_ERROR, you didn't actually copy the errorSrc. Basically, the .data portion is missing: 

    Fullscreen
    1
    2
    3
    4
    5
    6
    nrf_libuarte_async_evt_t evt = {
    .type = NRF_LIBUARTE_ASYNC_EVT_ERROR,
    .data = {
    .errorsrc = p_evt->data.errorsrc
    }
    };
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    Now that I've added that I get the errorsrc4 == framing error, which makes sense, since the other side is already sending while this side is initializing, meaning RX line is changing, and so we get framing errors (a few, right after initialization).

  • Can you check if starting the HFCLK may help? E.g. nrf_drv_clock_hfclk_request()/sd_clock_hfclk_request().

    Do you have a logic analyzer trace of the serial incomming data? I just want to check the size of the packets, the delay between packets/bytes and baud rate.

    Gasper said:
    As for the errorSrc 0 thing, I found the error, not sure how I've missed it before: in libuarte_async.c in the handler for NRF_LIBUARTE_DRV_EVT_ERROR, you didn't actually copy the errorSrc.

    Thanks, I will report it.

    Best regards,
    Kenneth 

Reply
  • Can you check if starting the HFCLK may help? E.g. nrf_drv_clock_hfclk_request()/sd_clock_hfclk_request().

    Do you have a logic analyzer trace of the serial incomming data? I just want to check the size of the packets, the delay between packets/bytes and baud rate.

    Gasper said:
    As for the errorSrc 0 thing, I found the error, not sure how I've missed it before: in libuarte_async.c in the handler for NRF_LIBUARTE_DRV_EVT_ERROR, you didn't actually copy the errorSrc.

    Thanks, I will report it.

    Best regards,
    Kenneth 

Children
  • Enabling hfclk doesn't change anything. 

    You can find a logic analyzer trace in the csv file: 8473.rx.tar.gz. I've truncated the trace at the beginning to where Rx started - the nrf_libuarte_async_enable call returned in the middle of the first 0x62 byte, which is probably the framing error reported at the beginning. The first full buffer is reported (NRF_LIBUARTE_ASYNC_EVT_RX_DATA) after ~2.826ms at timestamp 2.10230902. The log for this trace is:

    Fullscreen
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    <warning> app: Boot
    <warning> app: Test TX started.
    <info> app: Debug logging for UART over RTT started.
    <error> app: NRF_LIBUARTE_ASYNC_EVT_ERROR errorSrc 4
    <info> app: Rx 128@x20004B84
    <error> app: RX: expected x65, got x89 instead
    <error> app: RX: expected x8A, got x8D instead
    <error> app: RX: expected x8E, got x91 instead
    <error> app: RX: expected x92, got x95 instead
    <error> app: RX: expected x96, got x99 instead
    <error> app: RX: expected x9A, got x9D instead
    <error> app: RX: expected x03, got xA1 instead
    <error> app: RX: expected x07, got xA5 instead
    <error> app: RX: expected x0B, got xA9 instead
    <error> app: RX: expected x0F, got xAD instead
    <error> app: RX: expected x13, got xB1 instead
    <error> app: RX: expected x17, got xB5 instead
    <error> app: RX: expected x1B, got xB9 instead
    <error> app: RX: expected x1F, got xBD instead
    <error> app: RX: expected x23, got xC1 instead
    <error> app: RX: expected x27, got xC5 instead
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    Tx started roughly 20us after Rx. Spacing between packets is roughly 10us. Baudrate is set in fw to 460800. Tx packet size is 155 bytes (increasing from 0x00 to 0x9A), Rx buffer size is 128 and the size mismatch is intentional as in the original project we have variable packet lengths. Since this is a data stream (for our use case as well), we don't particularly care that packet end isn't detected in this case, but we do need it (and it works fine) for a different mode of operation in our core project.

  • Gasper said:
    You can find a logic analyzer trace in the csv file: 8473.rx.tar.gz.

    I do not see any errors in this trace?

    This doesn't really display what I am looking for. I thought maybe you could make a logic analyzer trace (e.g using https://www.saleae.com/) where you did a trace of the UART RX pin and toggled a IO on UART errors, this would make it easier to go to the where the issue occurred and look at the actual timing on a bit level for previous bytes (if you were able to also output on UART TX the expected value and received value, then we could really see on a bit level for the previous UART RX what may have happened). The .csv is not detailed enough for this. 

    How frequently does an error occur?

  • Sure, I can make the trace, although that will not help a thing. The low-level UART errors (framing) make sense, they only appear when the libuarte is being initialized/enabled, so a few of those at the beginning isn't the problem. The problem is the data itself, for which the errors are reported when the buffer fills out, at which point the data is checked and copied to a queue for ble transmission, and then the libuarte buffer is freed.

    Anyway, here is the trace, together with the test project that generated the trace: trace.tar.gz

    The project is tested on two nrf52840_DK boards, so you could test it as well, might be easier than this back and forth.

    On the other hand, increasing the idle timeout on our main project helped the issue, although I still feel it shouldn't be an issue in the first place, at least not like that. I would understand missing bytes if callback handling took to long (in the real project we don't parse the data in the callback handler, we just copy it to our main buffer queue).

  • Hi,

    I've tried your example and after tweaking it a bit i see no errors. I've made following changes:

    - commented out  printf("\r\nUART started.\r\n"); i didn't check if it is the case but if retarget goes to uart then it will collide with libuarte driving same uart.

    - moved initialization of libuarte after advertising init and adding 1second delay before start. This is to ensure that both boards are up and running before TX is started. If RX is started when TX is ongoing then there is no possibility to synchronize. RX is started in the middle of a byte and received data is garbled

    - changed initilization of last_received_byte to (TX_SIZE- 1) to ensure that first byte is not received as error.

    Source file attached.

    Krzysztof

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    /**
    * Copyright (c) 2014 - 2019, Nordic Semiconductor ASA
    *
    * All rights reserved.
    *
    * Redistribution and use in source and binary forms, with or without modification,
    * are permitted provided that the following conditions are met:
    *
    * 1. Redistributions of source code must retain the above copyright notice, this
    * list of conditions and the following disclaimer.
    *
    * 2. Redistributions in binary form, except as embedded into a Nordic
    * Semiconductor ASA integrated circuit in a product or a software update for
    * such product, must reproduce the above copyright notice, this list of
    * conditions and the following disclaimer in the documentation and/or other
    * materials provided with the distribution.
    *
    * 3. Neither the name of Nordic Semiconductor ASA nor the names of its
    * contributors may be used to endorse or promote products derived from this
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX