Libuarte_async bug(s) - missing data / wrong buffer returned.

Hi,

We've been having problems implementing high-speed UART communication between an nRF52832 and an STM32.

We switched to libuarte so that we can use DMA even though our communication protocol has variable-sized packets. Unfortunately, after several seconds of working perfectly, errors suddenly start to appear: between blocks of good, new data, we get one block that contains old data. So either the DMA never wrote the data, or it wrote it somewhere other than where we are told to read it. The stale data spanned exactly the length reported by the NRF_LIBUARTE_ASYNC_EVT_RX_DATA event. It looked as if the timeout fired before the DMA had actually finished writing the data, or something along those lines.

We reduced the baud rate to 115200, and the error still occurs. A logic analyzer confirms that the data really is being transferred, but the Nordic chip does not receive it. We've tried SDK 15.0, 15.3 and 16.0; the problem remains.

Because our codebase is too complex to post here in a useful form, we tried to reproduce the issue with two nRF52840 DKs.

Unfortunately, we don't get the same errors there, but we do get a different error on the first RX event. Right after initializing libuarte_async, we get NRF_LIBUARTE_ASYNC_EVT_ERROR with errorSrc 0. This started with SDK 16.0; it did not occur with SDK 15.x and a ported libuarte. After that, a series of skipped bytes is reported (see the attached project):

<info> app: Rx 128@x20004B84
<error> app: RX: expected x65, got x39 instead
<error> app: RX: expected x3A, got x3D instead
<error> app: RX: expected x3E, got x41 instead
<error> app: RX: expected x42, got x45 instead
<error> app: RX: expected x46, got x49 instead
<error> app: RX: expected x4A, got x4D instead

Note: We build the test project with arm-none-eabi-gcc v4.9.3, targeting the nRF52840 on the development kits. The two DKs are connected to each other, both run the same firmware, and both do RX and TX using libuarte.

ble_app_libuarte_test.tar.gz

  • Yes, this is quite possible. While testing, we did see some cases where the data arrived in smaller blocks (each line shows the chunk length and its start offset):

    Rx 17 -> 955
    Rx 13 -> 972
    Rx 16 -> 985
    Rx 16 -> 1001
    Rx 7 -> 1017
    Rx 6 -> 0
    Rx 20 -> 6
    Rx 22 -> 26
    Rx 15 -> 48
    Rx 21 -> 63
    Rx 17 -> 84
    Rx 13 -> 101
    Rx 14 -> 114
    Rx 11 -> 128
    Rx 13 -> 139
    Rx 13 -> 152
    Rx 13 -> 165
    Rx 15 -> 178
    Rx 25 -> 193
    Rx 13 -> 218
    Rx 13 -> 231
    Rx 12 -> 244
    Rx 3 -> 256
    Rx 18 -> 259
    Rx 17 -> 277
    Rx 16 -> 294
    Rx 17 -> 310
    Rx 15 -> 327

    This happens even though each packet we transmit is 255 bytes long.
    We also transmit everything available using DMA on the STM32, so this case only occurs when our firmware is filling the TX buffer at exactly the same time that DMA is sending it. Since we fill that buffer with data received over USB, the interrupts involved may be delayed, leaving pauses between those small 3-25 byte chunks, and some of those pauses may well exceed 100us.

    So yes, it's quite possible that there are nearly back-to-back timeouts of 100us, but I think libuarte should be able to handle that too, even if both timeout handlers are delayed.
    Let's take just the first three chunks, assume a 150us gap between each pair, and suppose the timeout handler only starts running in the middle of the 16-byte chunk.

    I think the counter should simply keep counting, so when the handler reads it, it would be at 17+13+8=38 bytes. That is what the callback should report; we copy that data and free it, so the read pointer moves forward by 38 bytes. Immediately afterwards, the second handler is called and reads the counter, which is still at 38 (or maybe 39 by now), so it reports 0 or 1 byte, frees that byte and exits.

    I don't see a reason why the second interrupt handler should somehow fail...
