Framing errors seen on nRF54L15 UART: is it very sensitive to stop-bit timing?

Hi,

I am getting framing error notifications from the nrfx UARTE driver on a Zephyr platform when receiving data from a certain external device, but only during large bursts of data.

The data is at 115200 bps, 8 data bits, no parity, so not particularly fast.

The device I am receiving from has very tight timing (the stop bit is exactly one bit time, well within tolerance), and often sends large streams (several hundred bytes) with no idle time between a stop bit and the following start bit. This device cannot have its serial settings modified (for example, to increase the number of stop bits to 2).

Looking at the data with both an oscilloscope and a heavily oversampled logic analyser (which flags framing problems), the actual data on the line seems fine, but I can't be sure that it is always consistent.

When I look at the data received from the driver, I generally just see missing bytes, rather than bad data (with a few exceptions), which seem to come from the data being dropped when the driver cancels the rx, and I suspect that I can't re-enable it fast enough.

As far as I can tell, I have the external oscillator enabled and constant latency mode enabled, but I am not sure whether any interplay with the Nordic/Zephyr code may be altering this.

I am using the Zephyr async API with a memory slab of eight 128-byte receive buffers, which seems to me to be the highest-performing approach.
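As a rough sizing check for this setup, the sketch below computes how long a gapless 8N1 stream at 115200 bps takes to fill one 128-byte buffer and the whole eight-buffer slab. It only does the arithmetic; the macro and function names are illustrative, and it assumes one stop bit and no idle time between bytes, as described in this post.

```c
#include <assert.h>
#include <stdint.h>

/* 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bit times per byte. */
#define BITS_PER_FRAME_8N1 10u

/* Wire time of one 8N1 byte in nanoseconds (truncated). */
static uint32_t byte_time_ns(uint32_t baud)
{
    assert(baud > 0);
    return (uint32_t)((BITS_PER_FRAME_8N1 * 1000000000ull) / baud);
}

/* Time in microseconds (truncated) for a gapless 8N1 stream to deliver len bytes. */
static uint32_t fill_time_us(uint32_t baud, uint32_t len)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)len * BITS_PER_FRAME_8N1 * 1000000ull) / baud);
}
```

At 115200 bps one byte takes about 86.8 µs on the wire, so a single 128-byte buffer fills in about 11.1 ms and all eight buffers (1024 bytes) hold about 88.9 ms of gapless data. That suggests buffer turnaround alone is unlikely to be the bottleneck, provided the application answers each buffer request well within ~11 ms.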

I have previously seen some UARTs which required ever so slightly more than a single bit time for the STOP before a new START bit, with very similar symptoms to what I have seen here.

My questions are:

Is the baud rate divisor in SDK 2.9.1 for 115200 bps still correct? I have noticed that there are forum posts that state that this shouldn't be modified, but also that it doesn't line up with the formula also reported on the forums. Is the special adjusted baud rate divisor only for running from the internal RC oscillator (and therefore tuned to it)?
Is there a surefire way to disable any change on clock source/latency settings so that I can be sure that I am testing under the correct conditions?

My concern is that effectively the same code was talking to a cellular modem at a much higher baud rate with total reliability, which suggests to me that the UARTE hardware might be a bit fussy about stop bit length.

Regards,

Nathan Boyd

Håkon Alseth 2 months ago

Hi,

Is the baud rate divisor in SDK 2.9.1 for 115200 bps still correct? I have noticed that there are forum posts that state that this shouldn't be modified, but also that it doesn't line up with the formula also reported on the forums. Is the special adjusted baud rate divisor only for running from the internal RC oscillator (and therefore tuned to it)?

The divisor is unchanged for 115200 baud, but there have been issues related to UARTE2x when running across power domains, i.e. routing it to GPIO port P1, where you are required to use constant latency (CONSTLAT).
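For reference, here is a sketch of the formula usually cited for nRF5-style UARTE BAUDRATE values, assumed here to be the forum formula the question refers to. It assumes a 16 MHz UARTE clock and a final rounding to a multiple of 0x1000; neither assumption is taken from the SDK, and the function name is hypothetical. Under these assumptions it yields 0x01D7E000 for 115200.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 16 MHz UARTE clock and the forum formula
 *   BAUDRATE = round(baud * 2^32 / 16 MHz), then rounded to a multiple of 0x1000.
 * The SDK header constant, not this function, is what the driver actually uses. */
static uint32_t baudrate_reg_from_formula(uint32_t baud)
{
    const uint64_t f_clk = 16000000ull;
    assert(baud > 0 && baud <= f_clk / 2);
    /* Nearest-integer division of baud * 2^32 by the clock frequency. */
    uint64_t reg = (((uint64_t)baud << 32) + f_clk / 2) / f_clk;
    /* Round to the nearest multiple of 0x1000 (clears the low 12 bits). */
    reg = (reg + 0x800u) & ~(uint64_t)0xFFFu;
    return (uint32_t)reg;
}
```

If the 115200 constant in your SDK's UARTE header differs from this value, that would be consistent with the mismatch described in the question. The product specification's BAUDRATE table, not this formula, is the authority for the rate the hardware actually produces.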

Is there a surefire way to disable any change on clock source/latency settings so that I can be sure that I am testing under the correct conditions?

Request the HFXO (as done here: https://github.com/nrfconnect/sdk-nrf/blob/main/samples/peripheral/radio_test/src/main.c#L28-L48), and the baud rate should be quite accurate (about 100 Hz off: https://docs.nordicsemi.com/bundle/ps_nrf54L15/page/uarte.html#ariaid-title75).
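To put a ~100 Hz offset in perspective, the sketch below computes how far a receiver's sample point drifts from the centre of the transmitted bit. It assumes a conventional receiver that resynchronises on each start-bit edge and samples mid-bit, so the worst point in an 8N1 frame is the stop-bit sample about 9.5 bit times after the edge. The 115108 bps figure in the test is only an illustrative rate 92 Hz below nominal, and the function name is hypothetical.

```c
#include <assert.h>

/* How far (in transmitter bit times) a receiver's sample point has moved from where it
 * would be with matched clocks, bits_from_edge bit times after the start edge it synced to.
 * Assumes a conventional receiver that resyncs on each start edge and samples mid-bit. */
static double sample_drift_bits(double tx_baud, double rx_baud, double bits_from_edge)
{
    assert(tx_baud > 0.0 && rx_baud > 0.0);
    /* The receiver waits bits_from_edge of its own bit times; convert to TX bit times. */
    double drift = bits_from_edge * tx_baud / rx_baud - bits_from_edge;
    return drift < 0.0 ? -drift : drift;
}
```

With a 92 Hz offset the stop-bit sample moves by under 1% of a bit, against a budget of roughly half a bit (less in practice once edge-detection granularity and slow signal edges are included). A 5% mismatch, by contrast, uses about 0.475 bit. This suggests the baud-rate error alone is unlikely to explain the framing errors.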

and constant latency mode enabled

Are you using nrfx_power_constlat_mode_request to request this? This is important if you are using BLE, as this will ensure that the SoftDevice does not disable constlat.

Have you tried using hardware flow control?

Kind regards,

Håkon

Nathan Boyd 2 months ago in reply to Håkon Alseth
Hi Håkon,

I didn't think I was running cross-domain. I am using UARTE21, with pins on port 1 (both in the PERI PD). We were previously using a cross-domain setup, and thought that was the root of our issues.

So is UARTE21 not on the same domain as GPIO P1? I was checking against Figure 1 in the datasheet.

I am currently using nrfx_clock_start(NRF_CLOCK_DOMAIN_HFCLK); I will compare to the example to see if it is any different.

I am using nrfx_power_constlat_mode_request to enable constant latency. After my post, I found that someone had reported that the CONSTLATSTAT register definition was wrong, which explains why I wasn't confident that the setting was being applied.

RE: Flow control. Unfortunately we can't use flow control, as the attached device does not support it. The protocol does have CRC protection, so we can detect bad data, but because it doesn't use byte stuffing or bit stuffing, re-syncing to a valid start of packet is troublesome.

If it helps, my setup is:

Using the nRF54L15 DK board, connected via the pin headers in place of a Fanstel BM15 module on our own board (because Fanstel uses an out-of-spec XTAL).

The UART we are using is UARTE21. Pins are P1.13 for TX, P1.14 for RX.

We are setting constant latency.

I _think_ we are correctly using the external XTAL for the HFCLK.

The clock setup on the nRF54L15 seems quite different to the nRF5340 (which we previously prototyped with), notably there seems to be no explicit setting to select the source of the HFCLK between the internal RC and the external XTAL.

Am I correct in understanding that simply starting the XO and PLL in CLOCK is enough to force the use of the external XTAL?

I will study the example provided and let you know if it helps.

Regards,

Nathan Boyd.

Håkon Alseth 2 months ago in reply to Nathan Boyd
Hi Nathan,

Nathan Boyd said:
I didn't think I was running cross-domain. I am using UARTE21, with pins on port 1 (both in the PERI PD). We were previously using a cross-domain setup, and thought that was the root of our issues.

So is UARTE21 not on the same domain as GPIO P1? I was checking against Figure 1 in the datasheet.

You are right, and my apologies: I wrote that incorrectly. You are of course free to use any available GPIO on P1, but the UART2x peripheral has dedicated GPIOs on P2:

https://docs.nordicsemi.com/bundle/ps_nrf54L15/page/uarte.html#d1900e814

Sorry for the mix-up on my side.

Nathan Boyd said:
I am currently using nrfx_clock_start(NRF_CLOCK_DOMAIN_HFCLK); I will compare to the example to see if it is any different.

I am using nrfx_power_constlat_mode_request to enable constant latency. After my post, I found that someone reported that the CONSTLATSTAT register def was wrong, which explains why I wasn't confident that the setting was being done.

This sounds correct.

nrfx_clock_start will route to either the driver, or the MPSL implementation (when MPSL is included).

Please note that nrfx_clock handles the PLL start/stop for the nRF54L15 with NCS v3.0.0 (and newer):

https://github.com/zephyrproject-rtos/hal_nordic/blob/master/nrfx/drivers/src/nrfx_clock.c#L490

Nathan Boyd said:
The clock setup on the nRF54L15 seems quite different to the nRF5340 (which we previously prototyped with), notably there seems to be no explicit setting to select the source of the HFCLK between the internal RC and the external XTAL.

Am I correct in understanding that simply starting the XO and PLL in CLOCK is enough to force the use of the external XTAL?

The hardware is intended to make requesting the CLOCK.PLL functionality more automated, as described here:

https://docs.nordicsemi.com/bundle/ps_nrf54L15/page/clock.html#ariaid-title2

However, there are two errata, #20 and #39, that need to be handled:

https://docs.nordicsemi.com/bundle/errata_nRF54L15_Rev1/page/ERR/nRF54L15/Rev1/latest/anomaly_L15_39.html

https://docs.nordicsemi.com/bundle/errata_nRF54L15_Rev1/page/ERR/nRF54L15/Rev1/latest/anomaly_L15_20.html

I fully understand that this is more complicated than on earlier nRF5-series devices.

Let me know if anything is unclear.

I believe your description here:

which seem to come from the data being dropped when the driver cancels the rx, and I suspect that I can't re-enable it fast enough.

is crucial to the behavior. Is the RX buffer on the nRF side smaller than the maximum packet size your companion device sends? In other words, is there a pattern to when the application misses bytes?

Are you using "CONFIG_UART_NRFX_UARTE_ENHANCED_RX"?

Here is the rationale behind adding this enhanced RX mode: https://github.com/zephyrproject-rtos/zephyr/commit/399a235653518bc9b65c083f2e70cefa943aa4d9

Kind regards,

Håkon

Nathan Boyd 2 months ago in reply to Håkon Alseth
Thanks Hakon.

I am aware of CONFIG_UART_NRFX_UARTE_ENHANCED_RX, and it seems to be set correctly by default.

To be clear, I don't seem to lose any characters simply by being swamped, and the RX timeout seems to be handled correctly.

The problem seems to occur when there is a frame error, which results in the Nordic driver disabling RX, which then has to be re-enabled in the application (which is using the async driver interface). This means that not only a single "bad" character is lost, but potentially some others. The problem really seems to be in two parts:

The UART detects a framing error when I don't expect one. In all my measurements so far, the sending device has good bit timings, but it could possibly be either a little short on the stop bit, or at least bang on with no extra delay. My current method for measuring this isn't accurate enough to be totally sure that the timing is always good: the logic analyser can sample at 8 MHz (only about 69 samples per bit at 115200 bps), which sometimes makes bit timings look a little off and hides small variations, and I haven't caught a short bit with the oscilloscope yet.

When the framing error is detected, I seem to lose more than just the affected byte, presumably because the current RX is stopped, and it takes some time to restart it.
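The cost of the restart can be bounded with simple arithmetic: on a gapless line, every frame whose start bit arrives while RX is disabled is lost. The sketch below gives that upper bound for a given disabled interval. The restart latencies used in the example are hypothetical; the real latency of the driver and application path would have to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the number of 8N1 frames whose start bit falls inside a window of
 * gap_us microseconds on a gapless line, i.e. frames a disabled receiver cannot catch. */
static uint32_t frames_started_in_gap(uint32_t baud, uint32_t gap_us)
{
    assert(baud > 0);
    /* Work in millionths of a bit time to stay in integer arithmetic. */
    const uint64_t gap = (uint64_t)gap_us * baud;
    const uint64_t frame = 10ull * 1000000ull;    /* 10 bit times per 8N1 frame */
    return (uint32_t)((gap + frame - 1) / frame); /* round up */
}
```

For example, a hypothetical 500 µs between the error and the next uart_rx_enable() would let up to six bytes go by at 115200 bps, on top of the byte that caused the error. If the receiver is re-enabled in the middle of a byte, it can also lock on to a falling edge inside the data bits rather than a real start bit, and on a gapless stream that misalignment can persist for several more bytes.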

This is compounded by the protocol we are listening to being difficult to resync to with missing or corrupted characters, and the packet timing being quite variable under some circumstances.

I have done some more testing since my last post, and I may also be seeing corrupted characters, but it is hard for me to tell.

My concerns are:

The Nordic UART may not like a stream of closely spaced bytes, and may not detect the following start bit (either resyncing slightly late, or missing that bit and then generating a framing error on the following byte). I can't quite pin it down, but the problem does seem to show up when a lot of closely spaced bytes are being received. I have seen this situation before in some older UART hardware (I think it was the UART on an NXP ARM7 device), which actually required just a hair more than one bit time for the stop bit, otherwise the following start bit could be lost. This was only seen when receiving from a device with no inter-byte delays (many devices I tested at that time left significant gaps, i.e. multiple bit times, between transmitted bytes, even when transmitting from a FIFO or via DMA). It became a problem when a new peripheral with gapless UART transmissions was added to a product, causing the NXP device to drop many bytes.

The disabling of RX on reception of an error by the Nordic UART driver somewhat thwarts attempts to resync the protocol.

The device we are listening to has very variable packet timing. It is a UHF RFID scanning module with a stated behaviour of scanning for RFID tags for about 400 ms, then reporting detected tags during an approximately 100 ms period, then scanning again. The behaviour is sometimes inconsistent:

When the UHF scanner module doesn't detect many tags, each packet is slightly spaced, and each byte in the packet follows immediately (i.e. the stop bit is immediately followed by a start bit, with no idle time).

When the UHF scanner is busier (more tags are detected), it seems to also transmit on the UART outside the 100ms period and the following is observed:

Some packets have no idle time.

Some packets are split into two pieces, with a large idle time (tens of bytes' worth) between them. Usually 2 or 3 of the bytes are in one piece, with the rest in the other.

The variability in the output from the scanner module makes it hard to reproduce faults, but when I tap off the RX signal and capture it on both a PC UART and logic analyzer, all data is reported good (but I am not sure if the logic analyzer cares if the stop bit is full sized).

When we first saw this problem on our nRF54L15 platform, I noticed that we were inadvisably using UART30 (low power domain) on Port 2 (high performance domain), and assumed that the detected framing errors were due to propagation time issues with that setup.

I don't believe I have seen the issue with long packets without idle time between bytes, but I am trying to force the issue by generating worst-case data based on the timings I have seen from the scanner. The longest individual packet is only around 80 bytes, smaller than a single receive buffer, but because long delays chop up packets when the scanner is busy, I suspect that the problem may occur in situations where the RX timeout comes into play.

If you can confirm that the UART is unlikely to have any issue receiving a large number of bytes without gaps between them (i.e. with each start bit immediately following a stop bit), either due to sync drift or due to the stop bit being sampled and checked later than its midpoint, then I can pretty much write off the Nordic side and concentrate on the vendor of the UHF scanning module.
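The gapless case can be explored with a simple model. The sketch below generates back-to-back 8N1 frames and decodes them with a generic 16x-oversampling receiver: it hunts for a falling edge, confirms the start bit, samples each following bit near its centre, and checks the stop bit. This is a textbook receiver model, an assumption for illustration only; it is not the nRF54L15 UARTE's implementation, which is not described in this thread, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Line level at time t (s): 8N1 frames sent back to back from t = 0, each stop bit
 * immediately followed by the next start bit; the line idles high before and after. */
static int line_level(const uint8_t *data, size_t n, double bit_time, double t)
{
    if (t < 0.0)
        return 1;
    size_t bit_index = (size_t)(t / bit_time);
    size_t frame = bit_index / 10, bit = bit_index % 10;
    if (frame >= n || bit == 9)
        return 1;                               /* idle after the burst, or stop bit */
    if (bit == 0)
        return 0;                               /* start bit */
    return (data[frame] >> (bit - 1)) & 1;      /* data bits, LSB first */
}

/* Generic 16x-oversampling receiver (a textbook model, NOT the nRF54L15 UARTE):
 * hunt for a high-to-low transition, confirm the start bit 7 ticks later, sample each
 * following bit 16 ticks apart, check the stop bit, then resume hunting.
 * Returns the number of framing errors; good bytes go to out (count in *out_len). */
static size_t simulate_rx(const uint8_t *data, size_t n, double tx_baud, double rx_baud,
                          uint8_t *out, size_t out_cap, size_t *out_len)
{
    assert(tx_baud > 0.0 && rx_baud > 0.0);
    const double tx_bit = 1.0 / tx_baud;
    const double tick = 1.0 / (16.0 * rx_baud);
    const long end = (long)(((double)n + 2.0) * 10.0 * tx_bit / tick);
    size_t errors = 0;
    int prev = 1;

    *out_len = 0;
    for (long k = 0; k < end; k++) {
        int level = line_level(data, n, tx_bit, k * tick);
        int falling = (prev == 1 && level == 0);
        prev = level;
        if (!falling)
            continue;
        if (line_level(data, n, tx_bit, (k + 7) * tick) != 0)
            continue;                           /* start bit not confirmed: glitch */
        uint8_t byte = 0;
        for (int b = 0; b < 8; b++) {
            long s = k + 7 + 16L * (b + 1);
            byte |= (uint8_t)(line_level(data, n, tx_bit, s * tick) << b);
        }
        long stop = k + 7 + 16L * 9;
        prev = line_level(data, n, tx_bit, stop * tick);
        if (!prev)
            errors++;                           /* stop bit sampled low: framing error */
        else if (*out_len < out_cap)
            out[(*out_len)++] = byte;
        k = stop;                               /* hunting resumes at stop + 1 */
    }
    return errors;
}
```

In this model, matched rates and a receiver 92 Hz slow both decode every byte without framing errors, because the receiver resynchronises on every start edge and a stop bit of exactly one bit time leaves the stop sample near its centre. Framing errors only appear once the mismatch approaches the half-bit budget (an 8% fast transmitter in the test). Whether the UARTE samples the stop bit the same way is a question for Nordic, so this does not rule out the behaviour described above; it only shows that gapless 8N1 data is not inherently marginal for a conventional receiver.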

It just seems suspicious that the PC/USB UART and the logic analyzer both receive the data correctly without framing errors.

Regards,

Nathan Boyd

Nathan Boyd 2 months ago in reply to Nathan Boyd

Hi Hakon,

I am currently working on making an accurate signal capture so I can point out the timing of the byte sequences that trigger the framing error, so we can check whether there is any glitching or level difference that the logic analyzer couldn't show.

There probably isn't much point looking too deep into my previous reply until I can give you some hard data to reproduce the issue.

Regards,

Nathan Boyd.

Håkon Alseth 1 month ago in reply to Nathan Boyd

Hi Nathan,

Nathan Boyd said:
To be clear, I don't seem to lose any characters simply by being swamped, and the RX timeout seems to be handled correctly.

The problem seems to occur when there is a frame error, which results in the Nordic driver disabling RX, which then has to be re-enabled in the application (which is using the async driver interface). This means that not only a single "bad" character is lost, but potentially some others. The problem really seems to be in two parts:

Thank you for clarifying, this was very helpful for me to understand.

I believe that the issue is directly related to this function call in our driver:

https://github.com/nrfconnect/sdk-zephyr/blob/v3.7.99-ncs3/drivers/serial/uart_nrfx_uarte.c#L1408

Right now, we stop the UART RX if an ERRORSRC event occurs, and the application then has to follow the whole sequence outlined in the Zephyr uart.h API:

https://github.com/zephyrproject-rtos/zephyr/blob/v4.0.0/include/zephyr/drivers/uart.h#L188-L191

This causes a longer delay before you are able to re-enable the UART receiver in application space.

For testing purposes:

If you comment out the disable line (uart_nrfx_uarte.c, line 1408), do you see an improvement in the overall serial communication?

Kind regards,

Håkon

Nathan Boyd 21 days ago in reply to Håkon Alseth

Hi Hakon,

I have tried removing the disable line (without noticeable improvement), but not in conjunction with some other changes. The project will be put on hold soon, so further investigation will be done in my own time.

If I find anything I will let you know.

Regards,

Nathan Boyd.