How to wake UART from System-OFF without the use of RDY and REQ pins (or any sort of Hardware Flow Control), for an nRF54L15?

Issue

How can we properly wake an nRF54L15 from System-Off mode when the device receives a UART transmission, without using the LPUART Ready and Request lines or any sort of HWFC?

I understand that System-Off fully powers down the UART, so we can't rely on the UART and DMA hardware to capture the input for us. Additionally, the UART transmission is not using any form of flow control, further complicating the process.

We are looking for any recommendations and/or pointers to help resolve this particular problem. If the hardware solution turns out to be a bust, it seems the only thing we can do is replace the 3rd party chip upstream of us with our own, but we would like to avoid that as much as possible.

Exact Constraints

- Device cannot consume more than 1 [mA] while waiting for a transmission - our power profiler readings suggest System-On idle consumes around 7 [mA].
- Device needs to maintain System-Off as often as possible to save power.
- Upstream UART producer is a 3rd party device we would like to avoid replacing.
- UART Transmission:
  - 9600 Baudrate
  - 1 Stop bit
  - No parity
  - Constant pre-amble of ~8 bytes that we do not care about.
  - Length of ~32 bytes
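
To put numbers on these constraints, here is a minimal sketch of the line timing. It assumes 8 data bits per frame (8N1, so 10 bit times per byte) and back-to-back bytes; neither is stated above, and `frames_to_us` is just an illustrative helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop bit per byte. */
#define BITS_PER_FRAME 10u

/* Duration of n back-to-back frames in microseconds, rounded to nearest. */
uint32_t frames_to_us(uint32_t n_frames, uint32_t baud)
{
    assert(baud > 0u);
    uint64_t bits = (uint64_t)n_frames * BITS_PER_FRAME;
    return (uint32_t)((bits * 1000000u + baud / 2u) / baud);
}
```

At 9600 baud this gives roughly 1.04 ms per byte, about 8.3 ms for an 8-byte preamble, and about 33.3 ms for 32 bytes. The preamble duration is effectively the time budget for getting a receiver running before the payload starts.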

What have we attempted/explored

Maintaining System-On in idle

This leaves the UART and DMA fully intact and allows them to immediately capture the data burst without needing to wake the CPU.

The core issue with this solution is that the power consumption from leaving the device in system-on idle is way too high.

System-Off with RX shorted to a GPIO

In this solution, we shorted the RX pin of the device's UART to another GPIO set to ACTIVE LOW with SENSE enabled. Since UART always holds the RX pin high until it is ready to transmit, this works perfectly for signaling the device to wake up.

The issue we run into here is that waking from System-Off is essentially the same as rebooting the device. This means that the UART interface and any configuration for it is lost and does not exist at the moment the system starts to reboot. By the time the Zephyr kernel is loaded the data burst is already over.

System-Off with RX shorted to a GPIO (PRE-KERNEL INIT)

Building off the last solution, we placed an init function for the UART before the Kernel starts to load, at the ``PRE_KERNEL_1`` entry point. This successfully initializes the UART early enough to begin capturing the data burst; however, the capture starts ~6 bytes into the transmission.

The issue with this approach is that the UART protocol does not define a way to start capturing in the middle of a data burst (transmission). If the first edge (hi->lo) the hardware sees is not the start bit of that byte, that byte will be corrupted and will either cause the following bytes to become corrupted or trigger a corrupted frame error stopping capture all together.

This essentially means that even though we managed to wake up fast enough to "see" the transmission, we have no way of reliably understanding what is actually being sent.
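
The misframing can be reproduced off-target with a small simulation. The sketch below is not how the UARTE hardware is implemented; it is a naive software receiver, assuming one sample per bit time and 8N1 framing, that treats the first low level it sees as a start bit. `decode_from` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive 8N1 receiver over a stream of 0/1 samples (one per bit time).
 * The first low sample at or after `pos` is taken as a start bit.
 * Returns the number of bytes stored in `out`; frames whose stop bit
 * is low are dropped and counted in *framing_errors. */
size_t decode_from(const uint8_t *bits, size_t nbits, size_t pos,
                   uint8_t *out, size_t max_out, size_t *framing_errors)
{
    assert(bits != NULL && out != NULL && framing_errors != NULL);
    size_t n = 0;
    *framing_errors = 0;
    while (n < max_out) {
        while (pos < nbits && bits[pos] != 0u) {
            pos++;                      /* hunt for a low level */
        }
        if (pos + 10u > nbits) {
            break;                      /* not enough samples for a frame */
        }
        uint8_t byte = 0;
        for (size_t b = 0; b < 8u; b++) {
            byte |= (uint8_t)(bits[pos + 1u + b] << b);   /* LSB first */
        }
        if (bits[pos + 9u] == 0u) {
            (*framing_errors)++;        /* stop bit must be high */
        } else {
            out[n++] = byte;
        }
        pos += 10u;
    }
    return n;
}
```

Feeding it the bit stream for `0x21 0x0F` from the true start bit returns both bytes. Starting two bit times late, it locks onto a data bit instead and returns a single `0x48` with no framing error reported, which matches the "sometimes correct, sometimes garbage" behaviour.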

Potential Hardware Solution

We haven't fully explored this yet, but the basic idea is to place our own UART peripheral with a FIFO right in front of the device. The RX line is again shorted to a GPIO on the device to enable wakeup on a transmission. The external UART and FIFO would always be powered, allowing them to capture the transmission and store it in the FIFO for the device to retrieve once it is ready.

In the initial numbers we ran for this, it seems to be plausible but completely depends on if we can find a UART and FIFO chip that doesn't violate our power budget (1 [mA]).

GPIO Sampling

As of writing this, we just thought of another potential solution: using the ``PRE_KERNEL_1`` entry point to set up a GPIO pin that samples the data at the 9600 baud rate, dumping what it sees into a bitstream. Since we know the UART configuration, we could in theory work backwards from the last stop bit seen and mark out each byte correctly. This circumvents the UART hardware issues entirely but relies on us being able to sample the GPIO at as close to 9600 baud as possible.
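
A rough sketch of the "work backwards from the last stop bit" idea follows. It assumes one clean sample per bit time, 8N1 framing, bytes sent back-to-back with no idle gaps, and at least one idle sample after the burst. Because the final stop bit looks the same as the idle line, the sketch tries every alignment that is consistent with the last low sample and keeps the one that yields the longest run of valid frames. `decode_tail` and `valid_frames_before` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BITS 10u  /* assumed 8N1: start + 8 data + stop */

/* Count consecutive valid frames walking backwards from a frame whose
 * stop bit sits at index `stop` (start bit is then at stop - 9). */
static size_t valid_frames_before(const uint8_t *bits, size_t stop)
{
    size_t n = 0;
    while (stop + 1u >= FRAME_BITS &&
           bits[stop] == 1u && bits[stop + 1u - FRAME_BITS] == 0u) {
        n++;
        if (stop < FRAME_BITS) {
            break;
        }
        stop -= FRAME_BITS;
    }
    return n;
}

/* Decode the trailing run of complete frames; returns the byte count. */
size_t decode_tail(const uint8_t *bits, size_t nbits,
                   uint8_t *out, size_t max_out)
{
    assert(bits != NULL && out != NULL);
    size_t last_low = nbits;
    for (size_t i = nbits; i-- > 0;) {
        if (bits[i] == 0u) { last_low = i; break; }
    }
    if (last_low == nbits) {
        return 0;                       /* line never went low */
    }
    /* The final stop bit lies 1..9 samples after the last low sample. */
    size_t best_stop = 0, best_n = 0;
    for (size_t k = 1; k < FRAME_BITS; k++) {
        size_t stop = last_low + k;
        if (stop >= nbits) {
            break;
        }
        size_t n = valid_frames_before(bits, stop);
        if (n > best_n) { best_n = n; best_stop = stop; }
    }
    if (best_n > max_out) {
        best_n = max_out;               /* keep only the newest frames */
    }
    size_t first_start = best_stop + 1u - best_n * FRAME_BITS;
    for (size_t f = 0; f < best_n; f++) {
        size_t start = first_start + f * FRAME_BITS;
        uint8_t byte = 0;
        for (size_t b = 0; b < 8u; b++) {
            byte |= (uint8_t)(bits[start + 1u + b] << b);  /* LSB first */
        }
        out[f] = byte;
    }
    return best_n;
}
```

Repetitive data can make more than one alignment look valid, so in practice the result would still need to be checked against the known preamble or payload format.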

Device Setup

Currently we are testing with an nRF54L15 that is housed on a custom PCB with a PMIC (NPM2100) using its boost converter and a coin-cell battery to power the device.

SDK Version: ncs-V3.2.1



All advice, support, and feedback is greatly appreciated! 

Thank you Devzone team :)

  • In short, there is no way to capture the first byte(s) sent from the target device if the SoC is in System off. This is because the UART is not active in this mode.

    The expected current for sleeping while waiting for UART is around 130uA, as the system must be in System on all idle with the UART RX enabled. This is the "normal", though pretty power hungry way to wait for UART data.

    System on all idle can achieve around 5uA if the UART is not enabled, and the wakeup time is around 7us. System off achieves way lower, with a startup time of around 60us. If you are okay with losing the first UART byte, you could either:

    If System on all idle: deinit the UART, configure the RX pin as a GPIO interrupt, wait for the interrupt, then immediately reinit the UART.

    If System off: disable the UART, configure the RX pin as GPIO wakeup, and on wakeup configure the UART (this should be automatic since the system is rebooting).

    At 9600 baud with 8N1 framing (1 start + 8 data + 1 stop bit), the first byte will take around (1 / 9600) * 10 = 1.04ms, so the SoC has plenty of time to wake up before the second byte.

    If it's imperative that the first byte is not lost, the best case is still those 130uA.

    Regarding the external UART with FIFO, that could surely work if you find a particularly low power one. But that is of course quite costly, and will use more current while active compared to using the internal UART given you now have the SoC + some bus + external UART enabled while talking to your target device.

  • Preambles are usually known and could therefore be used to correlate (slide against) an otherwise unknown bit stream. However a timer-based Rx might be worth investigating as that would require less power than a UART methinks. I tested this approach a while back; it works well.

    Single bit sample, dual timers (T3 and T4), no reload:

    // Standard Uart character 0x21 - Single sample, dual timers (T3 and T4), no reload
    //
    //      |     |     |     |     |     |     |     |     |     |     | Bit Framing
    //      |     |     |     |     |     |     |     |     |     |     |
    // Idle 'Start'Bit 0'Bit 1'Bit 2'Bit 3'Bit 4'Bit 5'Bit 6'Bit 7'Stop '  Idle
    // -----+     +-----+                       +-----+           +-----+-------
    //      |     |     |                       |     |           |
    //      |     |     |                       |     |           |       Normal
    //      +-----+     +-----+-----+-----+-----+     +-----+-----+
    //
    //      +-----+     +-----+-----+-----+-----+     +-----+-----+
    //      |     |     |                       |     |           |       Invert
    //      |     |     |                       |     |           |
    // -----+     +-----+                       +-----+           +-----+-------
    //      |     |     |     |     |     |     |     |     |     |     | Bit Framing
    //         |     |     |     |     |     |     |     |     |     |                                          Clear
    //         |     |     |     |     |     |     |     |     |     +--- Stop bit  pTimer4->CC[5] = BT/2+(9*BT)  *
    //         |     |     |     |     |     |     |     |     +--------- Bit 7     pTimer4->CC[4] = BT/2+(8*BT)  -
    //         |     |     |     |     |     |     |     +--------------- Bit 6     pTimer4->CC[3] = BT/2+(7*BT)  -
    //         |     |     |     |     |     |     +--------------------- Bit 5     pTimer4->CC[2] = BT/2+(6*BT)  -
    //         |     |     |     |     |     +--------------------------- Bit 4     pTimer4->CC[1] = BT/2+(5*BT)  -
    //         |     |     |     |     +--------------------------------- Bit 3     pTimer4->CC[0] = BT/2+(4*BT)  -
    //         |     |     |     +--------------------------------------- Bit 2     pTimer3->CC[3] = BT/2+(3*BT)  -
    //         |     |     +--------------------------------------------- Bit 1     pTimer3->CC[2] = BT/2+(2*BT)  -
    //         |     +--------------------------------------------------- Bit 0     pTimer3->CC[1] = BT/2+(1*BT)  -
    //         +--------------------------------------------------------- Start Bit pTimer3->CC[0] = BT/2         -
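
The compare values in the diagram above follow CC[n] = BT/2 + n*BT. A small sketch of that arithmetic in timer ticks is shown below. It assumes the timer is started (or cleared) on the falling edge of the start bit, and the 16 MHz timer clock used in the example is an assumption that should be checked against the TIMER instance actually used. `uart_sample_ticks` is an illustrative name, not an nrfx API.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLES_PER_FRAME 10u  /* start + 8 data + stop (8N1 assumed) */

/* Fill cc[n] with the mid-bit sample time BT/2 + n*BT, in timer ticks.
 * Each value is computed as (2n + 1) half bit times in one division so
 * rounding error does not accumulate from bit to bit. */
void uart_sample_ticks(uint32_t timer_hz, uint32_t baud,
                       uint32_t cc[SAMPLES_PER_FRAME])
{
    assert(baud > 0u);
    for (uint32_t n = 0; n < SAMPLES_PER_FRAME; n++) {
        uint64_t half_bits = 2u * n + 1u;
        cc[n] = (uint32_t)((half_bits * timer_hz + baud) /
                           (2u * (uint64_t)baud));
    }
}
```

With a 16 MHz timer at 9600 baud, one bit time is about 1666.7 ticks, so the start bit is sampled at tick 833 and the stop bit at tick 15833.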
    

    Single bit sample, single timer (T4), requires reload:

    // Standard Uart character 0x21 - Single sample, single timer (T4), requires reload
    //
    //      |     |     |     |     |     |     |     |     |     |     | Bit Framing
    //      |     |     |     |     |     |     |     |     |     |     |
    // Idle 'Start'Bit 0'Bit 1'Bit 2'Bit 3'Bit 4'Bit 5'Bit 6'Bit 7'Stop '  Idle
    // -----+     +-----+                       +-----+           +-----+-------
    //      |     |     |                       |     |           |
    //      |     |     |                       |     |           |       Normal
    //      +-----+     +-----+-----+-----+-----+     +-----+-----+
    //
    //      +-----+     +-----+-----+-----+-----+     +-----+-----+
    //      |     |     |                       |     |           |       Invert
    //      |     |     |                       |     |           |
    // -----+     +-----+                       +-----+           +-----+-------
    //      |     |     |     |     |     |     |     |     |     |     |
    //      |     |     |     |     |     |     |     |     |     |     | Bit Framing
    //      |     |     |     |     |     |     |     |     |     |     |
    //         |     |     |     |     |     |     |     |     |     |                                          Clear Reload
    //         |     |     |     |     |     |     |     |     |     +--- Stop bit  pTimer4->CC[3] = BT/2+(9*BT) *      2,3,0
    //         |     |     |     |     |     |     |     |     +--------- Bit 7     pTimer4->CC[2] = BT/2+(8*BT) -      1
    //         |     |     |     |     |     |     |     +--------------- Bit 6     pTimer4->CC[1] = BT/2+(7*BT) -      0
    //         |     |     |     |     |     |     +--------------------- Bit 5     pTimer4->CC[0] = BT/2+(6*BT) -      -
    //         |     |     |     |     |     +--------------------------- Bit 4     pTimer4->CC[5] = BT/2+(5*BT) *      -
    //         |     |     |     |     +--------------------------------- Bit 3     pTimer4->CC[4] = BT/2+(4*BT) -      -
    //         |     |     |     +--------------------------------------- Bit 2     pTimer4->CC[3] = BT/2+(3*BT) -      2
    //         |     |     +--------------------------------------------- Bit 1     pTimer4->CC[2] = BT/2+(2*BT) -      1
    //         |     +--------------------------------------------------- Bit 0     pTimer4->CC[1] = BT/2+(1*BT) -      0
    //         +--------------------------------------------------------- Start Bit pTimer4->CC[0] = BT/2        -      -
    

    More robust (better noise immunity) 3-vote bit samples, single timer (T4), SBT=BT/3, requires reload:

    // Standard Uart character 0x21 - 3-vote samples, single timer (T4), SBT=BT/3, requires reload
    //
    //      |     |     |     |     |     |     |     |     |     |     | Bit Framing
    //      |     |     |     |     |     |     |     |     |     |     |
    // Idle 'Start'Bit 0'Bit 1'Bit 2'Bit 3'Bit 4'Bit 5'Bit 6'Bit 7'Stop '  Idle
    // -----+     +-----+                       +-----+           +-----+-------
    //      |     |     |                       |     |           |
    //      |     |     |                       |     |           |       Normal
    //      +-----+     +-----+-----+-----+-----+     +-----+-----+
    //
    //      +-----+     +-----+-----+-----+-----+     +-----+-----+
    //      |     |     |                       |     |           |       Invert
    //      |     |     |                       |     |           |
    // -----+     +-----+                       +-----+           +-----+-------
    //      |     |     |     |     |     |     |     |     |     |     |
    //      |     |     |     |     |     |     |     |     |     |     | Bit Framing
    //      |     |     |     |     |     |     |     |     |     |     |
    //       | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |                                                  Clear Reload
    //       | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +- Stop Bit  after 1/2 Stop bit wait for next Start  -     -
    //       | | | | | | | | | | | | | | | | | | | | | | | | | | | | +--- Stop Bit  Check  pTimer4->CC[4] = SBT/2+(28*SBT)  *     3,4,0
    //       | | | | | | | | | | | | | | | | | | | | | | | | | | | +----- Stop Bit  Check  pTimer4->CC[3] = SBT/2+(27*SBT)  -     2
    //       | | | | | | | | | | | | | | | | | | | | | | | | | | +------- Bit 7     Vote 3 pTimer4->CC[2] = SBT/2+(26*SBT)  -     1
    //       | | | | | | | | | | | | | | | | | | | | | | | | | +--------- Bit 7     Vote 2 pTimer4->CC[1] = SBT/2+(25*SBT)  -     0
    //       | | | | | | | | | | | | | | | | | | | | | | | | +----------- Bit 7     Vote 1 pTimer4->CC[0] = SBT/2+(24*SBT)  -     5
    //       | | | | | | | | |                                              etc for pairs of bits
    //       | | | | | | | | +------------------------------------------- Bit 1     Vote 3 pTimer4->CC[2] = SBT/2+(8*SBT)   -     1
    //       | | | | | | | +--------------------------------------------- Bit 1     Vote 2 pTimer4->CC[1] = SBT/2+(7*SBT)   -     0
    //       | | | | | | +----------------------------------------------- Bit 1     Vote 1 pTimer4->CC[0] = SBT/2+(6*SBT)   -     5
    //       | | | | | +------------------------------------------------- Bit 0     Vote 3 pTimer4->CC[5] = SBT/2+(5*SBT)   *     4
    //       | | | | +--------------------------------------------------- Bit 0     Vote 2 pTimer4->CC[4] = SBT/2+(4*SBT)   -     3
    //       | | | +----------------------------------------------------- Bit 0     Vote 1 pTimer4->CC[3] = SBT/2+(3*SBT)   -     2
    //       | | +------------------------------------------------------- Start Bit Vote 3 pTimer4->CC[2] = SBT/2+(2*SBT)   -     1
    //       | +--------------------------------------------------------- Start Bit Vote 2 pTimer4->CC[1] = SBT/2+(1*SBT)   -     0
    //       +----------------------------------------------------------- Start Bit Vote 1 pTimer4->CC[0] = SBT/2           -     -
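
For the 3-vote scheme above, the decode step could look like the following sketch. It assumes the 29 sample points from the diagram have already been captured as 0/1 values (three per bit for the start and data bits, two stop-bit checks), with data sent LSB first. `vote3` and `decode_frame_3vote` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Majority of three 0/1 samples. */
static inline uint8_t vote3(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((a & b) | (a & c) | (b & c));
}

/* Decode one 8N1 frame from 29 samples taken at SBT/2 + n*SBT:
 * s[0..2] start bit, s[3..26] data bits 0..7, s[27..28] stop checks.
 * Returns 0 and stores the byte on success, -1 on a framing error. */
int decode_frame_3vote(const uint8_t s[29], uint8_t *out)
{
    assert(s != NULL && out != NULL);
    if (vote3(s[0], s[1], s[2]) != 0u) {
        return -1;                      /* start bit must be low */
    }
    if (!(s[27] != 0u && s[28] != 0u)) {
        return -1;                      /* both stop checks must be high */
    }
    uint8_t byte = 0;
    for (size_t bit = 0; bit < 8u; bit++) {
        size_t i = 3u * (bit + 1u);
        byte |= (uint8_t)(vote3(s[i], s[i + 1u], s[i + 2u]) << bit);
    }
    *out = byte;
    return 0;
}
```

A single corrupted sample inside any start or data bit is outvoted by its two neighbours, which is where the extra noise immunity comes from.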
    

  • The problem with this is that UART hardware can't tell the difference between a first true start bit edge (hi->lo) and just some random data bit flipping the RX line. If the UART peripheral capture triggers on a data bit pulling RX down and not the start bit, the capture frame is fundamentally corrupted and the hardware has no way to recover from that. Any data the UART pushed to the DMA (if it even manages to get a valid frame by chance) will be corrupted.

    I could be misunderstanding the inner workings of UART, but from my testing this is exactly what is happening. I can wake the UART up fast enough to begin capturing part of the preamble, but the received data is sometimes correct and other times complete garbage.

    Just to address the timings you mentioned, from my testing I found that POST_KERNEL takes roughly 11 [ms] to reach. This is what led to moving the initialization to PRE_KERNEL_1 which is sufficiently early to start the capture during the preamble window.
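
As a rough cross-check of these timings, the sketch below estimates how many bytes have already started on the wire by the time the receiver is ready, counting from the wake-up edge. It assumes 8N1 framing and back-to-back bytes, and `frames_missed` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Number of frames whose start bit has already passed once the receiver
 * is ready, `latency_us` after the falling edge that woke the device.
 * The waking edge belongs to the first frame, so the result is >= 1. */
uint32_t frames_missed(uint32_t latency_us, uint32_t baud,
                       uint32_t bits_per_frame)
{
    assert(baud > 0u && bits_per_frame > 0u);
    uint64_t num = (uint64_t)latency_us * baud;
    uint64_t den = (uint64_t)bits_per_frame * 1000000u;
    uint32_t n = (uint32_t)((num + den - 1u) / den);   /* ceiling */
    return (n == 0u) ? 1u : n;
}
```

At 9600 baud with 10-bit frames, a 60 us wake-up costs only the first byte, while an 11 ms init costs about 11 bytes, which is more than the ~8-byte preamble.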

    You mentioned an interesting point about System-On, where disabling the UART reduces its power draw. I think you are completely correct with that. We just did some more testing of the power draw in System-On mode and it is a lot lower than our previous results, maintaining an idle draw in the 6 [uA] region. I will try some more tests with exactly this just to verify.

    Building on the System-On solution some more, setting the GPIO pin to trigger the DPPI to bring up the UART and DMA could potentially work and allow us to capture immediately. This may allow us to get past the hardware limitation (I could be wrong about there even being one).

    Thanks for the reply, and hopefully we'll have some more information here soon.

  • Exactly! This is where my mind was going when we thought up just sampling the GPIO at the baud rate. We can bypass the UART hardware issues and just decode the data ourselves to get the transmission back. Then we can orient the data by finding which bytes correspond to the preamble. Since I know we can get a UART up early enough in System-Off to capture part of the preamble, doing the same with a GPIO pin should be even more trivial.
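
Orienting the data against the preamble can be sketched as a sliding comparison: slide the known preamble bit pattern along the captured bit stream and keep the offset with the fewest mismatches. The preamble contents are not given in this thread, so the pattern is a parameter here, and `correlate_bits` is a hypothetical name. Both arrays hold one 0/1 sample per bit time, including start and stop bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slide `pat` over `bits` and return the offset with the fewest
 * mismatching samples; the mismatch count is written to *best_err.
 * Returns SIZE_MAX if the pattern is empty or longer than the stream. */
size_t correlate_bits(const uint8_t *bits, size_t nbits,
                      const uint8_t *pat, size_t npat, size_t *best_err)
{
    assert(bits != NULL && pat != NULL && best_err != NULL);
    if (npat == 0u || npat > nbits) {
        return SIZE_MAX;
    }
    size_t best_off = 0;
    *best_err = npat + 1u;
    for (size_t off = 0; off + npat <= nbits; off++) {
        size_t err = 0;
        for (size_t i = 0; i < npat; i++) {
            err += (size_t)(bits[off + i] != pat[i]);
        }
        if (err < *best_err) {          /* first best match wins ties */
            *best_err = err;
            best_off = off;
        }
    }
    return best_off;
}
```

A longer pattern (several preamble bytes rather than one) makes a false match less likely, and a non-zero `*best_err` gives a simple measure of how noisy the capture was.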

    Interesting idea to sample even more frequently than needed to help reduce noise. I think this is the most ideal solution as it offers the most control over the physical data and prevents a poorly timed UART init from corrupting the data.

    Thanks for the reply and the diagrams explaining it in more detail. I'll make sure to read into this a bit more too.
