
Very rare error in ESB, Enhanced ShockBurst

Hi

One of our customers has complained that our sensors occasionally produce errors.

I have looked into it, and they are correct: about 1 in 8 million packets sent by ESB is received in error.

The CRC check is set to 2 bytes and the address is set to 5 bytes. So I think it is as good as it can be for the nRF24LE1 (it is not getting better with nRF5, right?)

In the attachment you can see some examples of packets that have been received in error. At this moment the transmitter is sending counting data non-stop, to detect the problem.

It is as if the ESB of the PTX re-transmits a single packet that has already been received.
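
For anyone reproducing this kind of counting test, the receiver-side check can be sketched roughly as below. This is only an illustration: the names `seq_from_payload` and `seq_classify`, and the assumption that the counter is a 32-bit little-endian value in the first four payload bytes, are hypothetical and not taken from the actual firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a received test packet. */
typedef enum {
    SEQ_OK,         /* counter advanced by exactly one */
    SEQ_DUPLICATE,  /* same counter as the previous packet */
    SEQ_STALE,      /* an older counter value appeared again */
    SEQ_GAP         /* one or more packets are missing */
} seq_status_t;

/* Assumed layout: 32-bit little-endian counter in payload bytes 0..3. */
static uint32_t seq_from_payload(const uint8_t *payload)
{
    assert(payload != NULL);
    return (uint32_t)payload[0]
         | ((uint32_t)payload[1] << 8)
         | ((uint32_t)payload[2] << 16)
         | ((uint32_t)payload[3] << 24);
}

/* Compare the counter of the previous packet with the new one. */
static seq_status_t seq_classify(uint32_t last, uint32_t now)
{
    uint32_t delta = now - last;   /* unsigned, so it wraps modulo 2^32 */

    if (delta == 1u) return SEQ_OK;
    if (delta == 0u) return SEQ_DUPLICATE;
    if (delta > 0x80000000u) return SEQ_STALE;  /* a step backwards */
    return SEQ_GAP;
}
```

A `SEQ_DUPLICATE` result would match a packet that was delivered twice; logging `last` and `now` for every other result that is not `SEQ_OK` helps tell lost packets from corrupted ones.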

In about 24 hours of transmission there were 3 errors. During this time the TX FIFO is normally not used. The firmware just uploads a new packet (32 bytes) every 15 ms, so in line of sight the radio has fully finished transmitting one packet and received the ACK packet before the firmware uploads a new packet to the TX FIFO. For this latest 24-hour test the TX FIFO was only used about 0.8% of the time, but all 3 errors occurred here. The errors came in hours apart, but only when the ESB was catching up after a transmission interruption. The tests were conducted in an environment with a lot of WiFi/Bluetooth/etc. noise.

The firmware uploads the packets with this routine:

RADIO_INTERRUPT_DISABLE;

while (nrf_tx_fifo_full() == 0 && ram_full > 0)
{
    write_tx_payload(ram_packet[ram_read], RF_PAYLOAD_LENGTH_DATA);
    ram_read = (ram_read + 1) % BUFFERS;
    ram_full = ram_full - 1;
}

RADIO_INTERRUPT_ENABLE;

...

//Hi speed version of nrf_write_tx_payload
void write_tx_payload(uint8_t *tmp, uint8_t length)
{
    uint8_t i;
    NRF_CSN_LOW;

    SPIRDAT = W_TX_PAYLOAD;

    while (length--)
    {
        while (NRF_SPI_RX_FULL);
        SPIRDAT = *(tmp++);
        i = SPIRDAT;
    }

    while ( NRF_SPI_TX_EMPTY == 0 );  // wait for byte transfer finished

    i = SPIRDAT;
    NRF_CSN_HIGH;
}

...

About 500 µs after the TX FIFO has been filled, the transmission is started:

void startTransmit(uint8_t i)
{
    if (i == (id +1) ) {
        // The MAX_RT flag has to be cleared to restart transmission.
        nrf_get_clear_irq_flags();
        NRF_CE_PULSE();
    }
}

Do you have any idea why this can happen?

er_understanding.xlsx

  • Hi,

     

    The CRC check is set to 2 bytes and the address is set to 5 bytes. So I think it is as good as it can be for the nRF24LE1 (it is not getting better with nRF5, right?)

    Hardware-wise, the nRF5 devices can do 3 byte CRC, but the "nrf_esb" library is made for backwards compatibility, so it does not have support for 3 byte crc.

     

    In about 24 hours of transmission there were 3 errors. During this time the TX FIFO is normally not used. The firmware just uploads a new packet (32 bytes) every 15 ms, so in line of sight the radio has fully finished transmitting one packet and received the ACK packet before the firmware uploads a new packet to the TX FIFO. For this latest 24-hour test the TX FIFO was only used about 0.8% of the time, but all 3 errors occurred here. The errors came in hours apart, but only when the ESB was catching up after a transmission interruption. The tests were conducted in an environment with a lot of WiFi/Bluetooth/etc. noise.

    Looking at the packets, it looks like most of them are corrupted payloads that by chance have passed the CRC check. What does your RX readout routine look like?

    I would strongly recommend that you always check the length of the payload prior to reading from the RX_FIFO.

    You can read the length using command "R_RX_PL_WID" (hal_nrf.c::hal_nrf_read_rx_payload_width()).

     

    Ensure the payload length is not 0 and not larger than 32 bytes:

    len = hal_nrf_read_rx_payload_width();
    
    if (len == 0 || len > 32)
        flush_rx_fifo();
    else {
        /* Proceed as normal */
    }

     

    Could you also check if there are any data overlays that need to be handled in your compiler? I assume you use Keil C51?

    You'll need warning L15 enabled in order to see the potential functions that overlay in memory. If you get any L15 warnings when you recompile (rebuild the whole project, not incremental build), please post them here.

     

    Kind regards,

    Håkon

  • Hi Håkon,

    Thanks for your reply.

    Hardware-wise, the nRF5 devices can do 3 byte CRC, but the "nrf_esb" library is made for backwards compatibility, so it does not have support for 3 byte crc.

    That is good. I hope I will be able to hack the nrf_esb to also support 3 bytes.

    Yes, the payload length is always checked. I know, that it would cause many more errors otherwise.

    I don't use Keil. It is SDCC.

    The readout combines reading from the radio with writing to the SPI (which is 8-SPI) for very high speed.

    void SPI_TransferRadioPacket(uint8_t pipe)
    {
        uint8_t i = 0;
        uint8_t length; // Number of bytes in the radio packet.

        length = nrf_read_reg(R_RX_PL_WID);

        if (length != 32) {
            // If the length of the packet is different from 32, delete it.
            nrf_flush_rx();
        }
        else
        {    
            NRF_CSN_LOW; // RADIO SPI START
            SPIRDAT = R_RX_PAYLOAD; // Command: read received payload
            SPIRDAT = 0; // Queue the 2nd SPI byte (1st dummy byte)
            while (NRF_SPI_RX_READY == 0) ; // wait for tx and rx of radio SPI
            SPIRDAT; // Discard the byte clocked in during the command (STATUS)
            SPIRDAT = 0; // Queue the 3rd SPI byte (2nd dummy byte)
            
            length = length - 2;
            while (length--)
            {
                while (NRF_SPI_RX_READY == 0) ;// wait for radio SPI
                SPI_writeByte(SPIRDAT);
                SPIRDAT = 0;
        
            }
            // Get the last bytes out without putting in a new SPIRDAT = 0;
            while (NRF_SPI_RX_READY == 0) ;// wait for tx and rx of radio SPI
            SPI_writeByteWithTimer(SPIRDAT);
            while (NRF_SPI_RX_READY == 0) ;// wait for tx and rx of radio SPI
            SPI_writeByteWithTimer(SPIRDAT);
            NRF_CSN_HIGH; // RADIO SPI STOP
        }
    }

  • Hi,

     

    I have to admit that my SDCC knowledge has deteriorated over the years, but I believe it follows the same principles regarding reentrancy as Keil C51, just with different naming:

    https://github.com/contiki-os/contiki/wiki/8051-Memory-Spaces#SDCC_Memory_Models_Variables_Function_Parameters_and_the_Stack

     

    Since you mention that this issue seems to occur if several things happen at the same time:

    If you do not compile with --stack-auto, all auto variables are allocated statically, much like static/global variables. Thus, if a function is called from both main context and interrupt context, you have to make sure that it is reentrant, so that its variables are not overwritten by the interrupt. I do not know if you use interrupts at all, but if you do any processing in an interrupt where the function calls overlay with the main context, there is a chance that your memory will be corrupted.
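
    The effect can be illustrated on a host PC with the sketch below. It is not SDCC-specific code: `static` locals merely imitate auto variables that are placed at fixed addresses, and the hypothetical `pending_irq` hook stands in for an interrupt that fires in the middle of the function. All names are made up for this example. On the target, the usual remedies would be --stack-auto or SDCC's `__reentrant` keyword on the affected functions.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical hook standing in for an interrupt that fires mid-function. */
    static void (*pending_irq)(void) = NULL;

    static void maybe_fire_irq(void)
    {
        if (pending_irq != NULL) {
            void (*handler)(void) = pending_irq;
            pending_irq = NULL;      /* fire only once */
            handler();
        }
    }

    /* Models a non-reentrant build: the "locals" live at fixed addresses. */
    static uint8_t sum_fixed_locals(const uint8_t *buf, uint8_t len)
    {
        static uint8_t acc;
        static uint8_t i;

        assert(buf != NULL);
        acc = 0;
        for (i = 0; i < len; i++) {
            acc = (uint8_t)(acc + buf[i]);
            maybe_fire_irq();        /* the "interrupt" may re-enter this function */
        }
        return acc;
    }

    /* Models a reentrant build: every call gets its own locals on the stack. */
    static uint8_t sum_stack_locals(const uint8_t *buf, uint8_t len)
    {
        uint8_t acc = 0;
        uint8_t i;

        assert(buf != NULL);
        for (i = 0; i < len; i++) {
            acc = (uint8_t)(acc + buf[i]);
            maybe_fire_irq();
        }
        return acc;
    }

    /* Simulated interrupt handlers that call the same function as main context. */
    static const uint8_t irq_data[2] = { 10, 20 };
    static void irq_uses_fixed(void) { (void)sum_fixed_locals(irq_data, 2); }
    static void irq_uses_stack(void) { (void)sum_stack_locals(irq_data, 2); }
    ```

    Summing {1, 2, 3, 4} with `pending_irq = irq_uses_fixed` returns 34 instead of 10, because the simulated interrupt overwrites both the accumulator and the loop index. The stack-based variant returns 10 under the same interruption.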

     

    You mention that 0.8% of the transfers include an ACK payload, but that all corrupted packets seem to occur within this 0.8% "window". How often would this be, statistically, and how many retransmits have you configured?

    If you see corruption in 3 packets every day, and you are sending every 15 ms (~5.7 M packets per day), it might be due to a weak CRC, especially in a noisy environment where other 2.4 GHz equipment can interfere heavily.
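
    As a rough sanity check of these numbers, the arithmetic can be sketched as below. It assumes that a corrupted frame with essentially random bit errors passes an n-bit CRC with a probability of about 2^-n, which is a common rule of thumb rather than a measured property of this link. The function names are made up for illustration.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define MS_PER_DAY 86400000UL

    /* Number of packets sent per day at a fixed packet interval. */
    static uint32_t packets_per_day(uint32_t interval_ms)
    {
        assert(interval_ms > 0);
        return (uint32_t)(MS_PER_DAY / interval_ms);
    }

    /* Corrupted frames that must reach the CRC check for `escapes` of them
       to slip through on average, assuming a pass probability of 2^-crc_bits. */
    static uint64_t corrupted_frames_needed(uint32_t escapes, unsigned crc_bits)
    {
        assert(crc_bits <= 32);
        return (uint64_t)escapes << crc_bits;
    }
    ```

    With a 15 ms interval this gives 5,760,000 packets per day. Under the stated assumption, 3 undetected errors per day with a 16-bit CRC would need roughly 196,608 corrupted frames (about 3.4% of the traffic) to reach the CRC check each day, while the same number of corrupted frames against a 24-bit CRC would give about 0.012 expected escapes per day.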

     

    Kind regards,

    Håkon
