
Missing UART bytes during transmission

I'm having a rather cryptic problem where around 0.01% of bytes are simply not transmitted via UART for some reason.

This error happens regardless of the chosen baud rate, parity, or control-line settings.

I've verified that it happens on the transmit side by tapping into the Tx line with a protocol analyzer. However, I was unable to catch the exact moment in the analog domain; that is insanely hard to do with such a low probability of occurrence. I'm using the nRF52 DK development board.

The device's job is to listen for BLE packets, filter them, and transmit their content via UART (from the SoftDevice context). It simultaneously responds to some commands from the main context too. This might be the clue, but I cannot see how, as besides this problem the transmission works reliably.

UART initialization (the event handler is empty)

static void uart_init(void)
{
    ret_code_t err_code;

    app_uart_comm_params_t const comm_params =
    {
        .rx_pin_no    = RX_PIN_NUMBER,
        .tx_pin_no    = TX_PIN_NUMBER,
        .rts_pin_no   = RTS_PIN_NUMBER,
        .cts_pin_no   = CTS_PIN_NUMBER,
        .flow_control = APP_UART_FLOW_CONTROL_DISABLED,
        .use_parity   = false,
        .baud_rate    = UART_BAUDRATE_BAUDRATE_Baud230400
    };

    APP_UART_FIFO_INIT(&comm_params,
                       UART_RX_BUF_SIZE,
                       UART_TX_BUF_SIZE,
                       uart_event_handle,
                       APP_IRQ_PRIORITY_LOWEST,
                       err_code);

    APP_ERROR_CHECK(err_code);
}

Data transmission method

bool uart_put_bytes(lbga_raw_packet_t *p_raw_packet)
{
    // Frame start marker.
    APP_ERROR_CHECK(app_uart_put(LBGA_API_CHAR_PACKET_START));

    for (uint16_t i = 0; i < p_raw_packet->len; i++)
    {
        volatile char c = p_raw_packet->data[i];

        // The payload must not contain framing characters; treat that as a fatal error.
        if ((c == LBGA_API_CHAR_PACKET_START) ||
            (c == LBGA_API_CHAR_PACKET_END) ||
            (c == LBGA_API_CHAR_PACKET_CR))
        {
            APP_ERROR_CHECK(1);
        }
        APP_ERROR_CHECK(app_uart_put(c));
    }

    // Frame end marker.
    APP_ERROR_CHECK(app_uart_put(LBGA_API_CHAR_PACKET_END));
    return true;
}


Example of a broken data packet (capture from the protocol analyzer). Notice the missing "h4E" in the 3rd packet.

h30 h31 h20 h6F h08 h00 h00 h93 hEF h12 hB8 h77 hF8 h02 h01 h1E h1B hFF h75 h00 h42 h04 h01 h40 h4E hF8 h77 hB8 h12 hEF h93 hFA h77 hB8 h94 hEF h92 h01 h00 h00 h00 h00 h00 h00 h00 h1F h00 hB5 hBF h20 
h30 h31 h20 h70 h08 h00 h00 h93 hEF h12 hB8 h77 hF8 h02 h01 h1E h1B hFF h75 h00 h42 h04 h01 h40 h4E hF8 h77 hB8 h12 hEF h93 hFA h77 hB8 h94 hEF h92 h01 h00 h00 h00 h00 h00 h00 h00 h1F h00 hB5 h59 hE3 
h30 h31 h20 h71 h08 h00 h00 h93 hEF h12 hB8 h77 hF8 h02 h01 h1E h1B hFF h75 h00 h42 h04 h01 h40 hF8 h77 hB8 h12 hEF h93 hFA h77 hB8 h94 hEF h92 h01 h00 h00 h00 h00 h00 h00 h00 h1F h00 hB5 h1E h0F

Best regards,
Jakub

  • Try replacing:

            APP_ERROR_CHECK(app_uart_put(c));

    with this:

             do
             {
                err_code = app_uart_put(c);
                if ((err_code != NRF_SUCCESS) && (err_code != NRF_ERROR_BUSY))
                {
                   // NRF_LOG_ERROR("Failed Tx Uart message. Error 0x%x. \r\n", err_code);
                   // APP_ERROR_CHECK(err_code);
                }
             } while (err_code == NRF_ERROR_BUSY);

    Also make UART_TX_BUF_SIZE big, say 2048.

    I've used the above for BLE-to-UART at both 230,400 baud and 1 Mbaud on large data transfers with no issues.

  • I forgot to mention that UART_TX_BUF_SIZE is >4k.

    While I agree that your transmission scheme might work better in production, it doesn't change anything in my case, as app_uart_put() always returns NRF_SUCCESS for me.

    Just for reference, I checked it, and it doesn't seem to fix the issue.

  • >4k means 4096, I take it, assuming you are using the FIFO, which requires a power-of-2 buffer size. Since 1/4096 ≈ 0.024%, a single-byte error on each FIFO buffer wrap would be of roughly the same order as the observed error rate (about 0.01%), which raises suspicion of a bug in the library code. Perhaps you could try changing the buffer size to see if it affects the error rate?

    Looking at the FIFO code, I see this dubious snippet:

    /**@brief Put one byte to the FIFO. */
    static __INLINE void fifo_put(app_fifo_t * p_fifo, uint8_t byte)
    {
        p_fifo->p_buf[p_fifo->write_pos & p_fifo->buf_size_mask] = byte;
        p_fifo->write_pos++;
    }

    Notice how p_fifo->write_pos is not checked for overflow. Does it matter? Maybe not if it is always used with the mask, but I would describe that as unsafe code, since p_fifo->write_pos will now wrap at 2^32 instead of at UART_TX_BUF_SIZE. p_fifo->write_pos is 32-bit, while p_fifo->buf_size_mask is 16-bit. Does it work? Probably, but it's dodgy code, methinks.

    Look at this:

    static __INLINE uint32_t fifo_length(app_fifo_t * p_fifo)
    {
        uint32_t tmp = p_fifo->read_pos;
        return p_fifo->write_pos - tmp;
    }

    No sign of the mask being used here. Does it matter? Hmm... Differencing two indices which are both beyond the end of the buffer they refer to is not good practice. When write wraps at 2^32-1 before read does, what then? Maybe (with the mask) 1 character is lost every 4096.

    /**@brief Look at one byte in the FIFO. */
    static __INLINE void fifo_peek(app_fifo_t * p_fifo, uint16_t index, uint8_t * p_byte)
    {
        *p_byte = p_fifo->p_buf[(p_fifo->read_pos + index) & p_fifo->buf_size_mask];
    }
    
    
    /**@brief Get one byte from the FIFO. */
    static __INLINE void fifo_get(app_fifo_t * p_fifo, uint8_t * p_byte)
    {
        fifo_peek(p_fifo, 0, p_byte);
        p_fifo->read_pos++;
    }

    fifo_peek uses the mask, but fifo_get does not mask read_pos when incrementing it, so read_pos does not wrap at the buffer size.

    So, maybe try reducing the buffer size which should increase the error rate in proportion. Or I'm wrong :-)

    My notes above were for SDK v15.3.0, but I see the FIFO handling was changed in SDK v17.0.2 so maybe just try that; it wasn't clear from the post which version was used originally.
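
The wrap question raised in this reply can be checked in isolation. The following is a minimal sketch, not the SDK source: `demo_fifo_t` and the `demo_*` functions are hypothetical names that mimic the free-running 32-bit index scheme, with an assumed 8-byte buffer. It starts both indices just below 2^32 so that write_pos wraps before read_pos, and counts any byte that comes out wrong.

```c
#include <stdint.h>

#define DEMO_BUF_SIZE 8u  /* assumed size for this sketch; must be a power of two */

/* Hypothetical FIFO that mimics free-running (unmasked) 32-bit indices. */
typedef struct {
    uint8_t  buf[DEMO_BUF_SIZE];
    uint16_t buf_size_mask;   /* DEMO_BUF_SIZE - 1 */
    uint32_t read_pos;        /* masked only when indexing buf */
    uint32_t write_pos;       /* masked only when indexing buf */
} demo_fifo_t;

static void demo_fifo_init(demo_fifo_t *f, uint32_t start_pos)
{
    f->buf_size_mask = DEMO_BUF_SIZE - 1u;
    f->read_pos      = start_pos;
    f->write_pos     = start_pos;
}

static uint32_t demo_fifo_length(const demo_fifo_t *f)
{
    /* Unsigned subtraction is modulo 2^32, so it stays correct across the wrap. */
    return f->write_pos - f->read_pos;
}

static int demo_fifo_put(demo_fifo_t *f, uint8_t byte)
{
    if (demo_fifo_length(f) > f->buf_size_mask) return -1;  /* full */
    f->buf[f->write_pos & f->buf_size_mask] = byte;
    f->write_pos++;
    return 0;
}

static int demo_fifo_get(demo_fifo_t *f, uint8_t *p_byte)
{
    if (demo_fifo_length(f) == 0u) return -1;               /* empty */
    *p_byte = f->buf[f->read_pos & f->buf_size_mask];
    f->read_pos++;
    return 0;
}

/* Push and pop 5 bytes per round, starting 4 counts below the 2^32 wrap.
   Returns the number of failed operations or mismatched bytes. */
static uint32_t demo_wrap_mismatches(uint32_t rounds)
{
    demo_fifo_t f;
    uint32_t mismatches = 0u;
    uint8_t next_in = 0u, next_out = 0u;

    demo_fifo_init(&f, UINT32_MAX - 3u);
    for (uint32_t r = 0u; r < rounds; r++) {
        for (int i = 0; i < 5; i++) {
            if (demo_fifo_put(&f, next_in++) != 0) mismatches++;
        }
        if (demo_fifo_length(&f) != 5u) mismatches++;
        for (int i = 0; i < 5; i++) {
            uint8_t out = 0u;
            if (demo_fifo_get(&f, &out) != 0 || out != next_out++) mismatches++;
        }
    }
    return mismatches;
}
```

Because 2^32 is a multiple of any power-of-two buffer size, the masked index stays continuous across the wrap and the length stays correct, so in this sketch no byte is lost. That suggests the index arithmetic on its own is an unlikely culprit as long as only one context touches each index at a time.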

  • Oh, one other issue, which is more relevant. app_uart_put() is called from a different context and priority level (BLE SoftDevice) than app_uart_get() (user-code serial interrupt handler). I also recall you mentioned that app_uart_put() is invoked from the user-code UART handler as well. This introduces a race hazard, since there is no locking and the code is not safely re-entrant, and that may be more plausible than the end-of-buffer issue. I had the exact same issue, now that I recall. The answer is to only put and get from within main() or some other single context; incoming BLE packets are better stored in an intermediate buffer, which is then collected and processed from the main context. I think this is your solution, and of course it confirms your original suspicion.

    You can prove this:

    in main.c:
    volatile bool SignalAlreadyInInterrupt = false;
    volatile uint32_t FaultCount = 0UL;
    
    in ble_nus_c_evt_handler():
            case BLE_NUS_C_EVT_NUS_TX_EVT:
                // Check if interrupt was interrupted
                if (SignalAlreadyInInterrupt) FaultCount++;
                ble_nus_chars_received_uart_print(p_ble_nus_evt->p_data, p_ble_nus_evt->data_len);
                break;
    
    in uart_event_handle():
            /**@snippet [Handling data from UART] */
            case APP_UART_DATA_READY:
                // Set the flag around any code in this handler that sends to the UART
                SignalAlreadyInInterrupt = true;
                // ... existing handling, including the UART transmit ...
                SignalAlreadyInInterrupt = false;
                break;

    in other UART transmits:
            similar to the above

    I'm spending time on this since it potentially affects medical devices I have designed.
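
The race hazard described in this reply can be illustrated with a deterministic simulation. This is a sketch under assumptions, not the SDK code: `sim_fifo_t`, `sim_put`, `sim_put_preempted` and `sim_run` are hypothetical names, and the "interrupt" is simply a function call placed between the load and the write-back of a non-atomic index increment. It shows one possible interleaving and is not claimed to be the exact failure seen in this thread.

```c
#include <stdint.h>
#include <string.h>

#define SIM_BUF_SIZE 16u
#define SIM_MASK     (SIM_BUF_SIZE - 1u)

/* Hypothetical FIFO write side with no locking. */
typedef struct {
    uint8_t  buf[SIM_BUF_SIZE];
    uint32_t write_pos;
} sim_fifo_t;

/* A put that runs to completion without being preempted. */
static void sim_put(sim_fifo_t *f, uint8_t byte)
{
    f->buf[f->write_pos & SIM_MASK] = byte;
    f->write_pos = f->write_pos + 1u;
}

/* A put from the lower-priority context that is "interrupted" after it has
   loaded write_pos for the increment but before it writes the result back. */
static void sim_put_preempted(sim_fifo_t *f, uint8_t byte, uint8_t irq_byte)
{
    f->buf[f->write_pos & SIM_MASK] = byte;  /* store the byte */
    uint32_t loaded = f->write_pos;          /* load index for the increment */
    sim_put(f, irq_byte);                    /* simulated interrupt: full put, same slot */
    f->write_pos = loaded + 1u;              /* write back a stale index + 1 */
}

/* Queue 'A', 'B', 'x', 'C'; with preempt != 0 the put of 'x' lands inside the put of 'B'.
   Returns the number of bytes the FIFO believes it holds. */
static uint32_t sim_run(sim_fifo_t *f, int preempt)
{
    memset(f, 0, sizeof *f);
    sim_put(f, 'A');
    if (preempt) {
        sim_put_preempted(f, 'B', 'x');
    } else {
        sim_put(f, 'B');
        sim_put(f, 'x');
    }
    sim_put(f, 'C');
    return f->write_pos;
}
```

Without preemption the FIFO holds "ABxC" (4 bytes). With the simulated preemption it holds "AxC" (3 bytes): 'B' was overwritten and the index advanced only once, so exactly one byte goes missing while every call still appears to succeed.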

  • Hey, thanks much for your time.

    I tried to find a correlation with the amount of data sent. Slightly more characters are lost when the interface is busier, but there seems to be no correlation with the TX FIFO size.

    The "counter" counts characters successfully sent between errors. Example log dump:

    17:11:22 ERROR lbga_serial.c:335: counter: 73724
    17:11:26 ERROR lbga_serial.c:335: counter: 2419
    17:12:33 ERROR lbga_serial.c:335: counter: 48811
    17:13:43 ERROR lbga_serial.c:335: counter: 47623
    17:14:32 ERROR lbga_serial.c:335: counter: 35296


    Regarding your idea about fault counting, I'm not using the uart_event_handle() at all.

    After having another look at app_uart_fifo.c I think I'll just reimplement the TX fifo my way, or maybe implement message-based communication, so there's no context mishmash, and everything is sent from main() context.

    I'll check in with the results here.

  • So I've done as I said: the data to be sent via UART is now passed to the main() context via nrf_queue. This has solved the problem of missing bytes.

    I still don't know the cause of the missing bytes. I guess the lesson for today is to never cut corners in such multi-context applications.

    Thanks for your cooperation.
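
The fix reported in the last reply (hand the data over to main() and transmit only from there) can be sketched in portable C. This is not the poster's code and it does not use nrf_queue: `handoff_queue_t`, `handoff_push`, `handoff_drain`, `byte_sink_t` and `demo_sink` are hypothetical names, the 64-byte size is an assumption, and the sink stands in for a UART put call. The sketch assumes a single core with exactly one producer context and one consumer context.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HANDOFF_SIZE 64u  /* assumed size; must be a power of two */

/* Single-producer, single-consumer ring: each index has exactly one writer. */
typedef struct {
    uint8_t           buf[HANDOFF_SIZE];
    volatile uint32_t head;  /* written only by the producer (event context) */
    volatile uint32_t tail;  /* written only by the consumer (main loop) */
} handoff_queue_t;

/* Producer side. Stores the whole block or nothing; returns false if it does not fit. */
static bool handoff_push(handoff_queue_t *q, const uint8_t *data, size_t len)
{
    uint32_t head = q->head;
    uint32_t used = head - q->tail;

    if (len > HANDOFF_SIZE - used) return false;
    for (size_t i = 0; i < len; i++) {
        q->buf[(head + i) & (HANDOFF_SIZE - 1u)] = data[i];
    }
    q->head = head + (uint32_t)len;  /* publish only after the data is in place */
    return true;
}

/* Stand-in for a UART put: returns false when the byte cannot be accepted yet. */
typedef bool (*byte_sink_t)(uint8_t byte);

/* Consumer side, called from the main loop. Returns the number of bytes sent. */
static size_t handoff_drain(handoff_queue_t *q, byte_sink_t sink)
{
    size_t sent = 0;

    while (q->tail != q->head) {
        uint8_t byte = q->buf[q->tail & (HANDOFF_SIZE - 1u)];
        if (!sink(byte)) break;      /* sink busy: retry this byte on the next call */
        q->tail = q->tail + 1u;
        sent++;
    }
    return sent;
}

/* Demo sink that records bytes in memory instead of driving a UART. */
static uint8_t demo_out[128];
static size_t  demo_out_len;

static bool demo_sink(uint8_t byte)
{
    if (demo_out_len >= sizeof demo_out) return false;
    demo_out[demo_out_len++] = byte;
    return true;
}
```

In a real application the BLE event handler would call something like `handoff_push()` and the main loop would call `handoff_drain()` with a sink that wraps the UART put, so the UART FIFO is only ever touched from one context. The design choice is that neither side modifies state owned by the other, which removes the need for locking in this particular arrangement.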
