This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Bug in serialization ser_phy_hci_slip_cdc.c

I'm looking at the patched version resulting from applying the pc-ble-driver patch, but this might be present in the original as well.

675     case APP_USBD_CDC_ACM_USER_EVT_RX_DONE:
676         {
677             ret_code_t ret_code;
678             do
679             {
680                 ser_phi_hci_rx_byte(m_rx_byte);
681 
682                 ret_code = app_usbd_cdc_acm_read(p_cdc_acm, &m_rx_byte, 1);
683             } while (ret_code == NRF_SUCCESS);
684         }
685         break;
686 
687     default:
688         break;
689     }
690 }

For the 2nd APP_USBD_CDC_ACM_USER_EVT_RX_DONE event this seems to be repeating the last byte from the last event into the next packet. Note that the earlier APP_USBD_CDC_ACM_USER_EVT_PORT_OPEN event handler handles this in a more logical pattern.

    
    
            do {
                ret_code = app_usbd_cdc_acm_read(p_cdc_acm, &m_rx_byte, 1);
                if (ret_code == NRF_SUCCESS)
                {
                    ser_phi_hci_rx_byte(m_rx_byte);
                }
                else if (ret_code != NRF_ERROR_IO_PENDING)
                {
                    APP_ERROR_CHECK(ret_code);
                }
            } while (ret_code == NRF_SUCCESS);
            

I would think something like this is what is intended:

    while (NRF_SUCCESS == app_usbd_cdc_acm_read(p_cdc_acm, &m_rx_byte, 1))
    {
        ser_phi_hci_rx_byte(m_rx_byte);
    }

or just copy the implementation from the APP_USBD_CDC_ACM_USER_EVT_PORT_OPEN event.

The reason that this still works is because copying the END slip byte just resets the packet state to the beginning but this only works AS LONG AS ONE READ RETURNS A COMPLETE PACKET.

Here is what the data stream to the slip decoder looks like (one line per RX_DONE event).

i> ser_conn_dec: Command process start
i> ser_conn_dec: Tx buffer allocated
C0 18 00 00 E8 C0
C0 D8 4E 01 D9 00 62 01 FB 34 9B 5F 80 00 00 80 00 10 00 00 00 CB 00 00 01 D2 0A C0
i> ser_conn_dec: Command process start
i> ser_conn_dec: Tx buffer allocated
C0 20 00 00 E0 C0
C0 E1 4E 01 D0 00 62 01 FB 34 9B 5F 80 00 00 80 00 10 00 00 01 CB 00 00 01 CC E5 C0
i> ser_conn_dec: Command process start
i> ser_conn_dec: Tx buffer allocated
C0 28 00 00 D8 C0
C0 EA 4E 01 C7 00 62 01 24 D1 BC EA 5F 78 23 15 DE EF 12 12 23 15 00 00 01 F7 66 C0
i> ser_conn_dec: Command process start
i> ser_conn_dec: Tx buffer allocated
C0 30 00 00 D0 C0
C0 F3 4E 01 BE 00 62 01 24 D1 BC EA 5F 78 23 15 DE EF 12 12 25 15 00 00 01 9B F6 C0
i> ser_conn_dec: Command process start
i> ser_conn_dec: Tx buffer allocated
C0 38 00 00 C8 C0
C0 FC 4E 01 B5 00 62 01 24 D1 BC EA 5F 78 23 15 DE EF 12 12 24 15 00 00 01 B7 D0 C0

I've highlighted the repeated byte (in the case END).  It looks like should a single slip packet ever span one USB read this will break horribly.

Please advise.

FOLLOW UP:

I applied the fix I proposed and ser_phi_hci_rx_byte() dies horribly.  It should be evident that the 1st packet is lost because there is no END until the 2nd packet, but I fixed this by injecting one in the OPEN event handler.  Still dies horribly.  Ser_phi_hci_rx_byte needs the two ENDs in a row, the 1st one closes the packet and sends it, the second resets rx_sync.  So Ser_phi_hci_rx_byte is relying on this duplication.  Note that if data becomes available at the OPEN event, which doesn't have this bug, it won't be readable because of the above.

I'm going to need to fix up ser_phi_hci_rx_byte() as well to use my fix, relying on all the data in packet to come in on one USB read is just being way too chummy with the host USB implementation in assuming that it won't break a write up across separate transfers.

NB:  It is considered "good practice" in a slip implementation to precede each packet by an END as well.  If you get END END that is ignored, it is not a zero-length packet.  Superficially it looks like the tx_buf_fill() (which sends to the PC) does do this, but I didn't poke at it to verify.

FURTHER EXPLANATION:

What the original code is relying on is that a write(serial_port, the_complete_packet, sizeof(the_complete_packet)) on the PC side results in exactly one read on the device side of exactly one packet, ie, read(serial_port, input_buffer) always returns either nothing or exactly sizeof(the_complete_packet) but never the first half of a packet followed by the second half.  This sort of guarantee doesn't exist on the host side, it would be entirely reasonable for the host to split up the write into multiple bulk packets (especially considering single packets can be almost 800 bytes).

If the guarantee you are relying on does in fact exist THEN YOU DON'T NEED A SLIP LAYER.  You just write the complete raw packets to the virtual serial port on the PC side and they will magically come out as packets on the device side.

I have a complete set of patches now that fixes this, the changes required to fix ser_phi_hci_rx_byte()  were trivial.

  • Hi Aaron,


    I see that you solved this yourself. I will forward your findings to the developers, and report back to you if they have any comments on this.


    Best regards,
    Sigurd

  • Hi Sigurd,

    It is unclear to us if we fixed it unnecessarily or if in fact the deficiency we observed is present.

    Please see "Further explanation" for a concise description of what we think the bug is.

    Thanks,

    Aaron

      

  • Hi Aaron,

    This is the reply from the developers:

    Frankly speaking, I don't understand what is the issue here. SLIP protocol (specified in Bluetooth spec) implies that END byte (0xC0) must be at the start and end. So frames looks ok. It's not accidentally repeated byte.

    We have one fix in that area coming in 15.3.0, for the bug reported here, but it's different than what is mentioned here. The fix looks like this:

    diff --git a/components/serialization/common/transport/ser_phy/ser_phy_hci_slip_cdc.c b/components/serialization/common/transport/ser_phy/ser_phy_hci_slip_cdc.c
    index ddbf926..8ad48aa 100644
    --- a/components/serialization/common/transport/ser_phy/ser_phy_hci_slip_cdc.c
    +++ b/components/serialization/common/transport/ser_phy/ser_phy_hci_slip_cdc.c
    @@ -204,6 +204,7 @@ static void tx_buf_fill(void)
             {
                 can_continue = tx_buf_put(tx_escaped_data);
                 tx_escaped_data = 0;
    +            ++m_tx_index;
             }
             else switch (m_tx_phase)
             {
    @@ -248,21 +249,26 @@ static void tx_buf_fill(void)
     
             default:
                 ASSERT(mp_tx_data->p_buffer != NULL);
    -            uint8_t data = mp_tx_data->p_buffer[m_tx_index];
    -            ++m_tx_index;
    -
    -            if (data == APP_SLIP_END)
    -            {
    -                data = APP_SLIP_ESC;
    -                tx_escaped_data = APP_SLIP_ESC_END;
    -            }
    -            else if (data == APP_SLIP_ESC)
    +            if (m_tx_index < mp_tx_data->num_of_bytes)
                 {
    -                tx_escaped_data = APP_SLIP_ESC_ESC;
    -            }
    -            can_continue = tx_buf_put(data);
    +                uint8_t data = mp_tx_data->p_buffer[m_tx_index];
     
    -            if (m_tx_index >= mp_tx_data->num_of_bytes)
    +                if (data == APP_SLIP_END)
    +                {
    +                    data = APP_SLIP_ESC;
    +                    tx_escaped_data = APP_SLIP_ESC_END;
    +                }
    +                else if (data == APP_SLIP_ESC)
    +                {
    +                    tx_escaped_data = APP_SLIP_ESC_ESC;
    +                }
    +                else
    +                {
    +                    ++m_tx_index;
    +                }
    +                can_continue = tx_buf_put(data);
    +            }
    +            else
                 {
                     mp_tx_data->p_buffer = NULL;
     
    

    Best regards,

    Sigurd

  • For the developers:

    The trace I supplied is not the data on the wire, it is the data as given to ser_phi_hci_rx_byte().

    You are not supplying an END on the wire at the start and end.

    You are creating the END at the start by erroneously copying it here from the last byte of the previous packet here:

    675     case APP_USBD_CDC_ACM_USER_EVT_RX_DONE:
    676         {
    677             ret_code_t ret_code;
    678             do
    679             {
    680                 ser_phi_hci_rx_byte(m_rx_byte);
    681 
    682                 ret_code = app_usbd_cdc_acm_read(p_cdc_acm, &m_rx_byte, 1);
    683             } while (ret_code == NRF_SUCCESS);
    684         }
    685         break;
    686 
    687     default:
    688         break;
    689     }
    690 }
    

    In other words, when you have an RX_DONE event, start by calling ser_phi_hci_rx_byte() with an argument of m_rx_byte which is still the last byte of the previous packet, then read another byte from the cdc_acm intro m_rx_byte, if that was successful loop to the beginning.  Now m_rx_byte is the byte you read from cdc_acm, so send it via ser_phi_hci_rx_byte(), read another, etc.

    It should be obvious that the first packet ever received is corrupt and lost because m_rx_byte isn't initialized yet.  The more subtle effect, that one that is probably confusing you, is that the current code only works as long it is guaranteed that exactly one slip packet arrives per RX_DONE event. This guarantee does not exist.

    I realize there might be a language barrier but perhaps if this is still unclear we could have a conference call.  

    Thanks,

    Aaron

  • If this ticket doesn't contain any Nordic proprietary information, do you mind if I change it to public?  This was opened 30 days ago and we don't seem to be making any forward progress, the information may be useful to others.  

Related