USB CDC ACM: RX data gets overwritten in app_usbd_cdc_acm_read_any() at end of data transmission

To handle RX of USB CDC ACM we are polling data using:

ret = app_usbd_cdc_acm_read_any(&m_app_cdc_acm, m_rx_buffer, MIN(BUFFER_SIZE,size));

On

case NRF_SUCCESS:

rx_size = app_usbd_cdc_acm_rx_size(&m_app_cdc_acm);
copy_rx_data(buffer, n, m_rx_buffer);

We assume data has been transferred to our use buffer `m_rx_buffer` and get the data directly.

On

case NRF_ERROR_IO_PENDING:

usb_txrx_state = USB_TXRX_STATE_RX;

We assume data has been NOT been transferred and assume to we get the data in the event handler instead.

void cdc_acm_user_ev_handler(app_usbd_class_inst_t const * p_inst,
app_usbd_cdc_acm_user_event_t event)
{
app_usbd_cdc_acm_t const * p_cdc_acm = app_usbd_cdc_acm_class_get(p_inst);
last_usb_cdc_event = event;
ret_code_t ret;
switch (event)
{
...
case APP_USBD_CDC_ACM_USER_EVT_RX_DONE:

rx_size = app_usbd_cdc_acm_rx_size(p_cdc_acm);
copy_rx_data(buffer, n, m_rx_buffer);

This works well until the end of the data (583 bytes in total) is transferred.

On the NEXT TO LAST call to app_usbd_cdc_acm_read_any() we get NRF_SUCCESS and get correct data.

On the last call to app_usbd_cdc_acm_read_any() we get NRF_ERROR_IO_PENDING, but we see that correct data anyway has still been transferred to m_rx_buffer.

But because we assume there are no data directly available (it should be fetched in event handler) we make an intermediate call to:

while (app_usbd_event_queue_process()) {
}

To finalize the transfer event handler.

Stepping through app_usbd_event_queue_process() we see the following sequence of calls ...

#if (APP_USBD_CONFIG_SOF_HANDLING_MODE == APP_USBD_SOF_HANDLING_COMPRESS_QUEUE)
if (p_event_item->sof_cnt > 0)
{
if (p_event_item->start_frame > USBD_FRAMECNTR_FRAMECNTR_Msk)
{
p_event_item->start_frame = 0;
}
sof_event.drv_evt.data.sof.framecnt = (p_event_item->start_frame)++;
--(p_event_item->sof_cnt);
app_usbd_event_execute(&sof_event);
return true;
}
#endif // (APP_USBD_CONFIG_SOF_HANDLING_MODE == APP_USBD_SOF_HANDLING_COMPRESS_QUEUE)

==> app_usbd_event_execute(&(p_event_item->evt));

...

/**
* @brief @ref app_usbd_class_methods_t::event_handler
*/
static ret_code_t cdc_acm_event_handler(app_usbd_class_inst_t const * p_inst,
app_usbd_complex_evt_t const * p_event)
{
ASSERT(p_inst != NULL);
ASSERT(p_event != NULL);

ret_code_t ret = NRF_SUCCESS;
switch (p_event->app_evt.type)
{
...
case APP_USBD_EVT_DRV_EPTRANSFER:
==> ret = cdc_acm_endpoint_ev(p_inst, p_event);
break;

...

static ret_code_t cdc_acm_endpoint_ev(app_usbd_class_inst_t const * p_inst,
app_usbd_complex_evt_t const * p_event)
{

==> ret = cdc_acm_rx_block_finished(p_inst);

...

static ret_code_t cdc_acm_rx_block_finished(app_usbd_class_inst_t const * p_inst)
{

...

==> memcpy(p_cdc_acm_ctx->rx_transfer[0].p_buf,
p_cdc_acm_ctx->internal_rx_buf,
bytes_to_cpy);

The final memcpy finally COPIES OLD DATA from p_cdc_acm_ctx into our user buffer (which contained good data).

So there seems to be some internal USB stack state machine problem here ...

To summarise:

1. Correct data is transferred from app_usbd_cdc_acm_read_any() even if we got NRF_ERROR_IO_PENDING.
2. Consective call app_usbd_event_queue_process() copies old data (i.e. overwrites good data) to user buffer. This should not be possible if USB state contained in p_cdc_acm_ctx is valid.
3. We dont get APP_USBD_CDC_ACM_USER_EVT_RX_DONE event as expected until user buffer is overwritten by app_usbd_event_queue_process()


We are using SDK 17.0.2.

We have enabled APP_USBD_CDC_ACM_CONFIG_LOG_ENABLED, but it does not give any good indication.

One addional note: This is size dependent as well. So if last call returns data pending (which seems to be size dependent) internal buffer seems to be corrupted.

615 OK
593 - 599 OK
581-592 NOK


Parents
  • Hi

    Hieu will be back in office to continue this case tomorrow, but in the meantime one of our experts on the USB stack has taken a look at your project and has some feedback for you.

    Although we couldn't pinpoint what the error is likely to be we have some suggestions to what it could be.

    First off, do you do the intermediate call to while (app_usbd_event_queue_process() from an interrupt? If so that will definitely cause trouble with the priorities in your application. You could test the application in IRQ mode, with the event queue disabled to see if that changes anything.

    There might be a problem with the way you're using the app_usbd_event_queue_process() function, asthe point is to call it in your while loop, and not wherever you think it "fits" into your application.

    You're also using multiple blocking delays within your project, and the entire nRF5 SDK is based on asynchronous APIs, with callbacks when there is data to process. We would recommend you set up your project similarly, and go to a sleep mode at the end of your while loop instead of just running a delay. Based on the number of iterations we've had of our USB stack in the nRF5 SDK we don't think this is inherently an issue within the stack.

    Best regards,

    Simon

  • Yes, app_usbd_event_queue_process() is only called in main() in test app above and not from any interrupt routine.

    I also tried to replace blocking nrf_delay_ms() calls with RTC based events, but the same behavior persists.

    Not exactly sure what you mean by this:
    "You could test the application in IRQ mode, with the event queue disabled to see if that changes anything."

    Should IRQ be disabled at some state of the program?

    Anyway I have attached my updated version here. Only modified main.c 
    usb-test-packed-2.tgz


    Best regards / Peter

  • Hi Peter,

    I would like to give a short reply. I returned and started working on the case. However, I am unfamiliar with Ruby and have been having some hiccups with the reproduction.

    PeterLjung said:
    "You could test the application in IRQ mode, with the event queue disabled to see if that changes anything."

    Should IRQ be disabled at some state of the program?

    This is done by setting APP_USBD_CONFIG_EVENT_QUEUE_ENABLE to 0. This disables the USB event queue in the app_usbd library and all USB events would be handled directly in the interrupt handler.

    I would like to discuss an alternative while the issue is not yet clear. Have you tried using app_usbd_cdc_acm_read() instead? That API allows double buffering. Since the package length is not constant, you probably have to use a buffer size of 1. However, if your package has a format, or if you can control the host side to send it in a divisible number, like 512, 528, 544, ... (increment by 16), then you can use a larger buffer size.

    One more thing I should have checked in my first reply. Are you working on a new product? What development stage are you at? If you are at early stages, then our official recommendation is the nRF Connect SDK instead of the nRF5 SDK, which is in maintenance mode.

    I will continue to investigate this issue today and tomorrow. I would like to at least get to the point of getting a reproduction setup to run.

    Best regards,

    Hieu

  • It looks like disabling event queue (APP_USBD_CONFIG_EVENT_QUEUE_ENABLE) removes manual triggering of usb state via app_usbd_event_queue_process() so it changes the usage of the API a bit.

    We tried app_usbd_cdc_acm_read(), but ended up using _any() variant instead. It think we had the same problem with this one. Is really double buffering usb stack implementation simpler?

    We are using a separate sw stack that only support SDK so right now its not possible to move to connect.

    Regarding ruby it should only be to install ruby and one single gem serialport.
    Make sure it looks something like this.  Later ruby version should also be fine.
     
    $ ruby -v
    ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux-gnu]

    $ gem list

    *** LOCAL GEMS ***

    ...
    serialport (1.3.2)
    ...

  • PeterLjung said:
    It looks like disabling event queue (APP_USBD_CONFIG_EVENT_QUEUE_ENABLE) removes manual triggering of usb state via app_usbd_event_queue_process() so it changes the usage of the API a bit.

    That's right. If I remember correctly, you probably won't need app_usbd_event_queue_process() when the event queue is disabled. This is just a temporary debugging change to help narrow down the issue though.

    PeterLjung said:
    We tried app_usbd_cdc_acm_read(), but ended up using _any() variant instead. It think we had the same problem with this one. Is really double buffering usb stack implementation simpler?

    I don't think either _read() or _read_any() API implementation in the app_usbd library is significantly simpler than the other.

    Or could you perhaps be asking about the application layer, where you implement the handling? If so, I think the implementation is different, but not necessarily simpler or more complex. I just suggested it in case it could be a working alternative for you, as well as a potential further hint on where the issue.

    However, that's a bit strange if you had the same issue with the _read() API. The way the two APIs handle buffers are different, after all.
    Could you check if there are some notes, such as in commit messages, from that time?

    PeterLjung said:
    Regarding ruby it should only be to install ruby and one single gem serialport.
    Make sure it looks something like this.  Later ruby version should also be fine.

    Ah yes. I have Ruby 3.2.2 and serialport gem version 1.3.2. The hiccups I mentioned was embarrassingly with the installation... Turns out I had some misunderstanding about the MSYS2 part of the installation log, and everything went (almost) fine...

    However, when I run the script, it is not working though.

    There were a couple of error message during installation about the wizard being unable to find catgets and libcatgets for removal to avoid conflicts.
    Seems related to this PR. I don't think this is an issue though.

    I will try and see if this is just some simple differences in COM port handling between Windows and Linux and fix it. If that doesn't work, I will figure something else out. I will keep you posted, nonetheless.

  • This reply was deleted.
Reply Children
  • Aha, so you are on a Windows machine?

    lsusb is UNIX utility so that could be a problem on Windows. 
    I am working from Ubuntu 20.04.

    lsusb and usb_connected? part is only for robustness of the script and could be removed if neccesary.

    # Open protocol towards serial client
    loop do
    if usb_connected? then
    show "BLE device connected. Connect to USB serial port."
    sp = SerialPort.new(serial_dev, 115200, 8, 1, SerialPort::NONE)
    break
    else
    show "BLE device not connected to USB port. Wait ..."
    sleep 10
    end
    end

    could be replaced by

    sp = SerialPort.new(serial_dev, 115200, 8, 1, SerialPort::NONE)

  • Thanks for the tip, Peter. I commented out the USB connected check and it seems the script can progress past that now.

    However, it seems the test is not running as expected:

    When I tried the first project you sent instead, my PC doesn't even accept the USB device..

    Do you have any idea what is wrong? I will dive into it in a bit, but asking in case something makes sense for you immediately.

    By the way, I am thinking about modifying the projects so that the device just listens, and the Ruby script immediately sends a fixed size packet. Do you think there is any problem with this?

  • Actually I have not tried exactly this sample on Windows.
    Question is if the USB device is detected by Windows or not. Can you check?

    On Linux, device may alter between e.g. two devices e.g. ttyACM2 and ttyACM3

    I think we had to introduce a delay specifically for Windows compared with Linux in another project.
    So you could try to introduce a delay between usb_config() and usb_write() and/or include a set of process_usb_events() between operations.

    Hope this helps. 

    I will also try on a Windows machine tomorrow and compare ...

    Best regards / Peter


  • I tried this on a Windows machine ...

    But unfortunately I couldn't get Nordic USB CDC ACM sample to work on Windows or my sample.


    Device was not detected by Device Manager.

    USB CDC ACM Example

    I also tried on Linux and there both variants was detected by the OS.

    I tested on a Windows 10 machine and assume that no drivers are needed.
    If you could test on a Linux machine I think it would simplify things, but I will also make a new attempt on Windows next week.

    Have you tested the Nordic USB CDC sample on your machine?
    Best regards / Peter

  • Not recently, but I have worked with the Nordic USB CDC sample on my Windows machine before. The driver should be automatically installed after a few seconds. Could it be that your Windows machine's driver installation is limited due to enterprise firewall or Windows policy?

    I haven't been successful yet. I will try WSL next, not having a Linux machine to try,

    Best regards,

    Hieu

Related