USB CDC ACM: RX data gets overwritten in app_usbd_cdc_acm_read_any() at end of data transmission

To handle RX of USB CDC ACM we are polling data using:

ret = app_usbd_cdc_acm_read_any(&m_app_cdc_acm, m_rx_buffer, MIN(BUFFER_SIZE,size));

On

case NRF_SUCCESS:

rx_size = app_usbd_cdc_acm_rx_size(&m_app_cdc_acm);
copy_rx_data(buffer, n, m_rx_buffer);

We assume data has been transferred to our use buffer `m_rx_buffer` and get the data directly.

On

case NRF_ERROR_IO_PENDING:

usb_txrx_state = USB_TXRX_STATE_RX;

We assume data has been NOT been transferred and assume to we get the data in the event handler instead.

void cdc_acm_user_ev_handler(app_usbd_class_inst_t const * p_inst,
app_usbd_cdc_acm_user_event_t event)
{
app_usbd_cdc_acm_t const * p_cdc_acm = app_usbd_cdc_acm_class_get(p_inst);
last_usb_cdc_event = event;
ret_code_t ret;
switch (event)
{
...
case APP_USBD_CDC_ACM_USER_EVT_RX_DONE:

rx_size = app_usbd_cdc_acm_rx_size(p_cdc_acm);
copy_rx_data(buffer, n, m_rx_buffer);

This works well until the end of the data (583 bytes in total) is transferred.

On the NEXT TO LAST call to app_usbd_cdc_acm_read_any() we get NRF_SUCCESS and get correct data.

On the last call to app_usbd_cdc_acm_read_any() we get NRF_ERROR_IO_PENDING, but we see that correct data anyway has still been transferred to m_rx_buffer.

But because we assume there are no data directly available (it should be fetched in event handler) we make an intermediate call to:

while (app_usbd_event_queue_process()) {
}

To finalize the transfer event handler.

Stepping through app_usbd_event_queue_process() we see the following sequence of calls ...

#if (APP_USBD_CONFIG_SOF_HANDLING_MODE == APP_USBD_SOF_HANDLING_COMPRESS_QUEUE)
if (p_event_item->sof_cnt > 0)
{
if (p_event_item->start_frame > USBD_FRAMECNTR_FRAMECNTR_Msk)
{
p_event_item->start_frame = 0;
}
sof_event.drv_evt.data.sof.framecnt = (p_event_item->start_frame)++;
--(p_event_item->sof_cnt);
app_usbd_event_execute(&sof_event);
return true;
}
#endif // (APP_USBD_CONFIG_SOF_HANDLING_MODE == APP_USBD_SOF_HANDLING_COMPRESS_QUEUE)

==> app_usbd_event_execute(&(p_event_item->evt));

...

/**
* @brief @ref app_usbd_class_methods_t::event_handler
*/
static ret_code_t cdc_acm_event_handler(app_usbd_class_inst_t const * p_inst,
app_usbd_complex_evt_t const * p_event)
{
ASSERT(p_inst != NULL);
ASSERT(p_event != NULL);

ret_code_t ret = NRF_SUCCESS;
switch (p_event->app_evt.type)
{
...
case APP_USBD_EVT_DRV_EPTRANSFER:
==> ret = cdc_acm_endpoint_ev(p_inst, p_event);
break;

...

static ret_code_t cdc_acm_endpoint_ev(app_usbd_class_inst_t const * p_inst,
app_usbd_complex_evt_t const * p_event)
{

==> ret = cdc_acm_rx_block_finished(p_inst);

...

static ret_code_t cdc_acm_rx_block_finished(app_usbd_class_inst_t const * p_inst)
{

...

==> memcpy(p_cdc_acm_ctx->rx_transfer[0].p_buf,
p_cdc_acm_ctx->internal_rx_buf,
bytes_to_cpy);

The final memcpy finally COPIES OLD DATA from p_cdc_acm_ctx into our user buffer (which contained good data).

So there seems to be some internal USB stack state machine problem here ...

To summarise:

1. Correct data is transferred from app_usbd_cdc_acm_read_any() even if we got NRF_ERROR_IO_PENDING.
2. Consective call app_usbd_event_queue_process() copies old data (i.e. overwrites good data) to user buffer. This should not be possible if USB state contained in p_cdc_acm_ctx is valid.
3. We dont get APP_USBD_CDC_ACM_USER_EVT_RX_DONE event as expected until user buffer is overwritten by app_usbd_event_queue_process()


We are using SDK 17.0.2.

We have enabled APP_USBD_CDC_ACM_CONFIG_LOG_ENABLED, but it does not give any good indication.

One addional note: This is size dependent as well. So if last call returns data pending (which seems to be size dependent) internal buffer seems to be corrupted.

615 OK
593 - 599 OK
581-592 NOK


Parents
  • Hi

    Hieu will be back in office to continue this case tomorrow, but in the meantime one of our experts on the USB stack has taken a look at your project and has some feedback for you.

    Although we couldn't pinpoint what the error is likely to be we have some suggestions to what it could be.

    First off, do you do the intermediate call to while (app_usbd_event_queue_process() from an interrupt? If so that will definitely cause trouble with the priorities in your application. You could test the application in IRQ mode, with the event queue disabled to see if that changes anything.

    There might be a problem with the way you're using the app_usbd_event_queue_process() function, asthe point is to call it in your while loop, and not wherever you think it "fits" into your application.

    You're also using multiple blocking delays within your project, and the entire nRF5 SDK is based on asynchronous APIs, with callbacks when there is data to process. We would recommend you set up your project similarly, and go to a sleep mode at the end of your while loop instead of just running a delay. Based on the number of iterations we've had of our USB stack in the nRF5 SDK we don't think this is inherently an issue within the stack.

    Best regards,

    Simon

  • Yes, app_usbd_event_queue_process() is only called in main() in test app above and not from any interrupt routine.

    I also tried to replace blocking nrf_delay_ms() calls with RTC based events, but the same behavior persists.

    Not exactly sure what you mean by this:
    "You could test the application in IRQ mode, with the event queue disabled to see if that changes anything."

    Should IRQ be disabled at some state of the program?

    Anyway I have attached my updated version here. Only modified main.c 
    usb-test-packed-2.tgz


    Best regards / Peter

  • I tried this on a Windows machine ...

    But unfortunately I couldn't get Nordic USB CDC ACM sample to work on Windows or my sample.


    Device was not detected by Device Manager.

    USB CDC ACM Example

    I also tried on Linux and there both variants was detected by the OS.

    I tested on a Windows 10 machine and assume that no drivers are needed.
    If you could test on a Linux machine I think it would simplify things, but I will also make a new attempt on Windows next week.

    Have you tested the Nordic USB CDC sample on your machine?
    Best regards / Peter

  • Not recently, but I have worked with the Nordic USB CDC sample on my Windows machine before. The driver should be automatically installed after a few seconds. Could it be that your Windows machine's driver installation is limited due to enterprise firewall or Windows policy?

    I haven't been successful yet. I will try WSL next, not having a Linux machine to try,

    Best regards,

    Hieu

  • Hi Peter,

    I would like to give you an update.

    I tried Windows again with your suggestion here:

    PeterLjung said:
    I think we had to introduce a delay specifically for Windows compared with Linux in another project.
    So you could try to introduce a delay between usb_config() and usb_write() and/or include a set of process_usb_events() between operations.

    But the result has remained the same.

    I also tried to create another setup with WSL. However, I have been having issue passing the USB port to WSL. It seems there are some firewall issues.

    I will try to pull in some other help, preferably someone who is working with Linux.

    Best regards,

    Hieu

  • I tried Nordic USB CDC ACM sample on another Windows machine and there it is detected fine.
    But on the computer that I have access to it doesn't for some reason. The device is not detected at all. 

    1. I looked at BIOS settings to see if devices was disabled
    2. I tested another device (USB stick) on the same port to see if that worked. And yes.
    3. I tried to install nRF utils tools to see if there still was some driver issue.
    4. I tried to power the board through debug USB port to see if power was the issue.

    But it works fine on 3 other machines with different OSes (macOS, OpenBSD and Linux).
    Windows is just too strange. If you could try on Linux it would be great. 

    Best regards / Peter

  • There is one person in my team with a Linux machine, but they are away until next Monday. I will try to get their help then. In the meantime, I will try with a virtual machine.

    Best regards,

    Hieu

Reply Children
  • I did a test on another Windows machine and there USB is detected fine. The one I am using does not work. But I also noticed that serial ports COM ports are rather written as a number rather than a device name as in Linux. 
    Also I cannot detect if device is connected on Windows which was discussed above. So the RW protocol does not work on Windows.
    I made a quick rewrite that possibly could work on Windows. Device is not detected and script is will not be waiting for a command from the nRF device. I will not be able to test this until later this week (I dont have access to this machine) so it may very well have issues.
    usb-test-packed-3-windows.tgz

    So it is certainly much easier to try on regular Linux machine.

  • Hi Peter,

    I see that you updated your reply today with further information on how to run it on Windows. Thank you so much for the help.

    It seems it has an interesting bug where the real port it runs is always 1 number higher than the port given in argument. I.e., below I connect to COM30 by running ruby .\usb-test.rb 29

    Nonetheless, I tried that, and it seems I can produce some issue... The receiving logs showed a lot of data was missing. That probably was not the one you are describing?

    I am not very successful with the Linux virtual machine attempt either. It seems the Ruby script cannot communicate with the device for some reason.

    My Linux colleague can't go to office, unlike I had hoped.

    If push comes to shove, I think I can cook up something on Python to replicate what your Ruby script should do.

    This issue happens when the PC sends a specific length message to the nRF device, right? 

    The way I usually attempt to reproduce this kind of issues, I usually setup the TX device to send at a button trigger. In this case, I think I will just setup something simple like when I hit Space, the PC should send "ABCDEFGH00ABCDEFGH01ABCDEFGH02...ABCDEFGH56."

    On the device end, I can try with either your code or a minimally modified CDC ACM example to use the _read_any() API, then slowly add/remove debug logs, depends on the details.

    Do you think a setup like above would work?

    I will also consider asking if someone from another team can give me a hand with the Linux trial.

    Best regards,

    Hieu

  • Yes, you are correct. Serial port on windows start from 0 which correspond to COM1.

    So to fix that this row could be changed.

    sp = SerialPort.new(serial_dev.to_i - 1, 115200, 8, 1, SerialPort::NONE)

    I looks like at least 2 rows are transmitted in your current windows run. Doesn't go further than that?

    The issue I see on Linux is just at the end of the transmission. And the issue will be seen as wrong (old) data is received from USB internal buffer. If all rows are transmitted correctly the loop should just continue with another transmission from script after 5 seconds.

    If you want to keep the row handling of the firmware i.e. splitting rows delimited by ; maybe you only replace current content examples by to possibly simplify debugging.

    def working_sample
    return "ABC00DEFGHIJKLMNOP;ABC01DEFGHIJKLMNOPQR;ABC02DEFGHIJKLMNOPQR;ABC03DEFGHIJKLMNOPQR;ABC04DEFGHIJKLMNOPQR;ABC05DEFGHIJKLMNOPQR;ABC06DEFGHIJKLMNOP;ABC07DEFGHIJKLMNOPQR;ABC08DEFGHIJKLMNOPQR;ABC09DEFGHIJKLMNOPQR;ABC10DEFGHIJKLMNOPQR;ABC11DEFGHIJKLMNOP;ABC12DEFGHIJKLMNOPQR;ABC13DEFGHIJKLMNOPQR;ABC14DEFGHIJKLMNOPQR;ABC15DEFGHIJKLMNOP;ABC16DEFGHIJKLMNOPQR;ABC17DEFGHIJKLMNOPQR;ABC18DEFGHIJKLMNOPQR;ABC19DEFGHIJKLMNOP;ABC20DEFGHIJKLMNOP;ABC21DEFGHIJKLMNOP;ABC22DEFGHIJKLMNOP;ABC23DEFGHIJKLMNOP;ABC24DEFGHIJKL;ABC26DEFGHIJKL;ABC27DEFGHIJKL;ABC28DEFGHIJKL;ABC29DEFGHIJKL;ABC30DEFGHIJKLM;ABC31DEFGHIJKLM;EOT;"
    end

    def not_working_sample
    return "ABC00DEFGHIJKLMNOP;ABC01DEFGHIJKLMNOPQR;ABC02DEFGHIJKLMNOPQR;ABC03DEFGHIJKLMNOPQR;ABC04DEFGHIJKLMNOPQR;ABC05DEFGHIJKLMNOPQR;ABC06DEFGHIJKLMNOP;ABC07DEFGHIJKLMNOPQR;ABC08DEFGHIJKLMNOPQR;ABC09DEFGHIJKLMNOPQR;ABC10DEFGHIJKLMNOPQR;ABC11DEFGHIJKLMNOP;ABC12DEFGHIJKLMNOPQR;ABC13DEFGHIJKLMNOPQR;ABC14DEFGHIJKLMNOPQR;ABC15DEFGHIJKLMNOP;ABC16DEFGHIJKLMNOPQR;ABC17DEFGHIJKLMNOPQR;ABC18DEFGHIJKLMNOPQR;ABC19DEFGHIJKLMNOP;ABC20DEFGHIJKLMNOP;ABC21DEFGHIJKLMNOP;ABC22DEFGHIJKLMNOP;ABC23DEFGHIJKLMNOP;ABC24DEFGHIJKL;ABC26DEFGHIJKL;ABC27DEFGHIJKL;ABC28DEFGHIJKL;ABC29DEFGHIJKL;ABC30DEFGHIJKLM;EOT;"
    end

    I will also try to reproduce the same issue on Windows later this week.

  • Another engineer took a quick look and pointed out that you are having NRF_LOG_FLUSH() in the USB event handler, which would cause delay in interrupt execution, potentially caused timing issues.

    Could you please disable logging entirely and run a test?

  • Ok, yes I tried both to comment out all logging in usb.c i.e. in all eventhandlers and also tried to disable in sdk_config (NRF_LOG_ENABLED = 0), but it seems to behave as before.

    When I run this on Ubuntu 20.04 the "working_sample" continues i.e. transmission completes with multiple transmissions. While "not_working_sample" does not. 

    This is the code I used in this test.
    usb-test-packed-4.tgz

    I also tried to detect USB device on a newly installed WIndows 10 Home machine, but that behaves like the previous one. I got a short flash something like "Detected new USB device USB CDC PALOS", but it is not detected in Device Manager. On a collegues Windows it works though. I don't know why this differ.

Related