
USB CDC ACM transmission is broken on voltage drops

Hello,

here is our setup:

SoftDevice 7.2.0, nrf52840

The problem:

Once the voltage on VBUS drops below 4.4V while a CDC ACM TX transfer (from device to host) is active, the USBD stack breaks and cannot recover even after the voltage returns to a normal level. It is then no longer possible to send data over CDC ACM until you either reset the device or restart (stop and start) the USB stack. Restarting the stack does not always work, so the only reliable way to recover is to reboot the device. The same happens with HID (keyboard), but at 4.0V and below. The USB "removed" event is not fired in either case.

Expected result:

The device (USB endpoint) should be able to recover from the failed state by clearing its state and responding to the RESET_PIPE_AND_STALL packet from the host. The documentation clearly states that VBUS must be within 4.35-5.5V, but I believe the device should still be able to recover from this situation by resetting its internal state. The host actually tries to reset the endpoint with the RESET_PIPE_AND_STALL command, and although the Nordic device responds to the command with success, it is still unable to recover and keeps failing in a loop. I'm attaching 2 *.pcapng files (they can be opened in Wireshark) containing the traffic captured between the device and the host: one with normal operation (when USB VBUS > 4.3V) and one with the problem.

P.S. This situation is not rare in the real world. For instance, we installed our devices at POS (point of sale) terminals, which have outdated hardware with a 0.5A output current limit. On top of that, someone could touch the USB connector or the cable, the connector could be oxidized, and so on. The problem might not reproduce for several weeks because of multiple factors, but it happens frequently. Our customer expects that the hardware (the Nordic-based device) never has to be unplugged and plugged back in, because that is not needed for their existing USB hardware (barcode scanners, payment terminals, etc.). Of course, the first thing we did was lower our power consumption, but as I mentioned before, there are still too many factors that can trigger the problem.

Steps to reproduce:

Software: Windows 10 PC, any serial port reader app (e.g. PuTTY), Wireshark (for USB packet sniffing) and the CDC ACM example project. Adjust the voltage on VBUS to ~4.3V or below, then send something over CDC ACM to the host. The second USB data packet from the device to the host will not be delivered, and USB data transmission between the host and the device will keep failing in a loop.

P.S. the simplest way to lower the voltage: 

a) use a very long USB cable, e.g. 4-6m or more

b) use an external power supply whose output voltage can be manually adjusted to the required level

OK.pcapng, Stall.pcapng

  • Hi Oleh,

     

    My apologies for the long wait, and thank you very much for your patience on this matter.

     

    Oleh Hordiichuk said:
    Then we lowered the voltage to ~4.1V and simulated impulse power consumption by adding a 10 Ohm resistor with a duty cycle of 50%, 500 Hz frequency (2ms the resistor is on and 2ms is off), and an overall duration of 0.5 sec. After 0.5 sec the resistor is turned off. Combined with our 2m cable length (~1 Ohm), "the impulse" lowers the voltage down to 3.6V. The device immediately becomes stuck with NRF_ERROR_BUSY in app_usbd_cdc_acm_write, but the USB removed event wasn't fired (!).

    This could unfortunately be a side-effect of the electrical specifications of the usb power detection:

    https://infocenter.nordicsemi.com/topic/ps_nrf52840/power.html?cp=4_0_0_4_2_7_6#unique_716562038

     

    Since you are in a voltage area, DC offset 3.6V, with a 500 mVpp signal on top, it is not guaranteed that you hit the threshold of 3.9V (capacitors aren't able to fully charge). I am not able to verify this, as this will be highly dependent on your hardware, and how it behaves on your VBUS.
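
As a rough sanity check of the figures in the quote, here is a minimal sketch in C that treats the cable and the pulsed resistor as a simple voltage divider. It only models the resistor: the device's own supply current is ignored, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Voltage (mV) at the device end of the cable while the pulsed load
 * resistor is switched on. Assumption: only the resistor draws current;
 * the device's own consumption is not modelled. */
uint32_t vbus_under_load_mv(uint32_t source_mv,
                            uint32_t cable_mohm,
                            uint32_t load_mohm)
{
    assert(cable_mohm + load_mohm > 0);
    /* Plain voltage divider, done in 64-bit to avoid overflow. */
    return (uint32_t)(((uint64_t)source_mv * load_mohm) /
                      (cable_mohm + load_mohm));
}

/* On-time (microseconds) of one pulse for a given frequency and duty cycle. */
uint32_t pulse_on_time_us(uint32_t freq_hz, uint32_t duty_percent)
{
    assert(freq_hz > 0 && duty_percent <= 100);
    return (1000000u / freq_hz) * duty_percent / 100u;
}
```

With the quoted values (4.1V source, ~1 Ohm cable, 10 Ohm load) this gives roughly 3.73V, so the reported 3.6V would presumably include the device's own current draw. A 500 Hz signal at 50% duty cycle gives an on-time of 1 ms per pulse; 2 ms on and 2 ms off would correspond to 250 Hz.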

     

    From a fw perspective:

    error 8 (NRF_ERROR_INVALID_STATE) is thrown at my end when you are not enumerated.

    error 17 (NRF_ERROR_BUSY) is thrown at my end when the device is enumerated, but the CDC instance isn't opened.

     

    From your logs, the problem is error 17, meaning the scenario where the CDC_ACM module isn't opened seen from the nRF device, but the firmware still tries to send data.

    Just prior to this, you're getting these logs:

    00> <info> app_usbd_core: SETUP: t: 0x02 r: 0x01
    00> <info> app_usbd_core: APP_USBD_SETUP_REQREC_ENDPOINT
    00> <info> app_usbd_core: SETUP: t: 0x02 r: 0x01
    00> <info> app_usbd_core: APP_USBD_SETUP_REQREC_ENDPOINT

    This seems to be an unhandled event on an endpoint other than EP0 (i.e. forwarded to the class itself), but if this corresponds to the "urb_function_sync_reset_pipe_and_clear_stall", it seemed to be accepted (i.e. that it is unhandled) on my end when I tested.

     

    Oleh Hordiichuk said:
    Also, there is one more important thing - we were unable to reproduce it with the DevKit, which again might be the reason why you didn't reproduce the problem.

    This complicates the scenario, and explains why I haven't been able to reproduce the behavior, as I'm testing on the DK.

     

    Oleh Hordiichuk said:

    So we believe that impulse power consumption is the root cause of such issues and that's why it didn't reproduce in your case.

    Also, there is one more important thing - we were unable to reproduce it with the DevKit, which again might be the reason why you didn't reproduce the problem. The DevKit stabilizes the power to 5V on VBUS, so it is not so simple to simulate a voltage drop with an external power supply + resistor. We connected to the power pins directly to bypass the stabilization, but it still hasn't reproduced. We believe that is because we need to apply more load than in our experiment, and because we used, for instance, a shorter cable (~1m).

    The schematic used in our device is the reference design taken from here (it uses linear stabilization):

    https://infocenter.nordicsemi.com/topic/ps_nrf52840/chapters/ref_circuitry.nrf52840/doc/image/nrf52840_qiaa_var1_schematic.svg

    But the devkit has impulse stabilization, which might also impact the test. 

    The DK has 4.7 uF on the VBUS:

    https://www.nordicsemi.com/-/media/Software-and-other-downloads/Dev-Kits/nRF52840-DK/nRF52840-Development-Kit---Hardware-files-2_0_1.zip

    If you are supplying the nRF with the +5V from the USB in addition, you are not only changing the "VBUS" going into the nRF, but the VDD_NRF as well. Are you also using REG0 output for supplying external devices?

     

    Kind regards,

    Håkon

  • Hello Håkon,

    thank you so much for the reply. Please see my questions:

    This could unfortunately be a side-effect of the electrical specifications of the usb power detection:

    So what would be the recommendations to avoid this problem? As I mentioned in previous messages in the real world it might reproduce due to a bad connector connection to the host. Or if the cable(connector) was accidentally touched. And other situations, which are beyond our control. However, we see that 3rd party USB hardware is not affected by this problem in the same environment.

    Since you are in a voltage area, DC offset 3.6V, with a 500 mVpp signal on top, it is not guaranteed that you hit the threshold of 3.9V (capacitors aren't able to fully charge). I am not able to verify this, as this will be highly dependent on your hardware, and how it behaves on your VBUS.

    Sorry, could you please provide more details on this? We didn't get the point. Which capacitors do you mean? Also, perhaps we misunderstand your point regarding the 3.9V threshold - it seems to be the max. value for USB removed. We are hitting this value (even lower, 3.6V), but only as short impulses. Do you mean that if it doesn't go below 3.0V then we are not guaranteed to receive the USB removed event?

    We have an assumption that, because the voltage doesn't reach the minimum value of 3.0V, the device assumes it is still in the same intermediate stage (between 3.0 and 3.9 V) when the voltage recovers to a normal level (5V). This is just an assumption.
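
To make that assumption concrete, here is a small C sketch of the three voltage zones as we understand them from this discussion. The 3.0V and 3.9V bounds are the figures mentioned in this thread, not values verified against the product specification, and the type and function names are hypothetical. Pulse duration, which probably matters for a real detector, is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of the "removed" detection band as quoted in this thread
 * (assumption: check them against the product specification). */
#define VBUS_REMOVE_MIN_MV 3000u
#define VBUS_REMOVE_MAX_MV 3900u

typedef enum {
    VBUS_REMOVE_NOT_EXPECTED, /* above the band: no "removed" event expected */
    VBUS_REMOVE_UNCERTAIN,    /* inside the band: event may or may not fire  */
    VBUS_REMOVE_EXPECTED      /* below the band: event should fire           */
} vbus_remove_zone_t;

/* Classify a VBUS level (mV) against the removal-detection band. */
vbus_remove_zone_t vbus_remove_zone(uint32_t vbus_mv)
{
    assert(VBUS_REMOVE_MIN_MV <= VBUS_REMOVE_MAX_MV);
    if (vbus_mv > VBUS_REMOVE_MAX_MV) {
        return VBUS_REMOVE_NOT_EXPECTED;
    }
    if (vbus_mv < VBUS_REMOVE_MIN_MV) {
        return VBUS_REMOVE_EXPECTED;
    }
    return VBUS_REMOVE_UNCERTAIN;
}
```

A 3.6V dip lands in the uncertain zone, which would be consistent with the "removed" event not firing in our test.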

    Since you are in a voltage area, DC offset 3.6V, with a 500 mVpp signal on top, it is not guaranteed that you hit the threshold of 3.9V (capacitors aren't able to fully charge). I am not able to verify this, as this will be highly dependent on your hardware, and how it behaves on your VBUS.

     

    From a fw perspective:

    error 8 (NRF_ERROR_INVALID_STATE) is thrown at my end when you are not enumerated.

    error 17 (NRF_ERROR_BUSY) is thrown at my end when the device is enumerated, but the CDC instance isn't opened.

     

    From your logs, the problem is error 17, meaning the scenario where the CDC_ACM module isn't opened seen from the nRF device, but the firmware still tries to send data.

    From the fw perspective: NRF_ERROR_BUSY is generated not only when the serial port is not opened. It is generated right after we get into the problem situation, at which point the CDC ACM port is opened from both the host and the device perspective and no "close" event was generated. If the host closes and reopens the port, the error is still generated. The error can be cleared in only 2 cases: if we manually stop/disable/enable/start the USBD stack, or if the USB removed event is fired (which in fact also resets the USBD stack).
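
As an illustration of that workaround, here is a minimal C sketch of a watchdog that counts consecutive busy results from the write call and signals when a stack restart should be attempted. The names tx_watchdog_t, tx_watchdog_init and tx_watchdog_feed are hypothetical, the error constants are local stand-ins (17 is the value quoted above for NRF_ERROR_BUSY), and the actual restart sequence is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the SDK error codes (assumption: 17 is the value
 * quoted in this thread for NRF_ERROR_BUSY). */
#define SKETCH_SUCCESS     0u
#define SKETCH_ERROR_BUSY 17u

typedef struct {
    uint32_t busy_streak; /* consecutive busy results seen so far    */
    uint32_t busy_limit;  /* streak length that triggers a restart   */
} tx_watchdog_t;

void tx_watchdog_init(tx_watchdog_t *wd, uint32_t busy_limit)
{
    assert(wd != NULL && busy_limit > 0);
    wd->busy_streak = 0;
    wd->busy_limit = busy_limit;
}

/* Feed the result of each write attempt. Returns true when the busy
 * streak reaches the limit; the streak is then cleared so the caller
 * gets one restart request per streak. */
bool tx_watchdog_feed(tx_watchdog_t *wd, uint32_t write_result)
{
    assert(wd != NULL);
    if (write_result == SKETCH_ERROR_BUSY) {
        wd->busy_streak++;
        if (wd->busy_streak >= wd->busy_limit) {
            wd->busy_streak = 0;
            return true;
        }
        return false;
    }
    /* Any result other than busy breaks the streak. */
    wd->busy_streak = 0;
    return false;
}
```

The caller would pass every result of app_usbd_cdc_acm_write into tx_watchdog_feed and run the stop/disable/enable/start sequence when it returns true. The limit needs tuning, since a busy result can presumably also occur briefly during normal operation.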

    So we believe it's mainly a software problem, while the root cause comes from the hardware. Some internal state in CDC ACM (and actually in the HID keyboard device as well, but that's a separate topic for investigation) becomes wrong and the device is unable to recover from that state unless we reset it. But this "reset" workaround is unfortunately not an option for us because of how the 3rd party host system works, which is beyond our control. So we want to find a better software solution for this.

    This seems to be an unhandled event on an endpoint other than EP0 (i.e. forwarded to the class itself), but if this corresponds to the "urb_function_sync_reset_pipe_and_clear_stall", it seemed to be accepted (i.e. that it is unhandled) on my end when I tested.

    Should we provide more info on this, which will help to find out the exact behaviour and problem? E.g. additional logs?

    The DK has 4.7 uF on the VBUS:

    https://www.nordicsemi.com/-/media/Software-and-other-downloads/Dev-Kits/nRF52840-DK/nRF52840-Development-Kit---Hardware-files-2_0_1.zip

    If you are supplying the nRF with the +5V from the USB in addition, you are not only changing the "VBUS" going into the nRF, but the VDD_NRF as well. Are you also using REG0 output for supplying external devices?

    We use VBUS for power supply and we connect it to VDD_NRF as well. And we are not using REG0 output. 

    Also we have an update - we did a new test.

    1) We installed stabilization before VBUS, so on any input it outputs 5V and tried to reproduce the problem - eventually the device failed in the same state. 

    2) In addition to that, we noticed that our device, based on the recommended schematics, has 56 Ohm resistors on D+ and D-, but the DevKit has 0 Ohm. So we removed the resistors, with step "1" still in place. The problem didn't reproduce even when we lowered the input voltage down to ~2.0V! CDC ACM works perfectly.

    We are curious why the DevKit doesn't have the 56 Ohm resistors but the reference design does. Is 56 Ohm the only value that can be used?

    Next we also want to test without the 5V stabilization but with the resistors removed; I'll be able to provide the results later.

    Also we can send our device to your office, where the problem will be 100% reproduced. If that makes sense for you. This is a really important problem for us, as it reproduces at our customer's venue. Hopefully it has a software solution.

    Kind regards,

    Oleh

  • Hi Oleh,

     

    I believe you found the root cause of your problem.

    There shall not be any series resistor on D+/D-. You should replace these with 0 ohms. Please let me know how the testing goes with no series resistance on D+/D-.

     

    Kind regards,

    Håkon
