
USB HID reliability issue on macOS

I am currently working on a project developing a USB HID product based on the nRF52840. I have run into an issue regarding USB communication reliability and am seeking the advice of others who know the Nordic products better.

The primary issue is that USB key events sent with app_usbd_hid_kbd_key_control() are unreliable, with 1-2% of packets failing to reach the host. The issue is not hardware related.

Some background:

The product is built upon an nRF52840 and SDK 17.0.2. I have also updated to 17.1.0 and the issue outlined below remains the same.

The product firmware architecture uses a simple executive scheduler driven by a 5 ms system tick derived from an application timer. Every 5 ms a series of regular tasks is executed. During the idle time, whilst waiting for the next 5 ms tick, the unit services the USB event queue by calling app_usbd_event_queue_process(). This part of the code is working as expected: the 5 ms system tick is very regular and all other code in the product executes as expected.
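To make the loop structure concrete, here is a minimal host-side sketch of the idle phase described above. It is not the product firmware: `event_pump_fn`, `tick_handler`, `idle_until_tick` and the `demo_*` names are hypothetical, and the pump function is only a stand-in for app_usbd_event_queue_process(), assumed here to return true while it has handled an event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for app_usbd_event_queue_process(): assumed to
 * return true if an event was handled, false if the queue was empty. */
typedef bool (*event_pump_fn)(void);

/* Written from the timer interrupt and read from the main loop,
 * hence volatile. */
static volatile bool g_tick_pending = false;

/* Hypothetical 5 ms application timer callback. */
static void tick_handler(void)
{
    g_tick_pending = true;
}

/* Idle phase: keep servicing USB events until the next tick is flagged.
 * Returns the number of events handled during this idle period. */
static uint32_t idle_until_tick(event_pump_fn pump)
{
    assert(pump != NULL);
    uint32_t handled = 0;
    while (!g_tick_pending) {
        if (pump()) {
            handled++;
        }
        /* Real firmware might sleep here when the queue is empty. */
    }
    g_tick_pending = false; /* consume the tick; regular tasks run next */
    return handled;
}

/* Demo stub: three queued events, then the simulated timer fires. */
static uint32_t demo_events_left = 3;

static bool demo_pump(void)
{
    if (demo_events_left > 0) {
        demo_events_left--;
        return true;
    }
    tick_handler(); /* queue empty: pretend the 5 ms timer expires now */
    return false;
}
```

Calling `idle_until_tick(demo_pump)` drains the three simulated events and returns once the simulated tick arrives. In this arrangement USB events are only serviced between ticks, so how quickly the queue is drained depends on how long the regular tasks take.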

During development, all testing was done on a Windows 10 VM. Further testing was done on a Mac Mini running High Sierra and on a range of Windows 7 and 10 machines, and no issues were observed.

The first internal beta tester connected the product to a MacBook Pro running Big Sur and started running into problems. A second beta tester using a MacBook Pro with Big Sur also reported product communication reliability problems. Beta testers using Windows machines reported no issues.

Testing on my side showed that calls to app_usbd_hid_kbd_key_control() would occasionally return the error code NRF_ERROR_BUSY, approximately once every 50-100 calls. This was true for Mac Minis running Catalina or Big Sur. By contrast, a Mac Mini running High Sierra induced no errors.

No errors were observed on Windows machines, by either myself or beta testers, so the comments below pertain specifically to further testing on various Macs running macOS Big Sur or Catalina.

Narrowing in on the issue:

To rule out a firmware interaction, I pared everything back to the bare minimum of a shell application. The MCU booted up, initialised the processor and USB, then started a bare minimum application that attempted to send a single HID report every 100 ms. Irregularly, every few seconds, app_usbd_hid_kbd_key_control() returned NRF_ERROR_BUSY.

To rule out hardware, I moved this very simple application over to the nRF52840 DK. Again, irregularly, every few seconds, app_usbd_hid_kbd_key_control() returned NRF_ERROR_BUSY. This was not a hardware issue.

The issue always manifested as the HID function returning as being busy. Sometimes the deferred USB HID command would be delayed by several hundred ms to the host but eventually succeed, whilst at other times the HID command would not be successfully received by the host at all.
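For readers who want to see how the busy status might be surfaced and deferred, here is a minimal host-side sketch of a bounded retry. It is an illustration only, not the code used in the product: `send_key_fn`, `send_key_with_retry` and `demo_send` are hypothetical names, and the `SKETCH_*` status codes are local stand-ins for the SDK's NRF_SUCCESS and NRF_ERROR_BUSY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in status codes for this sketch only; real firmware would use
 * NRF_SUCCESS and NRF_ERROR_BUSY from the SDK headers. */
#define SKETCH_SUCCESS    0u
#define SKETCH_ERROR_BUSY 17u

/* Hypothetical wrapper signature around the real key-control call. */
typedef uint32_t (*send_key_fn)(uint8_t keycode, int press);

/* Try to send one key event, retrying a bounded number of times while
 * the send function reports busy. Returns the last status code seen and
 * optionally reports how many attempts were made. */
static uint32_t send_key_with_retry(send_key_fn send, uint8_t keycode,
                                    int press, unsigned max_attempts,
                                    unsigned *attempts_used)
{
    assert(send != NULL);
    uint32_t ret = SKETCH_ERROR_BUSY;
    unsigned attempts = 0;
    while (attempts < max_attempts) {
        ret = send(keycode, press);
        attempts++;
        if (ret != SKETCH_ERROR_BUSY) {
            break;
        }
        /* Real firmware would let the USB stack make progress here,
         * for example by waiting for the next scheduler tick. */
    }
    if (attempts_used != NULL) {
        *attempts_used = attempts;
    }
    return ret;
}

/* Demo stub: reports busy on the first two calls, then succeeds. */
static unsigned demo_busy_calls = 2;

static uint32_t demo_send(uint8_t keycode, int press)
{
    (void)keycode;
    (void)press;
    if (demo_busy_calls > 0) {
        demo_busy_calls--;
        return SKETCH_ERROR_BUSY;
    }
    return SKETCH_SUCCESS;
}
```

With the demo stub, `send_key_with_retry(demo_send, 0x04, 1, 5, &used)` succeeds on the third attempt. A retry like this only makes the busy condition visible and bounded; as noted above, some deferred reports never reached the host at all, so it does not address the underlying cause.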

Testing different compiler optimization levels and build configurations seemed to vary the error frequency, but not eradicate it.

Confusingly, inserting a single statement that prints a message to the NRF log, immediately prior to the call to the Nordic USB library, makes it work perfectly every time!

This was 100% reliable:

NRF_LOG_DEBUG("USB: Sending HID report");
ret = app_usbd_hid_kbd_key_control(&m_app_hid_kbd, i_keycode, i_press);

This was unreliable:

ret = app_usbd_hid_kbd_key_control(&m_app_hid_kbd, i_keycode, i_press);

Having a problem go away by including a logging statement is not a real solution. It most likely just masks some other low-level issue, for example by changing the timing of the call, so I was not satisfied with this as a fix. The same problem could easily re-emerge after any unrelated change, so a real fix was needed.

A potential solution

Through much trial and error, I discovered that changing the SOF handling option appeared to solve the issue. I changed APP_USBD_CONFIG_SOF_HANDLING_MODE from 1 (Compress queue) to 2 (Interrupt).

// <i> Normal queue   - SOF events are pushed normally into the event queue.
// <i> Compress queue - SOF events are counted and binded with other events or executed when the queue is empty.
// <i>                  This prevents the queue from filling up with SOF events.
// <i> Interrupt      - SOF events are processed in interrupt.
// <0=> Normal queue 
// <1=> Compress queue 
// <2=> Interrupt 

#ifndef APP_USBD_CONFIG_SOF_HANDLING_MODE
#define APP_USBD_CONFIG_SOF_HANDLING_MODE 2
#endif

With this change in place the system became completely reliable across all platforms (Windows and various macOS).

I suspect that there may be a low-level issue in the nrfx USB library related to this. Perhaps there's a circular buffer there with a non-volatile size or index variable... but I'm just speculating at this point.

Before pushing the change to the SOF handling out to beta testers, I'm keen to gain more insight into what my problem was, so that I can have more confidence that this change is actually a robust long-term fix for the issues I experienced. I'm hoping that someone in the community has encountered a very similar issue and can share some information as to what the low-level issue actually is, endorse the fix, or suggest a better one.

Any knowledge or insights into this problem would be greatly appreciated.

  • Update:

    With the change so that only SOF events are processed within interrupts, and not queued, the problem was reduced, but not eliminated. It turns out my initial testing was insufficiently robust to catch all the conditions where the error would occur.

    A more robust fix seems to be to disable the event queue entirely. I assume this means that all USB messages are processed exclusively within interrupt context.

    With the USB event queue disabled I'm finally getting reliable message exchange with the various Macs under all tested scenarios:

    // <e> APP_USBD_CONFIG_EVENT_QUEUE_ENABLE - Enable event queue.
    
    // <i> This is the default configuration when all the events are placed into internal queue.
    // <i> Disable it when an external queue is used like app_scheduler or if you wish to process all events inside interrupts.
    // <i> Processing all events from the interrupt level adds requirement not to call any functions that modifies the USBD library state from the context higher than USB interrupt context.
    // <i> Functions that modify USBD state are functions for sleep, wakeup, start, stop, enable, and disable.
    //==========================================================
    #ifndef APP_USBD_CONFIG_EVENT_QUEUE_ENABLE
    #define APP_USBD_CONFIG_EVENT_QUEUE_ENABLE 0
    #endif

    Whilst this change seems to solve the issue, it makes me slightly uncomfortable. The USB examples all use the event queue, so I'm wondering what the consequences of disabling it will be.

  • Hi,

    I'll check with our USB developers to see if they have some insight to share. Unfortunately, it might take some time for us to get back to you.

  • Understood, and thanks for considering this issue.

    The main thing I need to understand is whether there are likely to be any side-effects from handling all USBD events in interrupts compared to using the event queue. Practical testing has not turned up any issues so far.

  • Hi,

    I got some feedback on this. Could you try increasing the queue size (up to the maximum of 64)?

    It seems that running without the queue could cause issues, particularly when SOFs start appearing after power saving.
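As a rough illustration of that suggestion, the sdk_config.h fragment below re-enables the event queue and raises its size to the maximum mentioned in the reply. It assumes the size option is named APP_USBD_CONFIG_EVENT_QUEUE_SIZE, as I believe it is in the SDK's sdk_config.h; check the name and allowed range in your own SDK version before relying on it.

```c
/* Sketch only: keep the event queue enabled (the SDK default)... */
#ifndef APP_USBD_CONFIG_EVENT_QUEUE_ENABLE
#define APP_USBD_CONFIG_EVENT_QUEUE_ENABLE 1
#endif

/* ...and raise its size to the maximum of 64 mentioned in the reply.
 * The option name is an assumption; verify it against your sdk_config.h. */
#ifndef APP_USBD_CONFIG_EVENT_QUEUE_SIZE
#define APP_USBD_CONFIG_EVENT_QUEUE_SIZE 64
#endif
```

Whether a larger queue removes the NRF_ERROR_BUSY returns is not confirmed anywhere in this thread, so it would need the same testing on Catalina and Big Sur as the earlier changes.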
