CDC ACM transmission (TX) is broken on macOS Catalina

Hello, 

We are using nRF5 SDK version 17.0.2 and the nRF52840.

Our product is based on the nRF52840 and uses its USB peripheral configured as a composite device (HID keyboard + CDC ACM). The CDC ACM interface carries a custom text protocol between the device and the PC. It works correctly on Linux (Ubuntu) and Windows 10, but not on macOS Catalina (10.15.6). We are not sure the problem is specific to macOS. During data transmission, the TX path sporadically gets stuck returning NRF_ERROR_BUSY forever, and the device cannot recover from that error unless the "COM port" is reopened on the host machine. Meanwhile, RX continues to work correctly. The general approach for sending data is the following (pseudo-code):

// Handling cdc acm events:

...
  case APP_USBD_CDC_ACM_USER_EVT_RX_DONE:
    NRF_LOG_INFO("APP_USBD_CDC_ACM_USER_EVT_RX_DONE");
    process_incoming_data();
    break;
  case APP_USBD_CDC_ACM_USER_EVT_TX_DONE:
    NRF_LOG_INFO("APP_USBD_CDC_ACM_USER_EVT_TX_DONE");
    tx_done = true;
    tx_buffer_fill_size = 0;
    break;
...


// main():
...
for (;;) {

  if (tx_done && tx_buffer_fill_size != 0) {
    ret_code_t ret = app_usbd_cdc_acm_write(&m_app_cdc_acm, tx_buffer, tx_buffer_fill_size);
    if (ret == NRF_ERROR_BUSY) {
        NRF_LOG_ERROR("Resource is busy");
    } else if (ret == NRF_SUCCESS) {
        NRF_LOG_INFO("cdc_acm_write start");
    } else {
        NRF_LOG_ERROR("cdc_acm_write error");
    }
  }

  nrf_pwr_mgmt_run();
}
...
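
The pseudo-code above does not show where tx_done is cleared. As a rough, self-contained model of the hand-off between the event handler and the main loop, the sketch below assumes the flag is cleared when a write is accepted and set again on TX_DONE. fake_cdc_acm_write(), tx_queue(), on_tx_done() and tx_poll() are hypothetical names for illustration; the fake write only imitates the "one transfer in flight" behaviour and is not SDK code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_BUFFER_SIZE 64u

/* Hypothetical stand-in for app_usbd_cdc_acm_write(): it only models the
 * "one transfer in flight at a time" rule and is not SDK code. */
static bool m_transfer_in_flight = false;

static bool fake_cdc_acm_write(const uint8_t *p_data, size_t length)
{
    (void)p_data;
    (void)length;
    if (m_transfer_in_flight)
    {
        return false;               /* would be NRF_ERROR_BUSY */
    }
    m_transfer_in_flight = true;
    return true;                    /* would be NRF_SUCCESS */
}

static volatile bool tx_done = true;
static uint8_t       tx_buffer[TX_BUFFER_SIZE];
static size_t        tx_buffer_fill_size = 0;

/* Queue data for sending; refused while a transfer owns the buffer. */
static bool tx_queue(const void *p_data, size_t length)
{
    assert(p_data != NULL);
    if (!tx_done || length == 0 || length > TX_BUFFER_SIZE - tx_buffer_fill_size)
    {
        return false;
    }
    memcpy(&tx_buffer[tx_buffer_fill_size], p_data, length);
    tx_buffer_fill_size += length;
    return true;
}

/* Models the APP_USBD_CDC_ACM_USER_EVT_TX_DONE branch of the event handler. */
static void on_tx_done(void)
{
    m_transfer_in_flight = false;
    tx_buffer_fill_size  = 0;
    tx_done              = true;
}

/* One pass of the main loop; returns true if a write was started. */
static bool tx_poll(void)
{
    if (!tx_done || tx_buffer_fill_size == 0)
    {
        return false;
    }
    if (!fake_cdc_acm_write(tx_buffer, tx_buffer_fill_size))
    {
        return false;
    }
    tx_done = false;    /* assumption: cleared here, set again on TX_DONE */
    return true;
}
```

With this gating, the main loop never calls the write function again until TX_DONE has arrived, so if TX_DONE is never delivered the model simply stops sending, which is the state the logs below end up in.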

Typical logs look like this:


APP_USBD_CDC_ACM_USER_EVT_RX_DONE
cdc_acm_write start
APP_USBD_CDC_ACM_USER_EVT_TX_DONE
APP_USBD_CDC_ACM_USER_EVT_RX_DONE
cdc_acm_write start
APP_USBD_CDC_ACM_USER_EVT_TX_DONE
APP_USBD_CDC_ACM_USER_EVT_RX_DONE
cdc_acm_write start
APP_USBD_CDC_ACM_USER_EVT_TX_DONE
APP_USBD_CDC_ACM_USER_EVT_RX_DONE
cdc_acm_write start
APP_USBD_CDC_ACM_USER_EVT_RX_DONE
Resource is busy
APP_USBD_CDC_ACM_USER_EVT_RX_DONE
Resource is busy

and so on...

The problem is that the device cannot recover from this state unless the port is reopened. On the host side, incoming data is lost from the moment app_usbd_cdc_acm_write stops producing APP_USBD_CDC_ACM_USER_EVT_TX_DONE events. We also tried sending zero-length packets right after each data packet, but that does not help.

Some important constants in sdk_config.h:

NRF_LOG_DEFERRED 1

APP_USBD_CDC_ACM_CONFIG_LOG_ENABLED 1

APP_USBD_CDC_ACM_ZLP_ON_EPSIZE_WRITE 1


1) Is it possible to recover from such a situation?
2) Shouldn't the nRF SDK report an error through a callback in such a situation?
3) Is it possible to debug this problem at a lower level and find the exact cause of the fault? CDC ACM logging doesn't provide enough useful information.
4) Any other suggestions as to why this might happen and how to fix it?
  • Hi.

    This is a known issue for us for this specific combination.

    Following is a patch for the issue. Please confirm the patch solves the issue.

    diff --git a/sdk/nrf5/components/libraries/usbd/class/hid/app_usbd_hid.c b/sdk/nrf5/components/libraries/usbd/class/hid/app_usbd_hid.c
    index 68dc300..51fe575 100644
    --- a/sdk/nrf5/components/libraries/usbd/class/hid/app_usbd_hid.c
    +++ b/sdk/nrf5/components/libraries/usbd/class/hid/app_usbd_hid.c
    @@ -74,11 +74,6 @@ static uint16_t hid_sof_required(app_usbd_hid_ctx_t * p_hid_ctx, uint16_t framec
         return APP_USBD_HID_SOF_NOT_REQ_FLAG;
     }
     
    -static bool hid_idle_on(app_usbd_class_inst_t const * p_inst, app_usbd_hid_ctx_t * p_hid_ctx)
    -{
    -    return p_hid_ctx->idle_on;
    -}
    -
     /**
      * @brief User event handler.
      *
    @@ -405,11 +400,6 @@ static ret_code_t endpoint_in_event_handler(app_usbd_class_inst_t const * p_inst
             return NRF_SUCCESS;
         }
     
    -    if (!hid_idle_on(p_inst, p_hid_ctx))
    -    {
    -        return p_hinst->p_hid_methods->ep_transfer_in(p_inst);
    -    }
    -    else
         {
             uint8_t i = 0;
             for(i=0; i < APP_USBD_HID_REPORT_IDLE_TABLE_SIZE; i++)
    @@ -425,8 +415,11 @@ static ret_code_t endpoint_in_event_handler(app_usbd_class_inst_t const * p_inst
                     break;
                 }
             }
    -        return NRF_SUCCESS;
         }
    +
    +    /* HID 1.11 specification states that in case the report has changed, 
    +       it should be transfered immediately  even when idle is enabled. */
    +    return p_hinst->p_hid_methods->ep_transfer_in(p_inst);
     }

    Br,
    Joakim

  • Hello Joakim,

    thank you for the suggestion. But as I mentioned, we are using SDK 17.0.2; I checked the app_usbd_hid.c file and it already contains the changes you suggested.

    So no, unfortunately this patch doesn't solve the issue.

    Kind regards,

    Oleh

  • Oh sorry! I wasn't aware that the fix was added in the new SDK.

    Let me take a closer look, and see if I can find a solution to this :)

    Br,
    Joakim

  • Hi.

    I've been looking at this, but I'm unable to reproduce the issue. So I can't debug it to find the reason behind the NRF_ERROR_BUSY.

    Have you tried debugging yourself, to find where exactly the error code is returned from?

    Maybe you have a minimal project that I can run on my end, so that I can take a closer look?

    I haven't been able to reproduce using our examples, so I'm guessing that it could be something with your implementation.

    Br,
    Joakim

  • Joakim, thank you so much for your help and deep investigation of this. 

    We also tried to dig deeper, and I tried to put together minimal source code to reproduce the problem. But in the meantime my colleagues and I updated to 10.15.7, and the problem seems to have disappeared, so it was apparently related to OS X rather than to Nordic's or our implementation. However, other CDC ACM devices (from different vendors) worked perfectly on 10.15.6, which is why we assumed the issue was in our code or the SoftDevice.

    In any case, it would be great if the USB implementation could report some kind of error when it is stuck in NRF_ERROR_BUSY forever, so that the device could recover or the application code could handle the problem. However, I understand that this might be almost impossible without knowing the cause.

    We are going to test on more versions in different environments to make sure the problem does not reproduce. Once we are done, I will post an answer in this topic.

    Kind regards,

    Oleh
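
Since the SDK does not report the stuck state by itself, one way the application could at least detect it is a timeout on the pending write. The following is only a minimal sketch under that assumption: tx_watch_t and its functions are hypothetical helpers, not part of the nRF5 SDK, and the tick source (for example an RTC or app_timer counter) is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the nRF5 SDK: tracks how long a CDC ACM
 * write has been pending without a TX_DONE event. */
typedef struct
{
    bool     pending;     /* a write was accepted and TX_DONE has not arrived */
    uint32_t started_at;  /* tick count when the pending write was accepted */
    uint32_t timeout;     /* ticks to wait before declaring a stall */
} tx_watch_t;

static void tx_watch_init(tx_watch_t *w, uint32_t timeout_ticks)
{
    assert(w != NULL);
    w->pending    = false;
    w->started_at = 0;
    w->timeout    = timeout_ticks;
}

/* Call when app_usbd_cdc_acm_write() returns NRF_SUCCESS. */
static void tx_watch_on_write_started(tx_watch_t *w, uint32_t now)
{
    w->pending    = true;
    w->started_at = now;
}

/* Call from the APP_USBD_CDC_ACM_USER_EVT_TX_DONE branch of the handler. */
static void tx_watch_on_tx_done(tx_watch_t *w)
{
    w->pending = false;
}

/* Call periodically from the main loop. Returns true once the pending write
 * has been outstanding for at least `timeout` ticks. The unsigned
 * subtraction keeps the comparison valid across tick counter wrap-around. */
static bool tx_watch_is_stalled(const tx_watch_t *w, uint32_t now)
{
    return w->pending && (uint32_t)(now - w->started_at) >= w->timeout;
}
```

What to do once a stall is detected remains an open question in this thread; the only recovery observed so far was reopening the port on the host, so the sketch covers detection only.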
