USBD endpoint transfer completion event.

In my nRF52840-based device using nRF5 SDK 17.1.0, I notice that this loop in the SDK's nrfx USBD driver sometimes never finishes:

[modules/nrfx/drivers/src/nrfx_usbd.c: 1444]

/* There is a lot of USBD registers that cannot be accessed during EasyDMA transfer.
 * This is quick fix to maintain stability of the stack.
 * It cost some performance but makes stack stable. */
while (!nrf_usbd_event_check(nrfx_usbd_ep_to_endevent(ep)) &&
       !nrf_usbd_event_check(NRF_USBD_EVENT_USBRESET))
{
    /* Empty */
}

It seems the transfer-end EasyDMA hardware event isn't generated reliably in all situations. This happens mostly with small transfers on interrupt IN endpoints and only with some USB hosts (notably Raspberry Pi). This is a big problem, because this function is called from interrupt context (and therefore should not busy-wait at all).

Is this a known issue and is there a workaround? I tried to enter the loop conditionally only when transfer size is bigger than 4 or 8 bytes, which seems to fix this, but that causes other problems. I tried to pad the 2-byte transfer to 8 bytes, but that doesn't change anything. I think it could work to skip the loop for interrupt endpoints, but this routine handles interrupt and bulk endpoints the same; there is no USBD register that keeps track of bulk vs. interrupt endpoints.

I know that the nRF5 SDK is out of support, but my project was built with it and it isn't easy to move to the nRF Connect SDK.

0 Ronald Hoogenboom 9 months ago

I have found the reason for the issue. Look at the comments on the function usbd_dma_pending_set:

/**
 * @brief Mark that EasyDMA is working.
 *
 * Internal function to set the flag informing about EasyDMA transfer pending.
 * This function is called always just after the EasyDMA transfer is started.
 */
static inline void usbd_dma_pending_set(void)
{
    if (nrfx_usbd_errata_199())
    {
        *((volatile uint32_t *)0x40027C1C) = 0x00000082;
    }
    m_dma_pending = true;
}

It says that this function is (to be) called just AFTER the DMA is started, but in the function usbd_dmareq_process, it is called a little BEFORE the DMA is started. This isn't a problem for setting the value of m_dma_pending (of course) but it is a problem for the errata 199 workaround.

I moved the call to usbd_dma_pending_set from line 1429 to right after usbd_dma_start(ep) at line 1443, and that solves the issue with the Raspberry Pi. It also still works fine with the Windows laptop.
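The reordering described above can be illustrated with a small host-side sketch. Everything here is an assumption made for illustration: the function bodies are stand-ins rather than the SDK code, m_mock_errata_reg replaces the real register at 0x40027C1C, and dmareq_process_reordered is a hypothetical name for the relevant part of usbd_dmareq_process. It only shows the intended call order, with the errata 199 write happening after the DMA start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins (assumptions) so the call order can be checked
 * without hardware. The names mirror the driver; the bodies do not. */
static uint32_t m_mock_errata_reg;        /* stands in for 0x40027C1C */
static bool     m_dma_started;            /* set by the mock DMA start */
static bool     m_dma_pending;            /* same role as in the driver */
static bool     m_workaround_after_start; /* records the observed order */

static void usbd_dma_start(uint8_t ep)
{
    (void)ep;              /* the mock ignores the endpoint number */
    m_dma_started = true;
}

static void usbd_dma_pending_set(void)
{
    /* Remember whether the DMA had already been started at this point. */
    m_workaround_after_start = m_dma_started;
    m_mock_errata_reg = 0x00000082;
    m_dma_pending = true;
}

/* Hypothetical excerpt: start the DMA first, then mark it pending. */
static void dmareq_process_reordered(uint8_t ep)
{
    usbd_dma_start(ep);
    usbd_dma_pending_set();
    /* Sanity check of the ordering this sketch is meant to show. */
    assert(m_workaround_after_start);
}
```

Calling dmareq_process_reordered(0x81) leaves m_dma_pending set and the mock register at 0x00000082, with the write recorded as having happened after the start.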

I do think this should be rectified in the NRF5 SDK and in the NRF Connect SDK (sdk-hal_nordic/nrfx/drivers/src/nrfx_usbd.c).

0 Hung Bui 9 months ago in reply to Ronald Hoogenboom

Hi Ronald,
Thanks for the report. I have forwarded your finding internally. I will keep you updated when I get feedback.

0 Hung Bui 9 months ago in reply to Ronald Hoogenboom

Hi Ronald,
So from my understanding, the issue is that the host doesn't send IN requests. But I still don't understand why that would cause the driver to end up in a dead loop as in your initial question. Would that be because the host is unresponsive, so the END event never comes?
Maybe we can safeguard this loop by adding a way to exit it if it is stuck there for a while?
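One way such a safeguard could look is a poll loop with an upper bound. This is only a minimal sketch that runs on a host PC: mock_event_check is a hypothetical stand-in for nrf_usbd_event_check(), and wait_event_bounded and max_polls are illustrative names, not part of nrfx. A suitable poll limit on real hardware would have to be determined by measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nrf_usbd_event_check(). On real hardware the
 * check would read a USBD event register; here the event is reported
 * after a configurable number of unsuccessful polls. */
static uint32_t m_polls_until_event;

static bool mock_event_check(void)
{
    if (m_polls_until_event == 0)
    {
        return true;
    }
    m_polls_until_event--;
    return false;
}

/* Poll for the event at most max_polls times.
 * Returns true if the event was seen, false if the limit was reached. */
static bool wait_event_bounded(bool (*event_check)(void), uint32_t max_polls)
{
    assert(event_check != NULL);
    for (uint32_t i = 0; i < max_polls; i++)
    {
        if (event_check())
        {
            return true;
        }
    }
    return false;
}
```

The caller then has to decide what to do when false is returned, for example skipping the rest of the transfer handling instead of blocking the ISR.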

0 Ronald Hoogenboom 9 months ago in reply to Hung Bui

Hi Hung Bui,

There are just no IN tokens coming from the host, because it didn't listen for the interrupts from the endpoint yet. This apparently results in the DMA for the triggered endpoint task to never 'capture' (no STARTED event) and thus no END event either (which is what the loop in the driver is waiting for).

I think it's better to check a few times for the STARTED event (which should come quickly, as all DMA requests are serialized by the driver software already) than have a longer timeout on the END event. ISRs are timing critical, because they block all other parts of the code. I have seen that the STARTED event always comes within 5 loops, so timing out for the STARTED event can be done with as little as 10 loops or so. Then when the STARTED loop times out, it's useless to wait for END. I'm not sure if the triggered endpoint task needs to be canceled in the timeout case. I didn't do that and it seems to not harm anything.

The fragment in the nrfx_usbd.c around line 1440 looks like this now:

/* Start transfer to the endpoint buffer */
nrf_usbd_ep_easydma_set(ep, transfer.p_data.addr, (uint32_t)transfer.size);

usbd_dma_start(ep);
for (int i = 0; ; i++)
{
    if (nrf_usbd_event_check(NRF_USBD_EVENT_STARTED)) break;
    if (i > 8)
    {
        if (NRFX_USBD_DMAREQ_PROCESS_DEBUG)
        {
            NRFX_LOG_DEBUG("USB DMA process - not started");
        }
        /* Transfer won't capture - abort */
        return;
    }
}
usbd_dma_pending_set();

/* There is a lot of USBD registers that cannot be accessed during EasyDMA transfer.
 * This is quick fix to maintain stability of the stack.

This seems to make the behavior stable for both when the host sends IN tokens to the interrupt endpoint and when it doesn't.

What do you think?

0 Hung Bui 9 months ago in reply to Ronald Hoogenboom

Hi Ronald,

I would need to check internally to see if this can cause any potential issue. I will get back when I have a reply.

0 Ronald Hoogenboom 7 months ago in reply to Hung Bui

Is there any progress on this?

0 Hung Bui 7 months ago in reply to Ronald Hoogenboom

Hi Ronald,
I'm really sorry for the late response. It's my mistake that the feedback from our R&D engineer didn't get back here. Here are his thoughts:

STARTED event is completely independent from host issuing the IN token. The host may not issue IN token at all, and yet the STARTED event will be generated.

The customer seems to be chasing something, but the conclusion that this in any way allows to determine whether IN token is sent by host is wrong. It does not, never did and never will. Can he observe the issue with Zephyr or NCS?

By the way, the Zephyr usbd driver no longer does busy loop waiting in the interrupt context. If the customer can reproduce it with Zephyr or NCS, I would be very interested in investigating this. Otherwise, I just assume there's some bug somewhere in the NRF5 SDK or the customer application and not in the silicon.

Without a reproducer (full software package both on nRF52840 and host) I cannot check what's going on.

In the ticket customer was asking "What could be the reason why a DMA action isn't captured? Could it be because a previous DMA isn't finished yet?" - there can only be one DMA transfer (between main system memory and USBD endpoint buffer) active at a time. If the previous one hasn't finished before new one is started then for sure things won't work. But considering that nrfx does busy loop until the transfer ends, I don't think it is possible to trigger next DMA before previous one finished with nrfx.

A long shot at tackling this problem could be to try to remove the *((volatile uint32_t *)0x40027C1C) = 0x00000000; from usbd_dma_pending_clear() completely. This would result in increased current consumption, but could provide some insight whether or not the issue is related to errata 199 or not.
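The diagnostic suggested above could be made switchable at compile time, roughly as sketched below. This is a host-side illustration under stated assumptions: USBD_DIAG_KEEP_ERRATA_199 is a hypothetical macro and not an SDK option, m_mock_errata_reg stands in for the register at 0x40027C1C, errata_199_applies replaces nrfx_usbd_errata_199(), and the function bodies are simplified stand-ins for usbd_dma_pending_set() and usbd_dma_pending_clear().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical diagnostic switch (not an SDK option): when set to 1, the
 * register write in the clear function is skipped. */
#ifndef USBD_DIAG_KEEP_ERRATA_199
#define USBD_DIAG_KEEP_ERRATA_199 1
#endif

static uint32_t m_mock_errata_reg; /* stands in for 0x40027C1C */
static bool     m_dma_pending;

/* Stand-in for nrfx_usbd_errata_199(); assumed to apply in this sketch. */
static bool errata_199_applies(void)
{
    return true;
}

static void dma_pending_set_sketch(void)
{
    if (errata_199_applies())
    {
        m_mock_errata_reg = 0x00000082;
    }
    m_dma_pending = true;
}

static void dma_pending_clear_sketch(void)
{
#if !USBD_DIAG_KEEP_ERRATA_199
    /* Normal behaviour: restore the register once the transfer is done. */
    if (errata_199_applies())
    {
        m_mock_errata_reg = 0x00000000;
    }
#endif
    m_dma_pending = false;
}
```

With the switch enabled, the mock register keeps the value 0x00000082 after the pending flag is cleared, which corresponds to the experiment (and the higher current consumption) described in the post.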

0 Ronald Hoogenboom 7 months ago in reply to Hung Bui

Thanks for the feedback. It is weird that his conclusion is not aligning with what I experience. For me, the hardware is a black box. All I have is whatever information Nordic provides and the inconsistencies I can detect in it. Anyway, my modification seems to solve my issue, I haven't had any issues with usbd since.

"the Zephyr usbd driver no longer does busy loop waiting in the interrupt context"

In sdk-hal_nordic/nrfx/drivers/src/nrfx_usbd.c (branch ncs_2_7_0_system_nrf54l_mdk_8_67_fix of nrfconnect/sdk-hal_nordic on GitHub), I still see the busy-wait loop, exactly the same as in the NRF5 SDK. Maybe this is not part of NCS? Can you tell me where the nrfx_usbd driver for NCS is? I would be happy to transplant the newer code into my NRF5 SDK tree and get rid of the busy wait in interrupt context.

0 Hung Bui 7 months ago in reply to Ronald Hoogenboom

Hi Ronald,
The legacy driver you pointed to will be retired.
You can find the current driver here, without the loop:
https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/usb/common/nrf_usbd_common/nrf_usbd_common.c#L897C4-L904C20