ESB link in PRX mode requires re-initialization after minutes with nRF54L15

We are seeing ESB link failures on the receiver side of our system, an nRF54L15. After minutes to hours of use, the receiver stops getting data, even though we know the transmitter is still sending. Re-initializing the ESB subsystem restores normal operation. Unfortunately, we have no good way to detect that the link is down or to find out why. The device is still in ESB_MODE_PRX; we simply stop getting interrupts. We have some timeouts on the configured `event_handler`, but they are too long to be very useful. Any debugging advice would be helpful, and I also have two questions:

1. Is there detailed documentation on how the ESB system/driver works on the nRF54L15? The driver code is difficult to understand on its own: https://github.com/nrfconnect/sdk-nrf/blob/main/subsys/esb/esb.c and the documentation page seems generic and high level (it does not explain much of what the driver does): https://docs.nordicsemi.com/bundle/ncs-2.7.99-cs2/page/nrf/protocols/esb/index.html
2. Can you point us to an ESB status register we could monitor or read to get more information beyond the three interrupt events and the mode that the ESB is in?

Thanks,

Galen

  • An update:
    After leaving it running for hours on revB and NCS 2.8.0, I saw the PRX lock up and require a reset. When I slowed down the ESB communication rate, it did not lock up during extended testing.

    As an extra data point, I tested the latest 54L15 on NCS 3.0.0 and was not able to reproduce the failure, even at the maximum transmission rate. I am leaving this test running to see if it comes up.

    Any reason you are locked on v2.8.0 and revA chips?

  • It's good to hear you can reproduce it. I can ask the team whether reducing the communication frequency is possible. What frequency did you find worked well?

    We have RevA on our current custom boards and cannot replace them until our next boards are built. It would be great to know whether this will be a problem on the latest versions (Rev1 or Rev2?). I appreciate your continued testing.

  • Hello,

    I've tested on both SDK 2.8.0 and SDK 3.0.0, on rev 1 and rev B devices, and the issue did not appear overnight. This was at the maximum possible communication frequency. (To me, this rules out a subtle SDK change as the culprit and points more toward the hardware.)

    I do not think the slower communication rate, one packet every 200 ms on the old chips, is acceptable for an ESB application. It was only slowed down because of invasive debugging in the ESB library to see what the radio state was at lockup (and it never locked up, even in 14+ hours at the slower rate).

    Your stopgap seems acceptable until you can move to the latest devices, but if it gives you grief you may need to restart the radio. I am not sure what is problematic on the old device.
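
    If you do end up restarting the radio, one way to automate it is a small receive watchdog. The following is only a minimal sketch under assumptions, not a confirmed fix for the lockup: `link_watchdog`, its functions, and `demo_reinit` are hypothetical names, the timeout value is yours to choose, and the re-init hook is assumed to wrap the usual `esb_disable()`, `esb_init()` and `esb_start_rx()` sequence with your original configuration. Timestamps are passed in so the logic can be checked on a host; on target they could come from `k_uptime_get_32()`.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical hook: on target this would restart ESB, for example
     * esb_disable(), esb_init(&config), esb_start_rx(). Returns 0 on success. */
    typedef int (*link_reinit_fn)(void);

    struct link_watchdog {
        uint32_t timeout_ms;    /* silence tolerated before a restart */
        uint32_t last_rx_ms;    /* time of the last received packet */
        uint32_t restart_count; /* successful restarts, for diagnostics */
        link_reinit_fn reinit;
    };

    void link_watchdog_init(struct link_watchdog *wd, uint32_t timeout_ms,
                            uint32_t now_ms, link_reinit_fn reinit)
    {
        assert(wd != NULL && reinit != NULL);
        wd->timeout_ms = timeout_ms;
        wd->last_rx_ms = now_ms;
        wd->restart_count = 0;
        wd->reinit = reinit;
    }

    /* Call when the ESB event handler reports ESB_EVENT_RX_RECEIVED. */
    void link_watchdog_feed(struct link_watchdog *wd, uint32_t now_ms)
    {
        wd->last_rx_ms = now_ms;
    }

    /* Call periodically from thread context.
     * Returns true if the link was silent too long and a restart was attempted. */
    bool link_watchdog_poll(struct link_watchdog *wd, uint32_t now_ms)
    {
        /* Unsigned subtraction stays correct across uint32_t wraparound. */
        if ((uint32_t)(now_ms - wd->last_rx_ms) < wd->timeout_ms) {
            return false;
        }
        /* Restart the timer so a dead link is not re-initialized in a tight loop. */
        wd->last_rx_ms = now_ms;
        if (wd->reinit() == 0) {
            wd->restart_count++;
        }
        return true;
    }

    /* Stand-in hook for host builds; replace with the real ESB restart on target. */
    static int reinit_calls;
    int demo_reinit(void)
    {
        reinit_calls++;
        return 0;
    }
    ```

    The timeout should be several times your expected packet interval so that normal gaps do not trigger a restart, and `restart_count` gives you a simple way to log how often the lockup actually occurs in the field.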

    Before your next boards are built, could you mock the hardware-specific functions for your board and validate just the ESB side with dummy data on a current DK?

    The only other possible divergence between our test setups is what you mentioned about sending messages back to the PTX other than ACKs. To clarify: are you flipping their roles for this, or putting payloads in the ACK?

    Best regards,
