ESB link in PRX mode requires re-initialization after minutes with nRF54L15

We are experiencing ESB link failures on the receiver end of our system, an nRF54L15.  After minutes to hours of use, the receiver stops getting data (we know the transmitter is still sending).  If we re-initialize the ESB subsystem, everything is back to normal.  Unfortunately, we have no good means to detect that the link is down or to know why. It's still in ESB_MODE_PRX; we are just no longer getting any interrupts.  We have some timeouts on the configured `event_handler`, but those are too long to be very useful.  Any advice you have on debugging would be helpful, but I also have two questions:

1. Is there detailed documentation on how the ESB system/driver works on the nRF54L15? The driver code is difficult to understand on its own: https://github.com/nrfconnect/sdk-nrf/blob/main/subsys/esb/esb.c and the documentation page seems generic and high-level (it does not explain much of what the driver does): https://docs.nordicsemi.com/bundle/ncs-2.7.99-cs2/page/nrf/protocols/esb/index.html
2. Can you provide some ESB status register we might be able to monitor/read from to get more information beyond the three interrupt events and the mode that the ESB is in?

Thanks,

Galen

  • Hello,

    Thanks for letting me know. 

    1- Out of curiosity, if you use TASKS_CONSTLAT, does the failure still present itself? You should be using CONSTLAT when the radio is enabled regardless. (You can use the accompanying TASKS_LOWPOWER when you are through with it, but for exploring the failure case let's keep it enabled.)

    2- Is the PRX always a PRX?

    Amanda touched on some useful debugging tools and info; you can also use PPI to observe the radio activity via IO pin toggles.

    (Some examples: "Add alternative to radio library ppi pair for something more flexible", Issue #5, droidecahedron/esb_multi; https://github.com/droidecahedron/esb_multi/blob/68694693cb49272320ade3253a0546d0029f6535/esb_prx_blefallback/src/main.c#L23-L49)

    but it would mostly just show you when the radio is active/inactive so you could see missed packets and connection events.
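
    As a complement to pin toggles, it can help to log the RADIO peripheral's STATE register at the moment the link appears hung, since that tells you whether the radio is sitting in a TX, RX, or disabled state. Below is a minimal sketch of a decoder for that value. The helper name `radio_state_name` is hypothetical, and the numeric encoding is assumed from the nRF52 series product specifications, so please check it against the nRF54L15 datasheet before relying on it.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a raw RADIO STATE value to a readable name.
 * ASSUMPTION: the encoding below is taken from the nRF52 series product
 * specifications; verify it against the nRF54L15 datasheet.
 */
static const char *radio_state_name(uint32_t state)
{
	switch (state) {
	case 0:  return "DISABLED";
	case 1:  return "RXRU";      /* RX ramp-up */
	case 2:  return "RXIDLE";
	case 3:  return "RX";
	case 4:  return "RXDISABLE";
	case 9:  return "TXRU";      /* TX ramp-up */
	case 10: return "TXIDLE";
	case 11: return "TX";
	case 12: return "TXDISABLE";
	default: return "UNKNOWN";   /* value not in the assumed table */
	}
}
```

    On target, the value could be read with `nrf_radio_state_get(NRF_RADIO)` from the nrfx RADIO HAL and passed to this helper inside a log statement.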

    And for more information around ESB, I find the following insightful:

    Old nRF24 user guide: nRF24LE1 Product Specification v1.6

    Blog by my colleague: Intro to ShockBurst/Enhanced ShockBurst (Nordic Blog, Nordic DevZone)

    Said colleague's FOSS example project: inductivekickback/rc_radio: Remote Control Radio library for nRF52 using Enhanced ShockBurst.

    Please let me know what you find as you collect more logs around the failure case.

    I am working on replicating it on my end -- but am unable to do so reliably. Is the only thing that causes this lock up just a long period of receiving packets?

    As an extra datapoint, could you try current 54L15DKs and NCS v3.0.0 stock PRX/PTX operation and see if the failure persists there as well?

    BR,

  • 1 - We are already using CONFIG_SOC_NRF_FORCE_CONSTLAT.
    2 - The PRX is always the PRX, but we do send some messages back to the PTX other than ACKs. We are receiving transmissions at 12 Hz and replying at 1 Hz (from the problematic PRX).

    One of my colleagues looked into migrating to v3.0.0 but had issues getting things to run on the EngA silicon (which is what is on our boards).  I went through it and cherry-picked changes that looked like they might have an effect, specifically: https://github.com/nrfconnect/sdk-nrf/commit/be1549932bd0200a7d2da4c61e2c7b8132d514d2 (it didn't really improve anything).

    Lastly, as a hack to improve things after noticing it was hung in the ACK state, I added the following:

    static void on_radio_disabled_rx_ack(void)
    {
    	esb_fem_for_ack_rx();
    
    	if (IS_ENABLED(CONFIG_ESB_FAST_SWITCHING)) {
    		nrf_radio_shorts_set(NRF_RADIO, radio_shorts_common);
    		nrf_radio_task_trigger(NRF_RADIO, NRF_RADIO_TASK_RXEN);
    	} else {
    		nrf_radio_shorts_set(NRF_RADIO, (radio_shorts_common |
    						 NRF_RADIO_SHORT_DISABLED_TXEN_MASK));
    	}
    
    	update_rf_payload_format(esb_cfg.payload_length);
    
    	nrf_radio_packetptr_set(NRF_RADIO, rx_payload_buffer);
    	on_radio_disabled = on_radio_disabled_rx;
    
    	esb_state = ESB_STATE_PRX;
    
    #if defined(CONFIG_ESB_USE_PRX_ACK_TIMEOUT)
    
    #if IS_EMPTY(CONFIG_ESB_PRX_ACK_TIMEOUT_US)
    #error "No ESB_PRX_ACK_TIMEOUT_US provided but ESB_USE_PRX_ACK_TIMEOUT enabled"
    #endif
    
    	/* Configure timer to produce an ISR after CONFIG_ESB_PRX_ACK_TIMEOUT_US */
    	nrf_timer_task_trigger(esb_timer.p_reg, NRF_TIMER_TASK_CLEAR);
    	nrfx_timer_clear(&esb_timer);
    	nrfx_timer_compare(&esb_timer, NRF_TIMER_CC_CHANNEL1,
    		CONFIG_ESB_PRX_ACK_TIMEOUT_US, true);
    
    	/* Configure PPI to start the timer when transmission ends */
    	esb_ppi_for_wait_for_rx_set();
    
    	nrf_timer_event_clear(esb_timer.p_reg, NRF_TIMER_EVENT_COMPARE1);
    
    	on_timer_compare1 = on_timeout;
    
    #endif
    
    }
    
    #if defined(CONFIG_ESB_USE_PRX_ACK_TIMEOUT)
    static void on_timeout(void)
    {
    	LOG_ERR("Timeout function called on_timeout");
    	nrf_timer_int_disable(esb_timer.p_reg,  nrf_timer_compare_int_get(NRF_TIMER_CC_CHANNEL1));
    	esb_ppi_for_wait_for_rx_clear();
    
    	clear_events_restart_rx();
    	LOG_ERR("Timeout function called on_timeout return");
    }
    #endif


    For the most part, this seems to work: it adds a timeout to that state that clears the events and restarts RX via `clear_events_restart_rx()`.  If you have any advice on unforeseen implications of this, that would be very helpful. Additionally, I'm not sure whether there are other states where the system can hang and where I'd need to add something like this too, but I'll cross that bridge when I get there.  Otherwise, for now this seems like a good stopgap until we can migrate to v3.0.0 and Rev1/2 on our next spin of boards.
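
    Because packets are expected at 12 Hz, one state-independent fallback is a small receive watchdog: record the time of the last received payload and treat a long silence as a dead link. The sketch below shows only the bookkeeping; the `link_watchdog` struct and its functions are hypothetical names, not part of the ESB driver, and the 500 ms figure in the usage note is an arbitrary assumption (roughly six missed packets).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the ESB driver. */
struct link_watchdog {
	uint32_t timeout_ms; /* silence longer than this counts as link down */
	uint32_t last_rx_ms; /* timestamp of the last received payload */
};

/* Start the watchdog; the link is considered alive at now_ms. */
static void link_watchdog_init(struct link_watchdog *wd,
			       uint32_t timeout_ms, uint32_t now_ms)
{
	wd->timeout_ms = timeout_ms;
	wd->last_rx_ms = now_ms;
}

/* Call whenever a payload is received. */
static void link_watchdog_feed(struct link_watchdog *wd, uint32_t now_ms)
{
	wd->last_rx_ms = now_ms;
}

/*
 * Returns true when more than timeout_ms has passed since the last feed.
 * Unsigned subtraction keeps the comparison valid across a 32-bit
 * millisecond counter wraparound.
 */
static bool link_watchdog_expired(const struct link_watchdog *wd,
				  uint32_t now_ms)
{
	return (uint32_t)(now_ms - wd->last_rx_ms) > wd->timeout_ms;
}
```

    In an application this could be fed from the `ESB_EVENT_RX_RECEIVED` branch of the event handler with `k_uptime_get_32()` as the timestamp, and polled from a periodic work item that re-initializes ESB (for example `esb_disable()`, `esb_init()`, `esb_start_rx()`) when `link_watchdog_expired()` returns true with a timeout of, say, 500 ms.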
