ESB running in MPSL overstays timeslot when attaching or detaching logger

I'm testing concurrent ESB and BLE with the MPSL Timeslot API on a seeed xiao nrf52840. I've created a minimal reproduction based on the `esb_prx_ble` sample from the nrf sdk, which flash to the device using `probe-rs`. When ESB is actively receiving and I attach a logger or disconnect it, I immediately get a panic with signal 10, indicating `MPSL_TIMESLOT_SIGNAL_OVERSTAYED`.

If I add a sleep at boot and attach the logger during that period, the crash does not happen. With bluetooth enabled, the crash does not happen 100% of the time anymore, I suspect that it doesn't occur if ESB does not hold a timeslot while the logger attaches.

I've attached my source stripped down as much as possible. I have 2 microcontrollers, one is transmitting using the unmodified `esb_ptx` sample, the other is running the code below.

#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>
#include <dk_buttons_and_leds.h>
#include <esb.h>

LOG_MODULE_REGISTER(esb_prx_ble, CONFIG_ESB_PRX_BLE_LOG_LEVEL);

static struct esb_payload rx_payload;
static struct esb_payload tx_payload = ESB_CREATE_PAYLOAD(0,
	0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17);

void event_handler(struct esb_evt const *event) {
	int err;

	switch (event->evt_id) {
	case ESB_EVENT_TX_SUCCESS:
	case ESB_EVENT_TX_FAILED:
		break;
	case ESB_EVENT_RX_RECEIVED:
		while ((err = esb_read_rx_payload(&rx_payload)) == 0) 
			LOG_DBG("Packet received, len %d", rx_payload.length);
		if (err && err != -ENODATA) 
			LOG_ERR("Error while reading rx packet (err=%d)", err);
		break;
	case ESB_EVENT_TIMESLOT_FAILED:
		LOG_ERR("Timeslot failed");
		break;
	}
}

int esb_initialize(void) {
	int err;
	const uint8_t base_addr_0[4] = {0xE7, 0xE7, 0xE7, 0xE7};
	const uint8_t base_addr_1[4] = {0xC2, 0xC2, 0xC2, 0xC2};
	const uint8_t addr_prefix[8] = {0xE7, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7, 0xC8};

	struct esb_config config = ESB_DEFAULT_CONFIG;
	config.protocol = ESB_PROTOCOL_ESB_DPL;
	config.bitrate = ESB_BITRATE_2MBPS;
	config.mode = ESB_MODE_PRX;
	config.event_handler = event_handler;
	config.selective_auto_ack = true;
	config.use_fast_ramp_up = true;

	err = esb_init(&config);
	if (err) return err;
	err = esb_set_base_address_0(base_addr_0);
	if (err) return err;
	err = esb_set_base_address_1(base_addr_1);
	if (err) return err;
	err = esb_set_prefixes(addr_prefix, ARRAY_SIZE(addr_prefix));
	if (err) return err;

	return 0;
}

int main(void) {
	int err;

	LOG_INF("Starting ESB prx");
	err = esb_initialize();
	if (err) return 0;

	err = esb_write_payload(&tx_payload);
	if (err) return 0;

	err = esb_start_rx();
	if (err) return 0;

	return 0;
}

CONFIG_NCS_SAMPLES_DEFAULTS=y
CONFIG_ESB=y
CONFIG_CLOCK_CONTROL=y
CONFIG_ESB_CLOCK_INIT=y
CONFIG_ESB_MPSL_TIMESLOT=y
CONFIG_MPSL_TIMESLOT_SESSION_COUNT=1
CONFIG_ESB_PRX_BLE_LOG_LEVEL_DBG=y

Parents
  • Your event_handler is called from the ESB software IRQ, which runs inside the MPSL timeslot. That slot has a hard time budget. The LOG_DBG call inside the ESB_EVENT_RX_RECEIVED branch does work in IRQ context.

    Fix would be to move all logging and packet handling out of event_handler into a thread or workqueue, and only do the esb_read_rx_payload() drain plus an enqueue in the callback.

Reply
  • Your event_handler is called from the ESB software IRQ, which runs inside the MPSL timeslot. That slot has a hard time budget. The LOG_DBG call inside the ESB_EVENT_RX_RECEIVED branch does work in IRQ context.

    Fix would be to move all logging and packet handling out of event_handler into a thread or workqueue, and only do the esb_read_rx_payload() drain plus an enqueue in the callback.

Children
  • Thank you for the quick response!

    Even with the debug log removed, the behavior stays the exact same. Attaching the logger to this code also crashes:

    switch (event->evt_id) {
    case ESB_EVENT_TX_SUCCESS:
    case ESB_EVENT_TX_FAILED:
    	break;
    case ESB_EVENT_RX_RECEIVED:
    	while ((err = esb_read_rx_payload(&rx_payload)) == 0) continue;
    	if (err && err != -ENODATA) 
    		LOG_ERR("Error while reading rx packet (err=%d)", err);
    	break;
    case ESB_EVENT_TIMESLOT_FAILED:
    	LOG_ERR("Timeslot failed");
    	break;
    }

    So, I don't think I'm actually exceeding the time budget, at least not in my own code. Here's what I'm running:

    // Build the firmware
    west build -b xiao_ble/nrf52840 -d build/esb_prx_ble samples/esb_prx_ble --no-sysbuild
    
    // Flash and run it, this succeeds
    probe-rs run --chip nRF52840_xxAA firmware/esb_prx_ble-xiao_ble.elf
    
    // Cancel it and wait a few seconds, then reattach
    probe-rs attach firmware/esb_prx_ble-xiao_ble.elf
    
    // This instantly causes the following panic:
    <err> esb: Unexpected MPSL Timeslot signal 10
    <err> mpsl_init: MPSL ASSERT: 106, 182
    <err> os: ***** HARD FAULT *****
    <err> os:   Fault escalation (see below)
    <err> os: ARCH_EXCEPT with reason 3
    
    <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000002
    <err> os: r3/a4:  0x00000003 r12/ip:  0x00000000 r14/lr:  0x0000e671
    <err> os:  xpsr:  0x0100001b
    <err> os: Faulting instruction address (r15/pc): 0x0001b48a
    <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    <err> os: Fault during interrupt handling
    
    <err> os: Current thread: 0x20002bf0 (unknown)
    <err> fatal_error: Resetting system

Related