MPSL Assert in project involving BLE, async UART and Nordic Distance Measurement Library

We are developing a project that runs BLE, the Nordic Uart Service, the Nordic Distance Measurement toolikt and Async UART (NCS v2.9.0). The Peripheral device shall advertise (connectable) the Nordic Uart Service UUID, the Central Scans for this and connects to the found Peripheral, and registers the NUS Service. After the connection it starts/changes the required advertising-scanning for the distance measurement, as in the example provided from the SDK nrf_dm (we adapted only the part of Distance Estimation, and not the DDFS service or the Azimuth simulation). So only the pair advertising-scanning with the corresponding manufacturer data values, and the call to dm_request_add in the callback to the advertising scanned and scan filter match.

It works as expected, but sometime after the measurement has been running, an assert is triggered:

D: Sending Uart State
D: Idle State
W: Dist measurement updated: 2.050781
D: Sending Ble State
D: Idle State
D: Sending Uart State
D: Idle State
D: Sending Ble State
D: Idle State
D: Sending Uart State
D: Idle State
D: Sending Ble State
D: Idle State
D: Sending Uart State
D: Idle State
D: Sending Ble State
D: Idle State
D: Sending Uart State
D: Idle State
D: Sending Ble State
D: Idle State
D: Sending Uart State
D: Idle State
D: Sending Ble State
D: Idle State
ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/mpsl/init/mpsl_init.c:299
  MPSL ASSERT: 106, 520

E: ***** HARD FAULT *****
E:   Fault escalation (see below)
E: ARCH_EXCEPT with reason 4

E: r0/a1:  0x00000004  r1/a2:  0x0000012b  r2/a3:  0x00000001
E: r3/a4:  0x00000004 r12/ip:  0x00000000 r14/lr:  0x000357e3
E:  xpsr:  0x21000018
E: s[ 0]:  0x00000008  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x000288bb
E: s[ 4]:  0x00020000  s[ 5]:  0x0002887d  s[ 6]:  0x00000000  s[ 7]:  0x00000000
E: s[ 8]:  0x00000000  s[ 9]:  0x0004007d  s[10]:  0x00000000  s[11]:  0x2000c36c
E: s[12]:  0x00000000  s[13]:  0x000357d9  s[14]:  0x0004d2e0  s[15]:  0x2000c384
E: fpscr:  0x00000208
E: Faulting instruction address (r15/pc): 0x00040068
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Fault during interrupt handling

E: Current thread: 0x20009cc8 (idle)
E: Halting system


As fas as we can understand, the MPSL is being used in the DM toolkit (or BLE, or both?), but we do not interact directly with it. So we are unsure what the problem might be. From other posts here we might interpret that it might be a timing problem with an interrupt, but we are not directly stopping or missusing any interrupt that we know of. The software would receive data via UART, pass it to the main task, that would check the data with CRC32, decode the data with nanopb, and encode a response that is sent via the NUS to the other node, where something similar is performed. All of this is made in main task, only callbacks that might come from an interrupt are the UART callback and NUS RX callback, but they only load data into a FIFO and return. Similarly with the BLE callbacks, except for the 2 callbacks for advertising and scanning that contain the call to the DM library.

Also, we have the problem that the assert behaves differently in our EvalBoard (Ezurio BL653-DVK based on the BL653/nRF52833) and our custom hardware around the BL653micro/nRF52833. In the EvalBoard, the system seems to hang, but the LOG output (using Segger RTT backend) freezes (no assert is seen, but the whole system is unresponsive), while in the custom hardware the Assert output is there. Both configurations have the CONFIG_ASSERT as seen above.

Central prj.conf

CONFIG_FPU=y
CONFIG_UART_ASYNC_API=y
CONFIG_UART_1_NRF_HW_ASYNC=y
CONFIG_UART_1_NRF_HW_ASYNC_TIMER=3
CONFIG_CRC=y
CONFIG_NANOPB=y
CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_DEVICE_NAME="central"
CONFIG_BT_EXT_ADV=y
CONFIG_BT_SCAN=y
CONFIG_BT_SCAN_FILTER_ENABLE=y
CONFIG_BT_SCAN_MANUFACTURER_DATA_CNT=1
CONFIG_BT_SCAN_UUID_CNT=1
CONFIG_BT_SCAN_WITH_IDENTITY=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_GATT_DM=y
CONFIG_HEAP_MEM_POOL_SIZE=1024
CONFIG_BT_CTLR_PHY_2M=y
CONFIG_BT_CTLR_PHY_CODED=y
CONFIG_BT_AUTO_PHY_UPDATE=n
CONFIG_BT_USER_PHY_UPDATE=y
CONFIG_BT_USER_DATA_LEN_UPDATE=y
CONFIG_BT_PERIPHERAL_PREF_MIN_INT=6
CONFIG_BT_PERIPHERAL_PREF_MAX_INT=6
CONFIG_BT_PERIPHERAL_PREF_LATENCY=2
CONFIG_BT_PERIPHERAL_PREF_TIMEOUT=500
CONFIG_BT_GAP_AUTO_UPDATE_CONN_PARAMS=n
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_NUS_CLIENT=y
CONFIG_DM_MODULE=y
CONFIG_DM_HIGH_PRECISION_CALC=y
CONFIG_MPSL=y
CONFIG_MPSL_TIMESLOT_SESSION_COUNT=1
CONFIG_MPSL_WORK_STACK_SIZE=2048
CONFIG_NRFX_PPI=y


Central DM Adv-Scan parameters:

static struct bt_le_adv_param app_ble_adv_params_noconnectable =
    BT_LE_ADV_PARAM_INIT(BT_LE_ADV_OPT_USE_IDENTITY | BT_LE_ADV_OPT_SCANNABLE | BT_LE_ADV_OPT_NOTIFY_SCAN_REQ,
                            BT_GAP_ADV_SLOW_INT_MIN, // 160 * 0.625 ms = 100 ms
                            BT_GAP_ADV_SLOW_INT_MAX, // 240 * 0.625 ms = 150 ms
                            NULL);

static struct bt_data app_ble_advertisement_data[] = {
    BT_DATA_BYTES(BT_DATA_FLAGS, (BT_LE_AD_GENERAL | BT_LE_AD_NO_BREDR)),
    BT_DATA(BT_DATA_NAME_COMPLETE, APP_BLE_DEVICE_NAME, APP_BLE_DEVICE_NAME_LEN),
};

static struct bt_data app_ble_scan_response_dm_data[] = {
    BT_DATA(BT_DATA_MANUFACTURER_DATA, (unsigned char*)&app_ble.mfg_data, sizeof(app_ble.mfg_data)),
};

static struct bt_le_scan_param app_ble_scan_params = {
    .type = BT_LE_SCAN_TYPE_ACTIVE,
    .interval = BT_GAP_SCAN_FAST_INTERVAL,
    .window   = BT_GAP_SCAN_FAST_WINDOW,
    .options = BT_LE_SCAN_OPT_NONE,
    .timeout = 0,
};

// callback for the bt_data_parse in the scan filter match callback
static bool app_ble_scan_data_cb(struct bt_data* data, void* user_data)
{
    struct adv_mfg_data* recv_mfg_data;
    struct dm_request req;

    switch (data->type)
    {
    case BT_DATA_MANUFACTURER_DATA:
        if (sizeof(struct adv_mfg_data) == data->data_len)
        {
            recv_mfg_data = (struct adv_mfg_data*)data->data;

            bt_addr_le_copy(&req.bt_addr, user_data);
            req.role = DM_ROLE_INITIATOR;
            req.ranging_mode = DM_RANGING_MODE_MCPD;

            req.rng_seed = sys_le32_to_cpu(recv_mfg_data->rng_seed) + app_ble.scanner_random_share;
            req.start_delay_us = 0;
            req.extra_window_time_us = 0;

            dm_request_add(&req);
        }
        return false;
    default:
        return true;
    }
}

// callback for the advertising scanned callback
static void app_ble_adv_scanned_cb(struct bt_le_ext_adv* adv, struct bt_le_ext_adv_scanned_info* info)
{
    struct dm_request req;

    if (peer_supported_test(info->addr))
    {
        bt_addr_le_copy(&req.bt_addr, info->addr);
        req.role = DM_ROLE_REFLECTOR;
        req.ranging_mode = DM_RANGING_MODE_MCPD;

        req.rng_seed = peer_rng_seed_get() + app_ble_scanner_addr_to_random_share(&info->addr->a);
        req.start_delay_us = 0;
        req.extra_window_time_us = 0;

        dm_request_add(&req);
        app_ble_adv_update_data();
    }
}

// DM library data ready callback
static void app_ble_dm_data_ready_cb(struct dm_result* result)
{
    if (result->status)
    {
        peer_update(result);
    }
}



Peripheral prj.conf

CONFIG_FPU=y
CONFIG_UART_ASYNC_API=y
CONFIG_UART_1_NRF_HW_ASYNC=y
CONFIG_UART_1_NRF_HW_ASYNC_TIMER=3
CONFIG_CRC=y
CONFIG_NANOPB=y
CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_DEVICE_NAME="Peripheral"
CONFIG_BT_EXT_ADV=y
CONFIG_BT_SCAN=y
CONFIG_BT_SCAN_FILTER_ENABLE=y
CONFIG_BT_SCAN_MANUFACTURER_DATA_CNT=1
CONFIG_BT_SCAN_WITH_IDENTITY=y
CONFIG_BT_CTLR_PHY_2M=y
CONFIG_BT_CTLR_PHY_CODED=y
CONFIG_BT_AUTO_PHY_UPDATE=n
CONFIG_BT_USER_PHY_UPDATE=y
CONFIG_BT_USER_DATA_LEN_UPDATE=y
CONFIG_BT_PERIPHERAL_PREF_MIN_INT=6
CONFIG_BT_PERIPHERAL_PREF_MAX_INT=6
CONFIG_BT_PERIPHERAL_PREF_LATENCY=2
CONFIG_BT_PERIPHERAL_PREF_TIMEOUT=500
CONFIG_BT_GAP_AUTO_UPDATE_CONN_PARAMS=n
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_NUS=y
CONFIG_DM_MODULE=y
CONFIG_DM_HIGH_PRECISION_CALC=y
CONFIG_MPSL=y
CONFIG_MPSL_TIMESLOT_SESSION_COUNT=1
CONFIG_MPSL_WORK_STACK_SIZE=2048
CONFIG_NRFX_PPI=y
CONFIG_UART_CONSOLE=n
CONFIG_CONSOLE=y
CONFIG_LOG=y
CONFIG_LOG_MODE_MINIMAL=y
CONFIG_LOG_PRINTK=y
CONFIG_USE_SEGGER_RTT=y
CONFIG_RTT_CONSOLE=y
CONFIG_ASSERT=y
CONFIG_ASSERT_VERBOSE=y
CONFIG_ASSERT_NO_COND_INFO=n
CONFIG_ASSERT_NO_MSG_INFO=n
CONFIG_RESET_ON_FATAL_ERROR=n


peripheral DM Adv-Scan parameters

static struct bt_le_adv_param app_ble_adv_params_noconnectable =
    BT_LE_ADV_PARAM_INIT(BT_LE_ADV_OPT_USE_IDENTITY | BT_LE_ADV_OPT_SCANNABLE | BT_LE_ADV_OPT_NOTIFY_SCAN_REQ,
                            BT_GAP_ADV_FAST_INT_MIN_2, // 160 * 0.625 ms = 100 ms
                            BT_GAP_ADV_FAST_INT_MAX_2, // 240 * 0.625 ms = 150 ms
                            NULL);

static struct bt_data app_ble_advertisement_data[] = {
    BT_DATA_BYTES(BT_DATA_FLAGS, (BT_LE_AD_GENERAL | BT_LE_AD_NO_BREDR)),
};

static struct bt_data app_ble_scan_response_dm_data[] = {
    BT_DATA(BT_DATA_MANUFACTURER_DATA, (unsigned char*)&app_ble.mfg_data, sizeof(app_ble.mfg_data)),
};

static struct bt_le_scan_param app_ble_scan_params = {
    .type = BT_LE_SCAN_TYPE_ACTIVE,
    .interval = BT_GAP_SCAN_FAST_INTERVAL,
    .window = BT_GAP_SCAN_FAST_WINDOW,
    .options = BT_LE_SCAN_OPT_NONE,
    .timeout = 0,
};

static bool app_ble_scan_data_cb(struct bt_data* data, void* user_data)
{
    struct adv_mfg_data* recv_mfg_data;
    struct dm_request req;

    switch (data->type)
    {
    case BT_DATA_MANUFACTURER_DATA:
#if APP_BLE_DEBUG_DM == 1
        RTT_LOG_INF("Scanner filter: MFG found, len: %d, sizeof: %d", data->data_len, sizeof(struct adv_mfg_data));
#endif
        if (sizeof(struct adv_mfg_data) == data->data_len)
        {
            recv_mfg_data = (struct adv_mfg_data*)data->data;

            bt_addr_le_copy(&req.bt_addr, user_data);
            req.role = DM_ROLE_INITIATOR;
            req.ranging_mode = DM_RANGING_MODE_MCPD;

            req.rng_seed = sys_le32_to_cpu(recv_mfg_data->rng_seed) + app_ble.scanner_random_share;
            req.start_delay_us = 0;
            req.extra_window_time_us = 0;

            dm_request_add(&req);
        }
        return false;
    default:
        return true;
    }
}

static void app_ble_adv_scanned_cb(struct bt_le_ext_adv* adv, struct bt_le_ext_adv_scanned_info* info)
{
    struct dm_request req;

    if (peer_supported_test(info->addr) == true)
    {
        bt_addr_le_copy(&req.bt_addr, info->addr);
        req.role = DM_ROLE_REFLECTOR;
        req.ranging_mode = DM_RANGING_MODE_MCPD;

        req.rng_seed = peer_rng_seed_get() + app_ble_scanner_addr_to_random_share(&info->addr->a);
        req.start_delay_us = 0;
        req.extra_window_time_us = 0;

        dm_request_add(&req);
        app_ble_adv_update_data();
    }
}



What should be checked when an MPSL assert happens? Is there any helpful guide on how does the MPSL internally works, or how is it used in the Distance Measurement Library? What might cause such assert error?

How to explain the difference in the error behavior between the EvalBoard and the custom hardware? What might cause the Logging output to be visible in one hardware but not in another?

Any idea or comment would be helpful.

Thanks in advance

Regards

Javier

Parents
  • Hi, Javier.

    Thank you for reaching out. Have you made any changes to the sample you're using?

    Additionally,

    • Does the application disable interrupts?
    • Does the application use the highest interrupt priority?
    • Is the application using a timeslot?
    • Did you perform any changes to the DT? Could you upload your zephyr.dts file?
    From other posts here we might interpret that it might be a timing problem with an interrupt, but we are not directly stopping or missusing any interrupt that we know of.

    What timeout did you use for uart_rx_enable?

    Best regards,
    Mathias

  • Hi, thanks for the fast feedback. To your questions:

    We integrated the example (nrf_dm) in our code. What I meant is that we only took the part specific for distance measurement, and not the parts that create the DDFS service, or the part for the PWM to the LED, or the simulation of the Azimuth.

    We do not disable interrupts anywhere in our code.

    Our application uses the default priority (all is done from the main function aka the main task).

    We do not use MPSL timeslots directly, but as far as we understand, the DM library uses it. We are not sure if the BLE stack itself uses MPSL and how. See the config for both Central and Peripheral in the initial post, that contains the configs:

    CONFIG_MPSL=y
    CONFIG_MPSL_TIMESLOT_SESSION_COUNT=1
    CONFIG_MPSL_WORK_STACK_SIZE=2048


    We have a DT overlay that defines the timer for the DM library (timer2, as in the nrf_dm example) and pins for UART1 with HW control flow.

    / {
         chosen {
            ncs,dm-timer = &timer2;
        };
    };
    
    &uart1 {
        compatible = "nordic,nrf-uarte";
        status = "okay";
        pinctrl-0 = <&uart1_default>;
        pinctrl-1 = <&uart1_sleep>;
        pinctrl-names = "default", "sleep";
        current-speed = <115200>;
        hw-flow-control;
    };
    
    &pinctrl {
        uart1_default: uart1_default {
            group1 {
                psels = <NRF_PSEL(UART_TX, 0, 28)>, <NRF_PSEL(UART_RX, 0, 29)>, <NRF_PSEL(UART_CTS, 1, 06)>, <NRF_PSEL(UART_RTS, 1, 04)>;
            };
        };
    
        uart1_sleep: uart1_sleep {
            group1 {
                psels = <NRF_PSEL(UART_TX, 0, 28)>, <NRF_PSEL(UART_RX, 0, 29)>, <NRF_PSEL(UART_CTS, 1, 06)>, <NRF_PSEL(UART_RTS, 1, 04)>;
                low-power-enable;
            };
        };
    };


    The timeout for the UART (Async API) is currently 500 microseconds. It should be noted, that the callback from UART does no blocking at all, and does not interact with BLE or DM functions at all.

    static void uart1_event_cb(const struct device *dev, struct uart_event *evt, void *user_data)
    {
        if (dev == NULL || evt == NULL)
        {
            return;
        }
    
        switch (evt->type)
        {
        case UART_TX_DONE:
            if (app_uart.tx_status == UART_TX_STATUS_SENDING)
            {
                /* The State Machine is waiting for this event to go back to idle */
                app_uart.tx_status = UART_TX_STATUS_SENT;
            }
            else
            {
                RTT_LOG_WRN("Unexpected UART TX status: %d", app_uart.tx_status);
            }
            break;
        case UART_TX_ABORTED:
            if (app_uart.tx_status == UART_TX_STATUS_SENDING)
            {
                /* The State Machine can also wait for this event to go back to idle */
                app_uart.tx_status = UART_TX_STATUS_ERROR;
                RTT_LOG_ERR("UART TX aborted");
            }
            break;
        case UART_RX_RDY:
            if ((evt->data.rx.buf != NULL) &&
                (evt->data.rx.len > APP_UART_RX_MIN_SIZE) &&
                (evt->data.rx.len <= APP_UART_BUFFER_SIZE))
            {
                app_uart.rx.buf_len = evt->data.rx.len;
                for (size_t i = 0; i < APP_UART_QUEUE_NUM_ITEMS; ++i)
                {
                    if (app_uart.fifo_storage.items[i].len == 0)
                    {
                        app_uart.fifo_storage.items[i].len = app_uart.rx.buf_len;
                        memcpy(app_uart.fifo_storage.items[i].data, &evt->data.rx.buf[evt->data.rx.offset], app_uart.rx.buf_len);
                        /* To restart the DMA, UART needs to be disabled first. It will be shortly re enabled in the UART_RX_DISABLED event */
                        int err;
                        if ((err = uart_rx_disable(uart1_dev)) < 0)
                        {
                            RTT_LOG_ERR("IRQ uart_rx_disable %d", err);
                        }
                        /* Address of the FIFO slot is sent to the State Machine */
                        k_fifo_put(&app_uart.fifo, &app_uart.fifo_storage.items[i]);
                        break;
                    }
                    else if (i == APP_UART_QUEUE_NUM_ITEMS - 1)
                    {
                        RTT_LOG_WRN("No FIFO items for UART RX");
                    }
                }
            }
            else
            {
                RTT_LOG_WRN("Wrong UART Event params: %p, %u", (void*)evt->data.rx.buf, evt->data.rx.len);
            }
            break;
        case UART_RX_BUF_REQUEST:
            break;
        case UART_RX_BUF_RELEASED:
            break;
        case UART_RX_DISABLED:
            /* UART RX was disabled in the UART_RX_RDY event to restart the DMA, hier RX is started again */
            int err;
            if ((err = uart_rx_enable(uart1_dev, app_uart.rx.buf, sizeof(app_uart.rx.buf), APP_UART_IDLE_TIMEOUT_US)) < 0)
            {
                RTT_LOG_ERR("IRQ uart_rx_enable %d", err);
            }
            break;
        case UART_RX_STOPPED:
            RTT_LOG_ERR("UART_RX_STOPPED");
            break;
        default:
            break;
        }
    }


    Regards,

    Javier

  • Sorry for the lack of response here, Javier.

    I've forwarded the information you provided internally and am still waiting for an update.

    Best regards,
    Mathias

  • I've received some answers now, Javier.

    Javier Reyes said:
    We do not use MPSL timeslots directly, but as far as we understand, the DM library uses it. We are not sure if the BLE stack itself uses MPSL and how.

    Yes, the BLE stack uses MPSL internally.

    • You can think of it as we use it in a similar way to timeslots. Some idea on how timeslots work together with Bluetooth roles can be found in the table here.
    • The problem is most likely that you use a too short duration for the timeslot. However, we're not able to reproduce the issue with the sample as it is in nrf 2.9.
      • A workaround would be to add some extra_window_time_us to the dm_request.
        It would be interesting to know why it is too short. For this, we would need to know:
        • Which role (reflector/initiator) is triggering the assert?
        • Are there any modifications in dm.c?
        • How often does the assert happen?
        • What is the interrupt priority used for UART?
        • By how much do you need to increase the length of the timeslot (through extra_window_time_us) to make the assert go away?
    • About the logging being different:
      It could be helpful to compare the generated configurations (the one in build/<app_name>/zephyr/.config) for the two boards.
      There are different configurations that may affect printing of the assert https://github.com/nrfconnect/sdk-nrf/blob/main/subsys/mpsl/init/mpsl_init.c#L302
  • Hi Mathiaso, thank you for your reponse.

    In order to help to reproduce, we have adjusted the nRF_dm example code, so that it starts the Distance Measurement only after a BLE connection, added the logic for a central to be able to scan (with a filter for NUS UUID) the peripheral, and connect to it, as well as discovering the NUS service, enabling the RX characteristic (all also based on the BLE_Central_UART and BLE_Peripheral_UART examples). Currently no data is being transmited via NUS Service, as with only this changes we were able to reproduce the Assert error. I am attaching here the ZIP project. It is based on the Ezurio BL653 EvalBoard, but it should be also repeatable on a nRF52833 DK.

    Regardles your other topics:

    - We are not sure how to find which role is triggering the assert. As far as we have been able to find, the error happens during hanlding an interrupt, and debugging has not shown a valid Call stack in the Vscode view (only event handlers without name or address)

    - We have no modifications at all in the SDK code or any of its components, only to user application code.

    - It can happen as fast as a few seconds after the BLE connection / activation of DM. Sometimes it takes longer, but when the board gets separated a few centimeters (100cm - 200cm), the assert happens.

    - In our code, we are not configuring UART interrupt priority, so I guess it is default. We only use the ASYNC API config and the HW counting bytes config. In this MWE, we did not even used UART at all, so my guess is that UART has nothing to do with it.

    - We have not yet tried changing the extra_window_time_us, as the example was always working with them values as zero.

    - We also tracked the return value of the function call dm_request_add, and we see a lot of EBUSY errors, even in the vanilla nRF_dm example code. Is this normal?

    Kind regards

    Javier
    modified_demo_nrf_dm.zip

Reply Children
No Data
Related