Multiprotocol application - MPSL ASSERT: 106, 684 and MPSL ASSERT: 112, 2094

Hello,

Version - nRF Connect SDK v2.7.0

The background to the system I am developing is described here

Multiprotocol Service Layer (MPSL) - BLE coexistence with proprietary communication stack
and here

 Non-volatile storage with BLE and custom wireless stack using MPSL 

In short, I have an application containing a BLE Peripheral, NVS and our custom 6TiSCH wireless stack. All three components run simultaneously, with access to the hardware managed by MPSL. The application works fine in our test environment, except for occasional critical errors that I want to discuss with you.

The first two are MPSL asserts that I do not know how to interpret:

(1) Assert from MPSL. The stack pointer is in the idle thread, the Link Register points to the assert function, and the Program Counter is in C:/ncs/v2.7.0/zephyr/lib/os/printk.c:209

ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/mpsl/init/mpsl_init.c:301
MPSL ASSERT: 112, 2094

[18:15:50.145,141] <err> os: ***** HARD FAULT *****
[18:15:50.145,172] <err> os: Fault escalation (see below)
[18:15:50.145,172] <err> os: ARCH_EXCEPT with reason 4
.....
[14:24:44.787,170] <err> os: xpsr: 0x01000018

(2) Assert from MPSL. The stack pointer is in the thread that manages NVM storage of the firmware image, the Link Register points to the assert function, and the Program Counter is in C:/ncs/v2.7.0/zephyr/lib/os/printk.c:209

ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/mpsl/init/mpsl_init.c:301
MPSL ASSERT: 106, 684

[14:24:44.787,017] <err> os: ***** HARD FAULT *****
[14:24:44.787,048] <err> os: Fault escalation (see below)
[14:24:44.787,048] <err> os: ARCH_EXCEPT with reason 4
....
[14:24:44.787,170] <err> os: xpsr: 0x01000000

The third is directly related to our software. It shows up at the same point and under the same circumstances.

(3) In this one, the Link Register points to our function that iterates through a buffer inside a critical section. This is not excessive work for the MCU.

[18:17:40.708,801] <err> os: ***** USAGE FAULT *****
[18:17:40.708,831] <err> os: Illegal use of the EPSR
...
[18:17:40.708,923] <err> os: xpsr: 0x60000200

Q1: How should the numbers in the MPSL assertions (MPSL ASSERT: 112, 2094 and MPSL ASSERT: 106, 684) be interpreted?
Q2: How long can I keep the MCU in a critical section with interrupts disabled?
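The thread does not state a safe upper bound for Q2. One general way to keep a buffer walk like the one in (3) from holding interrupts off for a long stretch is to split it into bounded chunks and re-enable interrupts between them. A minimal sketch, assuming hypothetical CS_ENTER()/CS_EXIT() hooks (on Zephyr they could wrap irq_lock()/irq_unlock(key)) and a byte sum standing in for the real per-item work:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical critical-section hooks. On a Zephyr target these could be
 * defined as irq_lock() / irq_unlock(key) before this code is compiled;
 * the host defaults below do nothing so the logic can be tested anywhere. */
#ifndef CS_ENTER
#define CS_ENTER() 0u
#define CS_EXIT(key) ((void)(key))
#endif

/* Sum `len` bytes of `buf`, holding the critical section for at most
 * `max_chunk` items at a time so pending interrupts can be serviced
 * between chunks. The byte sum is a placeholder for the real per-item
 * work. If `sections` is not NULL, it receives the number of critical
 * sections that were entered. */
static uint32_t sum_in_chunks(const uint8_t *buf, size_t len,
                              size_t max_chunk, size_t *sections)
{
    assert(max_chunk > 0);
    uint32_t sum = 0;
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        size_t end = (len - i > max_chunk) ? i + max_chunk : len;
        unsigned int key = CS_ENTER();
        for (; i < end; i++) {
            sum += buf[i];
        }
        CS_EXIT(key); /* interrupts may run here before the next chunk */
        count++;
    }
    if (sections != NULL) {
        *sections = count;
    }
    return sum;
}
```

In practice max_chunk would be derived from a measured per-item cost and whatever interrupt latency the radio and MPSL timing can tolerate; the thread does not establish that budget.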

  • I narrowed down the incidence of MPSL errors:

    ....
    [08:17:36.897,888] <inf> dfu: Erasing page 47
    ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/mpsl/init/mpsl_init.c:301
        MPSL ASSERT: 106, 684
    [08:17:36.898,803] <err> os: ***** HARD FAULT *****
    [08:17:36.898,834] <err> os:   Fault escalation (see below)
    [08:17:36.898,864] <err> os: ARCH_EXCEPT with reason 4
    ....

    Now only the MPSL ASSERT: 106, 684 assert occurs. It always occurs during (or right before) a page erase. The page erase function I use is shown below.

    /**
     *  @brief  Erase part or all of a flash memory
     *
     *  Acceptable values of erase size and offset are subject to
     *  hardware-specific multiples of page size and offset. Please check
     *  the API implemented by the underlying sub driver, for example by
     *  using flash_get_page_info_by_offs() if that is supported by your
     *  flash driver.
     *
     *  Any necessary erase protection management is performed by the driver
     *  erase implementation itself.
     *
     *  @param  dev             : flash device
     *  @param  offset          : erase area starting offset
     *  @param  size            : size of area to be erased
     *
     *  @return  0 on success, negative errno code on fail.
     *
     *  @see flash_get_page_info_by_offs()
     *  @see flash_get_page_info_by_idx()
     */
    __syscall int flash_erase(const struct device *dev, off_t offset, size_t size);
  • I will try to ask internally. The flash erase operation is the one that blocks CPU execution the longest, potentially up to 85 ms, ref:
    https://docs.nordicsemi.com/bundle/ps_nrf52840/page/nvmc.html#ariaid-title23 

    I assume CONFIG_SOC_FLASH_NRF_RADIO_SYNC_MPSL=y and CONFIG_SOC_FLASH_NRF_RADIO_SYNC_MPSL_TIMESLOT_SESSION_COUNT=1 are set?

    Best regards,
    Kenneth

  • I've checked the autoconf.h file and both are already implicitly set:

    #define CONFIG_SOC_FLASH_NRF_RADIO_SYNC_MPSL 1
    #define CONFIG_SOC_FLASH_NRF_RADIO_SYNC_MPSL_TIMESLOT_SESSION_COUNT 1
  • I have probably already fixed this problem.

    While flash_erase was in progress, our RTC2 interrupt could also be serviced, and its handler may enter short critical sections. When the DFU procedure starts, 118 pages of flash are erased sequentially. A random delay is inserted between page erases so that the MCU is not stalled for one long continuous period. During this time our stack and the BLE peripheral are active, so the processor is under heavy load while also erasing flash.
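The DFU erase pattern described above (erase one page, wait a random interval, move on) could be sketched as below. ERASE_PAGE, SLEEP_MS, the page size and the delay bounds are hypothetical; on a Zephyr target ERASE_PAGE might wrap flash_erase(dev, offset, size) and SLEEP_MS might wrap k_msleep(). The host defaults only count erase calls. The helper also shows the worst case implied by the 85 ms figure quoted above: if every one of the 118 erases blocked for the full 85 ms, the CPU would be stalled for 118 × 85 = 10 030 ms in total, spread across the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Upper bound for a single page erase, as quoted for the nRF52840 NVMC. */
#define PAGE_ERASE_MAX_MS 85u

/* Hypothetical platform hooks. On a Zephyr target ERASE_PAGE(off, size)
 * could call flash_erase(flash_dev, off, size) and SLEEP_MS(ms) could call
 * k_msleep(ms). The host defaults below just count erase calls. */
#ifndef ERASE_PAGE
static unsigned int host_erase_calls;
static int host_erase_page(long offset, size_t size)
{
    (void)offset;
    (void)size;
    host_erase_calls++;
    return 0;
}
#define ERASE_PAGE(off, size) host_erase_page((off), (size))
#define SLEEP_MS(ms) ((void)(ms))
#endif

/* Worst-case total CPU stall if every page erase takes the maximum time. */
static uint32_t worst_case_erase_ms(uint32_t n_pages)
{
    return n_pages * PAGE_ERASE_MAX_MS;
}

/* Erase `n_pages` consecutive pages starting at `start`, sleeping between
 * pages for min_delay_ms plus a random jitter in [0, jitter_ms] so other
 * work (radio, BLE, application stack) can run in between.
 * Returns 0 on success or the first non-zero error from ERASE_PAGE. */
static int erase_pages_with_gaps(long start, size_t page_size, uint32_t n_pages,
                                 uint32_t min_delay_ms, uint32_t jitter_ms)
{
    assert(page_size > 0);
    for (uint32_t i = 0; i < n_pages; i++) {
        int err = ERASE_PAGE(start + (long)i * (long)page_size, page_size);
        if (err != 0) {
            return err;
        }
        if (i + 1 < n_pages) { /* no delay needed after the last page */
            uint32_t jitter = (uint32_t)rand() % (jitter_ms + 1u);
            SLEEP_MS(min_delay_ms + jitter);
        }
    }
    return 0;
}
```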

    So disabling interrupts before calling flash_erase and re-enabling them afterwards fixed the problem.
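The workaround described above, wrapping the DFU page erase so that no interrupt runs while it is in progress, could look roughly like this. It mirrors what was reported to help in this case rather than a confirmed root-cause fix. CS_ENTER/CS_EXIT and ERASE_PAGE are hypothetical hooks (on Zephyr, for example, irq_lock()/irq_unlock(key) and flash_erase(dev, offset, size)); the host defaults record whether the erase ran with the lock held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks. The host defaults track the lock nesting depth and
 * whether the erase happened while "interrupts" were locked. On a target,
 * define CS_ENTER, CS_EXIT and ERASE_PAGE before this code. */
#ifndef CS_ENTER
static int host_lock_depth;
static int host_erase_was_locked;
static unsigned int host_cs_enter(void) { return (unsigned int)host_lock_depth++; }
static void host_cs_exit(unsigned int key) { host_lock_depth = (int)key; }
static int host_erase_page(long offset, size_t size)
{
    (void)offset;
    (void)size;
    host_erase_was_locked = (host_lock_depth > 0);
    return 0;
}
#define CS_ENTER() host_cs_enter()
#define CS_EXIT(key) host_cs_exit(key)
#define ERASE_PAGE(off, size) host_erase_page((off), (size))
#endif

/* Erase one page with interrupts locked for the whole call, then restore
 * the previous interrupt state, also when the erase reports an error. */
static int erase_page_locked(long offset, size_t size)
{
    assert(size > 0);
    unsigned int key = CS_ENTER();
    int err = ERASE_PAGE(offset, size);
    CS_EXIT(key);
    return err;
}
```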

    However, only this particular flash_erase call is guarded. We also use the SETTINGS module in our application, which in certain circumstances can call the same or a similar back-end function, so for now the problem is only postponed. This is acceptable for the time being, because the SETTINGS module stores data only in special circumstances.
