NCS v3.2.4: mpsl_init: MPSL ASSERT: 109, 615

Hey Team!
We are developing a radio-heavy BLE application on nRF Connect SDK v3.2.4 that uses the experimental QoS Channel Survey feature. In normal operation it works fine, but after some time we hit an MPSL fault, always on the BLE central, where the QoS Channel Survey is active.

00> [ 1948.118408] <err> mpsl_init: MPSL ASSERT: 109, 615

00> [ 1948.118408] <err> os:   Fault escalation (see below)
00> [ 1948.118408] <err> os: ARCH_EXCEPT with reason 3
00> 
00> [ 1948.118469] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
00> [ 1948.118469] <err> os: Fault during interrupt handling
00> 
00> [ 1948.118499] <err> os: Current thread: 0x2000ce30 (unknown)


We don't have any custom MPSL timeslots active (CONFIG_MPSL_TIMESLOT_SESSION_COUNT=0), and since the assert only occurs on the BLE central, where the QoS Channel Survey feature is active, we suspect the two are related.

The application is running on a custom board with an nRF52840 that has only an RC oscillator as its LF clock source. CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y and CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC_CALIBRATION=y are set.


Because the assert only occurs after an unpredictable time (which can be hours), we have so far reproduced it only with a production build with channel survey active. We are now trying to reproduce it with a debug build to get more information. After that, we will disable the channel survey feature to see whether the assert still occurs without it. In the meantime, it would really help us to understand what could cause this MPSL assert.

I will add more information as soon as we have it.

  • Hello,

    I have asked internally now, will let you know as soon as I learn more about the assert in question.

    Update: Please update to v3.3.0, where channel survey is no longer experimental. 

    Kenneth

  • Something else we just discovered: the error always occurs after the same amount of time.
    Reducing the clock calibration period (CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD) triggers it sooner.

    CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD=4000 (default) -> assertion after approx. 25 min
    CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD=1000 -> assertion after approx. 7 min
    CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD=500 -> assertion after approx. 3.5 min
    CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD=500 and CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_MAX_SKIP=0 -> assertion after approx. 2 min
    CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD=250 and CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_MAX_SKIP=0 -> assertion after approx. 1 min
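
To make the pattern in these numbers easier to see, here is a rough sketch that converts each measured time into a count of calibration periods. It assumes CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD is given in milliseconds and treats the approximate times above as exact, so the results are only indicative. The names `observations`, `periods_until_assert` and `summarize` are made up for this illustration.

```python
# Observations from this thread:
# (calibration period in ms, MAX_SKIP override or None, approx. minutes until the assert).
# None means MAX_SKIP was not mentioned for that run (assumed left at its default).
observations = [
    (4000, None, 25.0),
    (1000, None, 7.0),
    (500, None, 3.5),
    (500, 0, 2.0),
    (250, 0, 1.0),
]


def periods_until_assert(period_ms, minutes):
    """Number of calibration periods that fit into the time until the assert."""
    return minutes * 60_000 / period_ms


def summarize(rows):
    """Return (period_ms, max_skip, rounded period count) for each observation."""
    return [(p, s, round(periods_until_assert(p, m))) for p, s, m in rows]


if __name__ == "__main__":
    for period_ms, max_skip, count in summarize(observations):
        skip = "default" if max_skip is None else str(max_skip)
        print(f"period={period_ms} ms, max_skip={skip}: ~{count} calibration periods")
```

Under these assumptions the runs without a MAX_SKIP override come out at roughly 375 to 420 periods, and the two runs with MAX_SKIP=0 both come out at about 240. So the time to the assert seems to scale with the number of calibration periods rather than with wall-clock time, although the measurements are too coarse to say more than that.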