MPSL assert during MCUmgr image erase

In my project I use MPSL for a combination of BLE, ESB and flash sync.

The BLE/ESB timeslot handling was implemented following this example: https://github.com/too1/ncs-esb-ble-mpsl-demo

During normal operation it is running perfectly fine (so far).

However, when using MCUmgr (MCUmgr web in my case) and executing the "erase" command for an image in the slot, i get an MPSL assert:

rtt:~$ rtt:~$ [00:26:22.662,780] <err> mpsl_init: MPSL ASSERT: 106, 477
rtt:~$ [00:26:22.662,811] <err> os: ***** HARD FAULT *****
rtt:~$ [00:26:22.662,841] <err> os:   Fault escalation (see below)
rtt:~$ [00:26:22.662,872] <err> os: ARCH_EXCEPT with reason 3
 
rtt:~$ [00:26:22.662,902] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00015393
rtt:~$ [00:26:22.662,933] <err> os: r3/a4:  0x00000000 r12/ip:  0x200074c0 r14/lr:  0xffffffff
rtt:~$ [00:26:22.662,963] <err> os:  xpsr:  0x61000000
rtt:~$ [00:26:22.663,024] <err> os: s[ 0]:  0x00000000  s[ 1]:  0x00037f59  s[ 2]:  0x00000000  s[ 3]:  0x20002398
rtt:~$ [00:26:22.663,085] <err> os: s[ 4]:  0x2000239a  s[ 5]:  0x00020a9d  s[ 6]:  0x00000000  s[ 7]:  0x00000000
rtt:~$ [00:26:22.663,146] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00037f4f  s[10]:  0x00000000  s[11]:  0x00000000
rtt:~$ [00:26:22.663,208] <err> os: s[12]:  0x00061ff8  s[13]:  0x2001587c  s[14]:  0x00000000  s[15]:  0x2001587c
rtt:~$ [00:26:22.663,238] <err> os: fpscr:  0x00000000
rtt:~$ [00:26:22.663,269] <err> os: Faulting instruction address (r15/pc): 0x00037f86
rtt:~$ [00:26:22.663,299] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
rtt:~$ [00:26:22.663,330] <err> os: Current thread: 0x20008460 (MPSL Work)
rtt:~$ [00:26:23.155,944] <err> os: Halting system

Of course the filename 106 and line number 477 don't really tell me much about what triggered it, and since the MPSL lib is closed source there's no way to find out.

I hope you can provide me with the details of where this assert was triggered and perhaps some pointers to what i could look at, to prevent this from happening

  • Hi,

    Which SDK version were you using when you got this assert (the line numbers for the same assert may differ between versions)?

    That said, this assert is likely in the handling of the MPSL_TIMESLOT_SIGNAL_CANCELLED event. This statement from the MPSL API documentation is likely relevant: "For signals executed in low priority returning any action other than MPSL_TIMESLOT_SIGNAL_ACTION_NONE will result in an assert." The proper way of requesting an event when processing a low priority signal is to invoke mpsl_timeslot_request().

  • Thanks for your reply!

    This was using NCS version 2.6.0, but i got the same using 2.5.1.

    Here are the relevant parts of my timeslot handler, where I do call mpsl_timeslot_request() upon receiving the MPSL_TIMESLOT_SIGNAL_CANCELLED event;

    extern "C" void RADIO_IRQHandler(void);
    
    mpsl_timeslot_request_t esb_timeslot_request{
        .request_type = MPSL_TIMESLOT_REQ_TYPE_EARLIEST,
        .params{
            .earliest{
                .hfclk = MPSL_TIMESLOT_HFCLK_CFG_NO_GUARANTEE,
                .priority = MPSL_TIMESLOT_PRIORITY_NORMAL,
                .length_us = kTimeslotLengthUs,
                .timeout_us = kTimeslotReqTimeoutUs,
            },
        },
    };
    
    mpsl_timeslot_signal_return_param_t* MyClass::TimeslotEventHandler(uint8_t session_id, uint32_t signal) {
      static bool timeslot_extenstion_failed = false;
    
      auto& inst = MyClass::GetInstance();
      inst.timeslot_action_.callback_action = MPSL_TIMESLOT_SIGNAL_ACTION_NONE;
    
      switch (signal) {
        case MPSL_TIMESLOT_SIGNAL_START: {
          timeslot_extenstion_failed = false;
    
          NVIC_ClearPendingIRQ(RADIO_IRQn);
          NRF_RADIO->POWER = RADIO_POWER_POWER_Disabled << RADIO_POWER_POWER_Pos;
          NRF_RADIO->POWER = RADIO_POWER_POWER_Enabled << RADIO_POWER_POWER_Pos;
          NVIC_ClearPendingIRQ(RADIO_IRQn);
    
          nrf_timer_bit_width_set(NRF_TIMER0, NRF_TIMER_BIT_WIDTH_32);
          nrf_timer_cc_set(NRF_TIMER0, NRF_TIMER_CC_CHANNEL0, kTimerExpiryUsEarly);
          nrf_timer_cc_set(NRF_TIMER0, NRF_TIMER_CC_CHANNEL1, kTimerExpiryUsReq);
    
          nrf_timer_int_enable(NRF_TIMER0, NRF_TIMER_INT_COMPARE0_MASK);
          nrf_timer_int_enable(NRF_TIMER0, NRF_TIMER_INT_COMPARE1_MASK);
    
          inst.SetTimeslotActive(true);
          break;
        }
        case MPSL_TIMESLOT_SIGNAL_TIMER0: {
          if (nrf_timer_event_check(NRF_TIMER0, NRF_TIMER_EVENT_COMPARE0)) {
            nrf_timer_int_disable(NRF_TIMER0, NRF_TIMER_INT_COMPARE0_MASK);
            nrf_timer_event_clear(NRF_TIMER0, NRF_TIMER_EVENT_COMPARE0);
    
            inst.timeslot_action_.callback_action = MPSL_TIMESLOT_SIGNAL_ACTION_EXTEND;
            inst.timeslot_action_.params.extend.length_us = kTimeslotLengthUs;
          } else if (nrf_timer_event_check(NRF_TIMER0, NRF_TIMER_EVENT_COMPARE1)) {
            nrf_timer_int_disable(NRF_TIMER0, NRF_TIMER_INT_COMPARE1_MASK);
            nrf_timer_event_clear(NRF_TIMER0, NRF_TIMER_EVENT_COMPARE1);
    
            if (timeslot_extenstion_failed) {
              inst.timeslot_action_.callback_action = MPSL_TIMESLOT_SIGNAL_ACTION_REQUEST;
              inst.timeslot_action_.params.request.p_next = &esb_timeslot_request;
            }
          }
          break;
        }
        case MPSL_TIMESLOT_SIGNAL_EXTEND_SUCCEEDED: {
          // next trigger time is current + timer expiry early
          uint32_t current_cc = nrf_timer_cc_get(NRF_TIMER0, NRF_TIMER_CC_CHANNEL0);
          nrf_timer_bit_width_set(NRF_TIMER0, NRF_TIMER_BIT_WIDTH_32);
          nrf_timer_cc_set(NRF_TIMER0, NRF_TIMER_CC_CHANNEL0, (current_cc + kTimeslotLengthUs));
          nrf_timer_int_enable(NRF_TIMER0, NRF_TIMER_INT_COMPARE0_MASK);
    
          current_cc = nrf_timer_cc_get(NRF_TIMER0, NRF_TIMER_CC_CHANNEL1);
          nrf_timer_bit_width_set(NRF_TIMER0, NRF_TIMER_BIT_WIDTH_32);
          nrf_timer_cc_set(NRF_TIMER0, NRF_TIMER_CC_CHANNEL1, (current_cc + kTimeslotLengthUs));
          nrf_timer_int_enable(NRF_TIMER0, NRF_TIMER_INT_COMPARE1_MASK);
          break;
        }
        case MPSL_TIMESLOT_SIGNAL_EXTEND_FAILED: {
          timeslot_extenstion_failed = true;
          inst.SetTimeslotActive(false);
          break;
        }
        case MPSL_TIMESLOT_SIGNAL_RADIO: {
          // radio signal received, manually call RADIO IRQ / disable IRQ
          if (inst.timeslot_active_) {
            RADIO_IRQHandler();
          } else {
            NVIC_ClearPendingIRQ(RADIO_IRQn);
            NVIC_DisableIRQ(RADIO_IRQn);
          }
          break;
        }
        case MPSL_TIMESLOT_SIGNAL_CANCELLED:  // [[fallthrough]];
        case MPSL_TIMESLOT_SIGNAL_BLOCKED: {
          inst.SetTimeslotActive(false);
          mpsl_timeslot_request(session_id, &esb_timeslot_request);
          break;
        }
        case MPSL_TIMESLOT_SIGNAL_OVERSTAYED:  // [[fallthrough]];
        case MPSL_TIMESLOT_SIGNAL_INVALID_RETURN: {
          inst.timeslot_action_.callback_action = MPSL_TIMESLOT_SIGNAL_ACTION_END;
          inst.SetTimeslotActive(false);
          break;
        }
        case MPSL_TIMESLOT_SIGNAL_SESSION_IDLE:  // [[fallthrough]];
        case MPSL_TIMESLOT_SIGNAL_SESSION_CLOSED: {
          inst.SetTimeslotActive(false);
          break;
        }
        default: {
          return nullptr;
        }
      }
    
      return &inst.timeslot_action_;
    }

    With relevant class definition (just parts of it for this demonstration) in the header (note: using c++17)

    class MyClass {
     public:
      MyClass(const MyClass&) = delete;
      MyClass& operator=(const MyClass&) = delete;
      MyClass(MyClass&&) = delete;
      MyClass& operator=(MyClass&&) = delete;
    
      static inline MyClass& GetInstance() {
        static MyClass instance;
        return instance;
      }
    
     private:
      MyClass() = default;
      ~MyClass() = default;
    
      void SetTimeslotActive(bool active);
    
      static mpsl_timeslot_signal_return_param_t* TimeslotEventHandler(uint8_t session_id, uint32_t signal);
    
      bool timeslot_active_{false};
      mpsl_timeslot_signal_return_param_t timeslot_action_{};
    };


    The biggest difference with the example here: https://github.com/too1/ncs-esb-ble-mpsl-demo/blob/master/common/timeslot_handler.c#L183 Is that the request is done from a non-preemptible thead, while I just called mpsl_timeslot_request() directly from the timeslot handler. Could this be the reason for the assert as well?

    I wasn't really sure whether this was necessary, since i would have to create a new thread and message queue for it, which takes up extra stack space. In case it really is required, could i also use the system workqueue to schedule the request? It has a cooperative priority

  • I see. Then I can confirm that the assert is because you of an action other than NONE in the handling of a MPSL_TIMESLOT_SIGNAL_CANCELLED event. Defering the call to mpsl_timeslot_request() with the help of the system workqueue as you suggest should resolve this.

Related