NCS Update to v2.5.2: MPSL Hard Fault on Advertisement

Question

Hello,

I'm not a big fan of updating the nRF Connect SDK: in my experience, every release changes so much that it takes me at least a full day to make my application compatible again. For that reason, we always wait a while before updating, so that as many bugs as possible have been fixed.

This time, however, there seem to have been major changes in the SDK that broke some core functionality.

We use the nRF52840.
We have now updated from NCS v2.4.0 to v2.5.2.
For building, we use the official Docker image nordicplayground/nrfconnect-sdk:v2.5-branch

With our application, we ALWAYS receive the following hard fault after approx. 34 minutes:

[00:33:56.096,374] <err> mpsl_init: MPSL ASSERT: 109, 300
[00:33:56.096,374] <err> os: ***** HARD FAULT *****
[00:33:56.096,405] <err> os:   Fault escalation (see below)
[00:33:56.096,405] <err> os: ARCH_EXCEPT with reason 3
[00:33:56.096,435] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000060  r2/a3:  0x0000005f
[00:33:56.096,466] <err> os: r3/a4:  0x200043e0 r12/ip:  0x00000000 r14/lr:  0x00053c4b
[00:33:56.096,466] <err> os:  xpsr:  0x41000011
[00:33:56.096,496] <err> os: s[ 0]:  0x0000000a  s[ 1]:  0x0007675d  s[ 2]:  0x00000001  s[ 3]:  0x0003c06d
[00:33:56.096,496] <err> os: s[ 4]:  0x00393031  s[ 5]:  0x00000020  s[ 6]:  0x00000000  s[ 7]:  0x0003b25d
[00:33:56.096,527] <err> os: s[ 8]:  0x00000020  s[ 9]:  0x0003951b  s[10]:  0x20003000  s[11]:  0x00000000
[00:33:56.096,557] <err> os: s[12]:  0x03fa43d2  s[13]:  0x00000000  s[14]:  0x00000001  s[15]:  0x00000000
[00:33:56.096,557] <err> os: fpscr:  0x20003003
[00:33:56.096,588] <err> os: Faulting instruction address (r15/pc): 0x00076798
[00:33:56.096,618] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:33:56.096,618] <err> os: Fault during interrupt handling
[00:33:56.096,649] <err> os: Current thread: 0x200043e0 (unknown)
[00:33:56.985,443] <err> os: Halting system

The log shows that the fault comes from an MPSL assert raised during interrupt handling. The faulting address itself only resolves to:
/workdir/nrf/subsys/mpsl/init/mpsl_init.c:191
which is the handler itself.

I have narrowed the error down to BLE advertising, specifically the following function:

void update_advertisement_data(struct bt_le_ext_adv* adv, IAdvertisementManager* adv_manager)
{
    if(adv_manager->is_running())
    {
        auto my_data = adv_manager->get_next();
        if(my_data != nullptr)
        {
            auto err = bt_le_ext_adv_set_data(adv, my_data->ptr, my_data->len, NULL, 0);
            if(err)
            {
                printk("Failed to set advertising data (err %d)\n", err);
            }
        }

        if(atomic_test_bit(adv->flags, BT_ADV_ENABLED))
        {
            return;
        }
        bt_le_ext_adv_start(adv, &ext_adv_start_param);
    }
}

With the following advertisement configuration:

static struct bt_le_adv_param* connectable_adv_param =
    BT_LE_ADV_PARAM(BT_LE_ADV_OPT_CONNECTABLE | BT_LE_ADV_OPT_USE_NAME,
                    BT_GAP_ADV_FAST_INT_MIN_2,
                    BT_GAP_ADV_FAST_INT_MAX_2,
                    NULL);
static struct bt_le_ext_adv_start_param ext_adv_start_param = BT_LE_EXT_ADV_START_PARAM_INIT((0), (3));

Explanation

The advertising data changes constantly at runtime. To handle this, the function above is called once per second from the main loop, depending on whether the last data packet has been sent.

As I understand it, setting the new data is not enough on its own, because the new data is then not advertised automatically.

The BT_ADV_ENABLED check is redundant here, because bt_le_ext_adv_start performs the same check itself. This is known.
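To illustrate why the explicit flag check is redundant, here is a minimal stand-alone model, not the real Zephyr API. FakeAdvSet, fake_adv_start and restart_if_stopped are hypothetical names, and the model assumes that bt_le_ext_adv_start returns -EALREADY when the advertising set is already enabled. Under that assumption the caller can rely on the return code instead of reading adv->flags directly:

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for an advertising set; models only the enabled state.
struct FakeAdvSet {
    bool enabled = false;
};

// Stand-in for bt_le_ext_adv_start(). Assumption: the real call rejects a
// second start with -EALREADY while the set is still enabled.
int fake_adv_start(FakeAdvSet& adv)
{
    if (adv.enabled) {
        return -EALREADY;
    }
    adv.enabled = true;
    return 0;
}

// Restart helper that uses the return code instead of testing the flag first.
// "Already running" is treated as success; any other error is passed on.
int restart_if_stopped(FakeAdvSet& adv)
{
    const int err = fake_adv_start(adv);
    if (err == -EALREADY) {
        return 0;
    }
    return err;
}
```

In the real function this would also mean checking the return value of bt_le_ext_adv_start, which the code above currently ignores.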

Notable

If I omit the following code, my application runs normally but no longer advertises.

if(atomic_test_bit(adv->flags, BT_ADV_ENABLED))
{
    return;
}
bt_le_ext_adv_start(adv, &ext_adv_start_param);

As I said, the application always crashes after approx. 34 minutes. The log above shows 00:33:56, i.e. about 2036 seconds, which is just short of 2048 = 2^11. This cannot be a coincidence.

And again, up to NCS v2.4.0 everything worked without any problems.
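For reference, the timestamp of the first error line can be converted exactly. The sketch below assumes the Zephyr log timestamp format [hh:mm:ss.mmm,uuu]; log_timestamp_us and the constant names are hypothetical helpers, not part of any SDK:

```cpp
#include <cstdint>

// Convert a log timestamp [hh:mm:ss.mmm,uuu] to microseconds since boot.
constexpr std::int64_t log_timestamp_us(int h, int m, int s, int ms, int us)
{
    return ((static_cast<std::int64_t>(h) * 60 + m) * 60 + s) * 1000000LL
           + ms * 1000LL + us;
}

// First error line in the log: [00:33:56.096,374]
constexpr std::int64_t kAssertUs = log_timestamp_us(0, 33, 56, 96, 374);

// 2^11 seconds expressed in microseconds.
constexpr std::int64_t kPow2Us = 2048LL * 1000000LL;

// Distance between the assert and the 2048 s mark.
constexpr std::int64_t kGapUs = kPow2Us - kAssertUs;

static_assert(kAssertUs == 2036096374LL, "assert fires at about 2036.1 s");
static_assert(kGapUs == 11903626LL, "about 11.9 s before 2048 s");
```

So the assert fires at about 2036.1 s, roughly 11.9 s before the 2048 s mark.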

We have observed this behavior on all our boards so far.
