ASSERTION FAIL @ WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c

Hello.

I am developing using nRF5340DK and nRF Connect SDK v2.9.0.
A fatal error occurred on the board when sending some Mesh Model Messages to the smartphone app.
It doesn't happen every time. I think it is more likely to occur when sending repeatedly.

ASSERTION FAIL @ WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c


Here is the code where a fatal error occurs.
void bt_mesh_adv_unref(struct bt_mesh_adv *adv)
{
	__ASSERT_NO_MSG(adv->__ref > 0);

	if (--adv->__ref > 0) {
		return;
	}

"adv->__ref" was underflowing. I thought the number of calls to "bt_mesh_adv_ref()" and "bt_mesh_adv_unref()" had an effect.
However, since the detailed sequence is unclear, I added debug logs to "adv.c". The at symbol represents the location I modified.
I output the address of the "struct bt_mesh_adv", ref, and the caller's file name, function name, and line number.

static struct bt_mesh_adv *adv_create_from_pool(struct k_mem_slab *buf_pool,
						enum bt_mesh_adv_type type,
						enum bt_mesh_adv_tag tag,
						uint8_t xmit, k_timeout_t timeout)
{
    // some code...
	LOG_INF( "%p create", (void*)adv ) ;																				// @@@
	return adv;
}

struct bt_mesh_adv *bt_mesh_adv_ref(struct bt_mesh_adv *adv, const char *file, const char *func, int line)
{
	LOG_INF( "%p ref  : %u -> %u ( %s %s %u )", (void*)adv, ( adv->__ref ), ( adv->__ref + 1 ), file, func, line ) ;	// @@@
	__ASSERT_NO_MSG(adv->__ref < UINT8_MAX);

	adv->__ref++;

	return adv;
}

void bt_mesh_adv_unref(struct bt_mesh_adv *adv, const char *file, const char *func, int line)
{
	LOG_INF( "%p unref: %u -> %u ( %s %s %u )", (void*)adv, ( adv->__ref ), ( adv->__ref - 1 ), file, func, line ) ;	// @@@
	__ASSERT_NO_MSG(adv->__ref > 0);

	if (--adv->__ref > 0) {
		return;
	}
	LOG_INF( "%p free", (void*)adv ) ;																					// @@@
    // some code...
}


Result(The message content is masked.):
[00:02:14.832,489] <dbg> bt_mesh_access: bt_mesh_model_recv: app_idx 0x0000 src 0x0001 dst 0x016d
[00:02:14.832,550] <dbg> bt_mesh_access: bt_mesh_model_recv: len 2: ****
[00:02:14.832,580] <dbg> bt_mesh_access: bt_mesh_model_recv: OpCode 0x********
[00:02:14.836,273] <dbg> bt_mesh_access: bt_mesh_model_recv: app_idx 0x0000 src 0x0001 dst 0x0170
[00:02:14.836,303] <dbg> bt_mesh_access: bt_mesh_model_recv: len 5: **********
[00:02:14.836,334] <dbg> bt_mesh_access: bt_mesh_model_recv: OpCode 0x********
[00:02:14.840,057] <dbg> bt_mesh_access: bt_mesh_model_recv: app_idx 0x0000 src 0x0001 dst 0x016b
[00:02:14.840,087] <dbg> bt_mesh_access: bt_mesh_model_recv: len 2: ****
[00:02:14.840,087] <dbg> bt_mesh_access: bt_mesh_model_recv: OpCode 0x********
[00:02:14.843,811] <dbg> bt_mesh_access: bt_mesh_model_recv: app_idx 0x0000 src 0x0001 dst 0x0172
[00:02:14.843,841] <dbg> bt_mesh_access: bt_mesh_model_recv: len 5: **********
[00:02:14.860,504] <dbg> bt_mesh_access: bt_mesh_access_send: net_idx 0x0000 app_idx 0x0000 dst 0x0001
[00:02:14.860,534] <dbg> bt_mesh_access: bt_mesh_access_send: len 6: ************
[00:02:14.861,267] <inf> bt_mesh_adv: 0x2002708c create
[00:02:14.862,182] <inf> bt_mesh_adv: 0x2002708c ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c bt_mesh_proxy_relay_send 251 )
[00:02:14.862,243] <inf> bt_mesh_adv: 0x2002708c ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c bt_mesh_adv_send 302 )
[00:02:14.862,304] <inf> bt_mesh_adv: 0x2002708c unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/net.c bt_mesh_net_send 596 )
[00:02:14.862,396] <inf> bt_mesh_adv: 0x2002708c ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.862,548] <inf> bt_mesh_adv: 0x2002708c unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.862,915] <inf> bt_mesh_adv: 0x2002708c unref: 2 -> 1 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c buf_send_end 211 )
[00:02:14.865,081] <dbg> bt_mesh_access: bt_mesh_model_recv: app_idx 0x0000 src 0x0001 dst 0x0171
[00:02:14.865,112] <dbg> bt_mesh_access: bt_mesh_model_recv: len 5: **********
[00:02:14.865,142] <dbg> bt_mesh_access: bt_mesh_model_recv: OpCode 0x********
[00:02:14.868,835] <dbg> bt_mesh_access: bt_mesh_access_send: net_idx 0x0000 app_idx 0x0000 dst 0x0001
[00:02:14.868,865] <dbg> bt_mesh_access: bt_mesh_access_send: len 7: **************
[00:02:14.869,598] <inf> bt_mesh_adv: 0x20027050 create
[00:02:14.870,513] <inf> bt_mesh_adv: 0x20027050 ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c bt_mesh_proxy_relay_send 251 )
[00:02:14.870,574] <inf> bt_mesh_adv: 0x20027050 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c bt_mesh_adv_send 302 )
[00:02:14.870,635] <inf> bt_mesh_adv: 0x20027050 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/net.c bt_mesh_net_send 596 )
[00:02:14.870,666] <dbg> bt_mesh_access: bt_mesh_access_send: net_idx 0x0000 app_idx 0x0000 dst 0x0001
[00:02:14.870,697] <dbg> bt_mesh_access: bt_mesh_access_send: len 4: ********
[00:02:14.871,429] <inf> bt_mesh_adv: 0x20027014 create
[00:02:14.872,375] <inf> bt_mesh_adv: 0x20027014 ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c bt_mesh_proxy_relay_send 251 )
[00:02:14.872,406] <inf> bt_mesh_adv: 0x20027014 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c bt_mesh_adv_send 302 )
[00:02:14.872,467] <inf> bt_mesh_adv: 0x20027014 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/net.c bt_mesh_net_send 596 )
[00:02:14.873,596] <inf> bt_mesh_adv: 0x2002708c ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv_ext.c adv_send 271 )
[00:02:14.873,626] <inf> bt_mesh_adv: 0x2002708c unref: 2 -> 1 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv_ext.c send_pending_adv 356 )
[00:02:14.873,687] <inf> bt_mesh_adv: 0x20027050 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.873,809] <inf> bt_mesh_adv: 0x20027050 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.873,901] <inf> bt_mesh_adv: 0x20027014 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.874,023] <inf> bt_mesh_adv: 0x20027014 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.874,084] <inf> bt_mesh_adv: 0x20027014 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.874,206] <inf> bt_mesh_adv: 0x20027014 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.874,267] <inf> bt_mesh_adv: 0x20027014 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.874,389] <inf> bt_mesh_adv: 0x20027014 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.874,481] <inf> bt_mesh_adv: 0x20027014 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.890,625] <inf> bt_mesh_adv: 0x20027050 unref: 2 -> 1 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c buf_send_end 211 )
[00:02:14.890,747] <inf> bt_mesh_adv: 0x20027014 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.890,808] <inf> bt_mesh_adv: 0x20027014 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.891,601] <dbg> bt_mesh_access: bt_mesh_access_send: net_idx 0x0000 app_idx 0x0000 dst 0x0001
[00:02:14.891,632] <dbg> bt_mesh_access: bt_mesh_access_send: len 7: **************
[00:02:14.892,364] <inf> bt_mesh_adv: 0x20026fd8 create
[00:02:14.893,280] <inf> bt_mesh_adv: 0x20026fd8 ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c bt_mesh_proxy_relay_send 251 )
[00:02:14.893,341] <inf> bt_mesh_adv: 0x20026fd8 ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c bt_mesh_adv_send 302 )
[00:02:14.893,371] <inf> bt_mesh_adv: 0x20026fd8 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/net.c bt_mesh_net_send 596 )
[00:02:14.909,515] <dbg> bt_mesh_access: bt_mesh_access_send: net_idx 0x0000 app_idx 0x0000 dst 0x0001
[00:02:14.909,545] <dbg> bt_mesh_access: bt_mesh_access_send: len 7: **************
[00:02:14.910,278] <inf> bt_mesh_adv: 0x20026f9c create
[00:02:14.911,224] <inf> bt_mesh_adv: 0x20026f9c ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c bt_mesh_proxy_relay_send 251 )
[00:02:14.911,254] <inf> bt_mesh_adv: 0x20026f9c ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c bt_mesh_adv_send 302 )
[00:02:14.911,315] <inf> bt_mesh_adv: 0x20026f9c unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/net.c bt_mesh_net_send 596 )
[00:02:14.920,471] <inf> bt_mesh_adv: 0x20027014 unref: 3 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c buf_send_end 211 )
[00:02:14.920,593] <inf> bt_mesh_adv: 0x20027014 unref: 2 -> 1 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.920,654] <inf> bt_mesh_adv: 0x20027014 ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.950,469] <inf> bt_mesh_adv: 0x20027014 unref: 2 -> 1 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c buf_send_end 211 )
[00:02:14.950,592] <inf> bt_mesh_adv: 0x20027014 unref: 1 -> 0 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_msg_send_pending 274 )
[00:02:14.950,592] <inf> bt_mesh_adv: 0x20027014 free
[00:02:14.950,653] <inf> bt_mesh_adv: 0x2002708c unref: 1 -> 0 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv_ext.c send_pending_adv 337 )
[00:02:14.950,683] <inf> bt_mesh_adv: 0x2002708c free
[00:02:14.954,345] <inf> bt_mesh_adv: 0x20027050 ref  : 1 -> 2 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv_ext.c adv_send 271 )
[00:02:14.954,376] <inf> bt_mesh_adv: 0x20027050 unref: 2 -> 1 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv_ext.c send_pending_adv 356 )
[00:02:14.954,437] <inf> bt_mesh_adv: 0x20026f9c ref  : 2 -> 3 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c proxy_relay_send 228 )
[00:02:14.980,255] <inf> bt_mesh_adv: 0x20027014 unref: 0 -> 4294967295 ( WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/proxy_msg.c buf_send_end 211 )
ASSERTION FAIL @ WEST_TOPDIR/zephyr/subsys/bluetooth/mesh/adv.c:137
[00:02:14.980,316] <err> os: r0/a1:  0x00000004  r1/a2:  0x00000089  r2/a3:  0x00000041
[00:02:14.980,316] <err> os: r3/a4:  0x00000004 r12/ip:  0x00000010 r14/lr:  0x000212af
[00:02:14.980,346] <err> os:  xpsr:  0x21000000
[00:02:14.980,346] <err> os: s[ 0]:  0x000000d3  s[ 1]:  0x00000000  s[ 2]:  0x20016170  s[ 3]:  0x0002ea51
[00:02:14.980,346] <err> os: s[ 4]:  0x00000001  s[ 5]:  0x0001c32f  s[ 6]:  0x20026b8c  s[ 7]:  0x00000000
[00:02:14.980,377] <err> os: s[ 8]:  0x200053c4  s[ 9]:  0x00000034  s[10]:  0x2001ef88  s[11]:  0x200181a8
[00:02:14.980,377] <err> os: s[12]:  0x200035c8  s[13]:  0x00000001  s[14]:  0x20016230  s[15]:  0x00014751
[00:02:14.980,407] <err> os: fpscr:  0x20025258
[00:02:14.980,407] <err> os: Faulting instruction address (r15/pc): 0x0004067a
[00:02:14.980,438] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:02:14.980,468] <err> os: Current thread: 0x200173b8 (unknown)
[00:02:15.808,959] <err> fatal_error: Resetting system


About the address of the "struct bt_mesh_adv" for the Message.
1st message(0x2002708c) : The Address has been freed.
2nd message(0x20027050) : The Address has not been freed yet.
3rd message(0x20027014) : After the address has been freed, "bt_mesh_adv_unref()" was called.

Do you know what the cause is?
Is it expected that changing the configuration parameters will lead to improvement?
It might be due to communication load, but I would like to avoid it causing a reset.

Thanks for reading.
Season's Greetings!

a.da

Parents
  • Hi,

    Thank you for the report, it is highly appreciated.

    You have done some thorough investigations and provided us with what looks like the relevant information. Hopefully this is enough for us to figure out what leads to the assert. I will ask for more information here in this thread, if we need more.

    From what I can tell, this type of issue is most likely a bug in the stack itself. Configuration changes may or may not change the prevalence of the issue, but we will likely not know until we find the root cause. I am afraid I do not have any time estimate for when we will know more, but rest assured we are looking into the issue and I will provide an update here when more is known at our side.

    Regards,
    Terje

Reply
  • Hi,

    Thank you for the report, it is highly appreciated.

    You have done some thorough investigations and provided us with what looks like the relevant information. Hopefully this is enough for us to figure out what leads to the assert. I will ask for more information here in this thread, if we need more.

    From what I can tell, this type of issue is most likely a bug in the stack itself. Configuration changes may or may not change the prevalence of the issue, but we will likely not know until we find the root cause. I am afraid I do not have any time estimate for when we will know more, but rest assured we are looking into the issue and I will provide an update here when more is known at our side.

    Regards,
    Terje

Children
  • Hi Terje,

    Thank you for your continued support. I will respond as quickly as possible as well.

    Kind regards,

    a.da

  • Hi,

    What are the steps to reproduce the issue? If this only happens on a  custom project, we would highly appreciate if you strip it down to a minimal example still triggering the issue, and send it to us. Please start a private case here on DevZone and send it there, if you do not want to send it in public.

    We also need the steps to reproduce, regardless of whether it is for SDK samples or a custom application. E.g. which sample, are there prj.conf changes needed, how to interact with the device in order to trigger the issue.

    Alternatively, in the mean time, there is an active pull request for the Bluetooth Mesh stack, which may or may not solve the issue, as it touches on related parts of the stack to the one failing in the issue that you are seeing. Since we have not yet reproduced the issue, we do not know whether this pull request fixes it or not, but I guess you could try applying those changes in your end and see if it resolves the issue. Please note that since this is from the latest branch of Zephyr / nRF Connect SDK, some (maybe undocumented) migration may be needed.

    Please note that we would still like the steps to reproduce, so that we can check this also in our end, as well as investigate further if other changes are needed.

    Regards,
    Terje

  • Hi Terje,

    Project:
    Our project is a custom application based on SDK samples(Light fixture).
    The "prj.conf" has been modified based on the sample.
    The smartphone app is not "nRF Mesh," it is custom-built(iOS).

    Steps to reproduce:
    There is one device in the mesh network.
    The smartphone app sends four Mesh Messages simultaneously.
    0x82 0x4b Light Lightness Get,
    0x82 0x61 Light CTL Temperature Get,
    and two Vendor Model Message that requests a status response like "Light Lightness Get".
    But it doesn't happen every time.

    First, I will strip it down to a minimal example that is closer to the sample.
    I will also consider applying the pull request.
    Please give me a moment.

    Kind regards,

    a.da

Related