Stack overflow when provisioning mesh nodes

Good evening,

I have recently purchased an nRF52 development kit which I would like to use for my project. Currently, I would like to provision external devices using this board; however, I can't get the provisioning process to work with the default, unmodified example from the nRF Connect SDK. I am using the latest SDK and toolchain. A copy of my log is attached below. I have done some debugging in VS Code, and the program fails on this line:

/* Add Application Key */
err = bt_mesh_cfg_cli_app_key_add(net_idx, node->addr, net_idx, app_idx, app_key, &status);
if (err || status)
{
	printk("Failed to add app-key (err %d status %d)\n", err, status);
	return;
}


Log:

*** Booting nRF Connect SDK v2.9.0-7787b2649840 ***
*** Using Zephyr OS v3.7.99-1f8f3dc29142 ***
Initializing...
[00:00:00.007,324] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.007,324] <inf> fs_nvs: alloc wra: 0, ef0
[00:00:00.007,354] <inf> fs_nvs: data wra: 0, 1d0
[00:00:00.007,446] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                            2d 79 a1 c8 6a 40 b7 3c  f6 74 f9 0b 22 d3 c4 80 |-y..j@.< .t.."...
                                            74 72 82 ba                                      |tr..             
[00:00:00.009,948] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.009,979] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.010,009] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 45.41337 Build 3074452168
[00:00:00.010,223] <inf> bt_hci_core: No ID address. App must call settings_load()
Bluetooth initialized
[00:00:00.010,284] <dbg> bt_mesh_health_cli: health_cli_init: primary 1
Mesh initialized
Loading stored settings
[00:00:00.423,156] <dbg> bt_mesh_access: mod_set: Decoded mod_key 0x0002 as elem_idx 0 mod_idx 2
[00:00:00.423,217] <dbg> bt_mesh_access: mod_set_bind: val
                                         00 00                                            |..               
[00:00:00.423,217] <dbg> bt_mesh_access: mod_set_bind: Decoded 1 bound keys for model
[00:00:00.423,400] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           06 00 00                                         |...              
[00:00:00.423,431] <dbg> bt_mesh_net: seq_set: Sequence Number 0x00007f
[00:00:00.423,645] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 00 00                                   |.....            
[00:00:00.423,675] <dbg> bt_mesh_net: iv_set: IV Index 0x0000 (IV Update Flag 0) duration 0 hours
[00:00:00.423,950] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           01 00 51 81 7e 78 0a 2c  c4 2c 78 95 70 d9 88 a7 |..Q.~x., .,x.p...
                                           eb b5                                            |..               
[00:00:00.423,950] <dbg> bt_mesh_access: bt_mesh_comp_provision: addr 0x0001 elem_count 1
[00:00:00.423,980] <dbg> bt_mesh_access: bt_mesh_comp_provision: addr 0x0001 mod_count 3 vnd_mod_count 0
[00:00:00.424,011] <dbg> bt_mesh_net: net_set: Provisioned with primary address 0x0001
[00:00:00.424,041] <dbg> bt_mesh_net: net_set: Recovered DevKey 51817e780a2cc42c789570d988a7ebb5
[00:00:00.424,377] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 ef 25 32 55 dc  cf 31 d5 d8 67 6f 26 66 |....%2U. .1..go&f
                                           a9 bb 3e 00 00 00 00 00  00 00 00 00 00 00 00 00 |..>..... ........
                                           00 00 00                                         |...              
[00:00:00.426,177] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 e7 2e 7c 63 70 11 c5  7e 4e 43 9f 88 56 d4 27 |...|cp.. ~NC..V.'
                                           c0 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........ ........
                                           00                                               |.                
[00:00:00.434,356] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 e7 2e 7c 63 70 11 c5  7e 4e 43 9f 88 56 d4 27 |...|cp.. ~NC..V.'
                                           c0 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........ ........
                                           00                                               |.                
[00:00:00.434,814] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 ef 25 32 55 dc  cf 31 d5 d8 67 6f 26 66 |....%2U. .1..go&f
                                           a9 bb 3e 00 00 00 00 00  00 00 00 00 00 00 00 00 |..>..... ........
                                           00 00 00                                         |...              
[00:00:00.435,058] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 01 01 dd dd 00 00  00 00 00 00 00 00 00 00 |........ ........
                                           00 00 00 00 51 81 7e 78  0a 2c c4 2c 78 95 70 d9 |....Q.~x .,.,x.p.
                                           88 a7 eb b5                                      |....             
[00:00:00.435,577] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 00 00 01 00                             |.......          
[00:00:00.436,157] <inf> bt_hci_core: Identity: C8:72:BA:38:85:E4 (random)
[00:00:00.436,187] <inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x106b, manufacturer 0x0059
[00:00:00.436,218] <inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x106b
Using stored CDB
[00:00:00.439,514] <inf> bt_mesh_main: Primary Element: 0x0001
[00:00:00.439,514] <dbg> bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
Using stored settings
Waiting for unprovisioned beacon...
Device dc234e37efea60006164657777393261 detected, press button 1 to provision.
Provisioning dc234e37efea60006164657777393261
Waiting for node to be added...
[00:00:10.173,248] <dbg> bt_mesh_settings: bt_mesh_settings_store_schedule: Waiting 0 ms vs rem 0 ms
[00:00:10.173,339] <dbg> bt_mesh_settings: store_pending: 
Added node 0x0002
Configuring node 0x0002...
[00:00:10.174,560] <dbg> bt_mesh_access: bt_mesh_access_send: net_idx 0x0000 app_idx 0xfffd dst 0x0002
[00:00:10.174,621] <dbg> bt_mesh_access: bt_mesh_access_send: len 20: 00000000ef253255dccf31d5d8676f2666a9bb3e
[00:00:10.174,652] <dbg> bt_mesh_transport: bt_mesh_trans_send: net_idx 0x0000 app_idx 0xfffd dst 0x0002
[00:00:10.174,743] <dbg> bt_mesh_transport: bt_mesh_trans_send: len 20: 00000000ef253255dccf31d5d8676f2666a9bb3e
[00:00:10.174,835] <dbg> bt_mesh_transport: send_seg: src 0x0001 dst 0x0002 app_idx 0xfffd aszmic 0 sdu_len 24
[00:00:10.174,865] <dbg> bt_mesh_transport: send_seg: SeqZero 0x007f (segs: 2)
[00:00:10.174,926] <dbg> bt_mesh_transport: send_seg: seg 0: e5b09f9d4bba08329d19181a
[00:00:10.174,987] <dbg> bt_mesh_transport: send_seg: seg 1: f7e359505f2e60485f8c3520
[00:00:10.174,987] <dbg> bt_mesh_transport: seg_tx_send_unacked: SeqZero: 0x007f Attempts: 3
[00:00:10.175,018] <dbg> bt_mesh_transport: seg_tx_send_unacked: Sending 0/1
[00:00:10.175,048] <dbg> bt_mesh_net: bt_mesh_net_send: src 0x0001 dst 0x0002 len 16 headroom 9 tailroom 4
[00:00:10.175,079] <dbg> bt_mesh_net: bt_mesh_net_send: Payload len 16: 8001fc01e5b09f9d4bba08329d19181a
[00:00:10.175,109] <dbg> bt_mesh_net: bt_mesh_net_send: Seq 0x00007f
[00:00:10.175,140] <dbg> bt_mesh_net: net_header_encode: src 0x0001 dst 0x0002 ctl 0 seq 0x00007f
[00:00:10.175,140] <dbg> bt_mesh_settings: bt_mesh_settings_store_schedule: Waiting 0 ms vs rem 0 ms
[00:00:10.180,328] <err> os: ***** MPU FAULT *****
[00:00:10.180,328] <err> os:   Stacking error (context area might be not valid)
[00:00:10.180,328] <err> os:   Data Access Violation
[00:00:10.180,328] <err> os:   MMFAR Address: 0x20006578
[00:00:10.180,358] <err> os: r0/a1:  0x6181c202  r1/a2:  0x783a4e46  r2/a3:  0x0c8b61c2
[00:00:10.180,389] <err> os: r3/a4:  0x00000007 r12/ip:  0x0000000e r14/lr:  0x20006728
[00:00:10.180,389] <err> os:  xpsr:  0x21000000
[00:00:10.180,389] <err> os: Faulting instruction address (r15/pc): 0x00038688
[00:00:10.180,450] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:00:10.180,480] <err> os: Current thread: 0x20003280 (BT MESH WQ)
[00:00:10.412,597] <err> os: Halting system

Thanks for any help.


Kind regards,

Viktor

  • Hi Viktor,

    I have recently purchased an nRF52 development kit which I would like to use for my project. Currently, I would like to provision external devices using this board; however, I can't get the provisioning process to work with the default, unmodified example from the nRF Connect SDK.

    I assume you are then doing something on the CLI that makes this happen. Could you show me what?

    What sample is this?

    Does increasing the stack size help? E.g.

    CONFIG_MAIN_STACK_SIZE

    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE

    Regards,

    Elfving

  • Hello Elfving,

    I am flashing the application using VS Code and the nRF Connect toolchain, as explained in the official video series.

    https://github.com/zephyrproject-rtos/zephyr/tree/main/samples/bluetooth/mesh_provisioner - this is the sample I am using.

    I tried setting the options as shown below, but the program still ends in the same kernel panic.

    CONFIG_MAIN_STACK_SIZE=4096
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
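
    A note on the fault report: the log above names "BT MESH WQ" as the faulting thread, not the main thread or the system workqueue, which may explain why these two options make no difference. Below is a minimal sketch of a prj.conf fragment that targets the mesh thread instead. It assumes (this is not confirmed anywhere in this thread) that the "BT MESH WQ" stack is sized by CONFIG_BT_MESH_ADV_STACK_SIZE in this Zephyr version, so the option name should be checked against the Kconfig reference for the SDK in use. The value 2048 is an arbitrary starting point. The thread analyzer options are included so the actual stack usage can be read from the console rather than guessed.

    ```conf
    # Assumption: the "BT MESH WQ" thread takes its stack size from this option.
    # 2048 is an illustrative value, not a measured requirement.
    CONFIG_BT_MESH_ADV_STACK_SIZE=2048

    # Periodically print per-thread stack usage to help pick a real value.
    CONFIG_THREAD_NAME=y
    CONFIG_THREAD_ANALYZER=y
    CONFIG_THREAD_ANALYZER_AUTO=y
    ```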

    Regards,

    Viktor

  • Elfving said:
    Could you try increasing CONFIG_BT_RX_STACK_SIZE as well?

    This does not seem to have made any difference - still the same issue.

    Elfving said:
    I should mention that an embedded device is not typically used as a provisioner, since that role is so important to the network, which I believe is why Nordic does not provide a sample for this in NCS. We do have an open-source smartphone application for it, though. I see that we did have a sample for this in the old nRF5 SDK, whose documentation mentions this: "Because of the limited amount of memory available in embedded processors, using a standalone provisioner limits the amount of nodes that can be provisioned. This limit corresponds to the maximum size of the Bluetooth mesh network." I guess this Zephyr provisioner might be a bit limited for the same reason. Of course, an attempted provisioning shouldn't result in a stack overflow, but I just wanted to mention that.

    So if your end goal is just to provision this chip, I think the smartphone app is the best option.

    Okay, can you suggest the best approach for my use case? I need to be able to provision about a thousand lights in our building. As you said, this can and probably should be done using a mobile device. The tricky part is that I also need to be able to control the lights over the internet, so I thought that many of these embedded devices would act as a sort of "mesh to internet gateway". Is there a better way to approach this?

    Kind regards,

    Viktor

  • I just recalled that I had a previous case in which some provisioner sample, either our old one or this one from Zephyr, assumed that each node had just one element. If that is not the case, then of course addressing will run into an issue. Maybe something similar is happening here as well.

    susenka said:
    I need to be able to provision about a thousand lights in our building.

    Ah, I see. The provisioner is an extremely important and powerful role for the network, so I would generally recommend a host device similar to a smartphone (or multiple) for this. I guess there are scenarios in which using an embedded device is warranted, like if you need to work on the network over LTE and need a gateway, as you mention. In that case, though, I would build this from scratch and not base it too much on the provisioning samples (as they are not that great as a starting point for this), and add external flash for all the data that needs to be stored.

    Our nRF Mesh app is mainly a development tool, not an end-customer product, so I wouldn't use it for a proper network like this either. If your goal is to make a tool that does this job, though, we do have libraries for Android and iOS that you can use.

    But if you want something that just works for this network, I am not sure we can provide what you want. I bet there are tools on the market that already have this gateway functionality.

    Regards,

    Elfving 

  • I know that nRF Mesh is for development only, so I would eventually write my own application to handle the provisioning. The thing is, I have tried many applications like nRF Mesh, and all of them failed in the same way. Is it possible that the device manufacturer has some mechanism that prevents provisioning when not using their application or SDK? Just to note the current state: with the application I'm able to

    1) set app & dev keys
    2) get composition data
    3) bind app key to all models

    But all models (except configuration) are unresponsive, so I cannot turn the lights on or off, for example. After 30 seconds the device disconnects. Maybe there is some kind of proprietary check that prevents the device from working with a provisioner that follows the generic spec?

    If this is beyond the scope of this forum or you cannot help me with this, no worries! Thanks.

    Regards,
    Viktor

  • susenka said:
    Is it possible that the device manufacturer has some mechanism that prevents provisioning when not using their application or sdk?

    I guess that is possible. I don't think we do this in our app, but when SoC manufacturers provide apps like this for free, I suppose that is one way they can make sure that they make money in the end. What Nordic typically does is simply provide high quality and assume that people will end up choosing its products.

    I am not sure about what other products for taking care of a large mesh network are available in the market, but I would recommend contacting your local RSM about this. They might know something.

    susenka said:
    But all models (except configuration) are unresponsive

    If this happens with SoCs other than Nordic's, e.g. Telink chips, then it is of course easy to think that their BLE stack is at fault, though this generally shouldn't happen.

    Regards,

    Elfving

  • I have done some digging around, and I assume that the smart device manufacturer requires some sort of message to be sent after provisioning to let the device know that it has successfully registered with the manufacturer's application. My question is: can I somehow capture the mesh traffic between the provisioner and the device using Wireshark or a similar tool? I have an nRF dongle at hand that I can use. Can I decrypt the traffic, given that I know the app keys and device keys used to encrypt it? If so, can you give me a little hint on how to accomplish this? Thanks.

    Regards,

    Viktor
