Stack overflow when provisioning mesh nodes

Good evening,

I have recently purchased an nRF52 development kit which I would like to use for my project. Currently, I would like to provision external devices using this board; however, I can't get the provisioning process to work with the default, unmodified example from the nRF Connect SDK. I am using the latest SDK and toolchain. A copy of my log is attached below. I did some debugging in VS Code, and the program fails on this line:

/* Add Application Key */
err = bt_mesh_cfg_cli_app_key_add(net_idx, node->addr, net_idx, app_idx, app_key, &status);
if (err || status)
{
	printk("Failed to add app-key (err %d status %d)\n", err, status);
	return;
}


Log:

*** Booting nRF Connect SDK v2.9.0-7787b2649840 ***
*** Using Zephyr OS v3.7.99-1f8f3dc29142 ***
Initializing...
[00:00:00.007,324] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.007,324] <inf> fs_nvs: alloc wra: 0, ef0
[00:00:00.007,354] <inf> fs_nvs: data wra: 0, 1d0
[00:00:00.007,446] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                            2d 79 a1 c8 6a 40 b7 3c  f6 74 f9 0b 22 d3 c4 80 |-y..j@.< .t.."...
                                            74 72 82 ba                                      |tr..             
[00:00:00.009,948] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.009,979] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.010,009] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 45.41337 Build 3074452168
[00:00:00.010,223] <inf> bt_hci_core: No ID address. App must call settings_load()
Bluetooth initialized
[00:00:00.010,284] <dbg> bt_mesh_health_cli: health_cli_init: primary 1
Mesh initialized
Loading stored settings
[00:00:00.423,156] <dbg> bt_mesh_access: mod_set: Decoded mod_key 0x0002 as elem_idx 0 mod_idx 2
[00:00:00.423,217] <dbg> bt_mesh_access: mod_set_bind: val
                                         00 00                                            |..               
[00:00:00.423,217] <dbg> bt_mesh_access: mod_set_bind: Decoded 1 bound keys for model
[00:00:00.423,400] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           06 00 00                                         |...              
[00:00:00.423,431] <dbg> bt_mesh_net: seq_set: Sequence Number 0x00007f
[00:00:00.423,645] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 00 00                                   |.....            
[00:00:00.423,675] <dbg> bt_mesh_net: iv_set: IV Index 0x0000 (IV Update Flag 0) duration 0 hours
[00:00:00.423,950] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           01 00 51 81 7e 78 0a 2c  c4 2c 78 95 70 d9 88 a7 |..Q.~x., .,x.p...
                                           eb b5                                            |..               
[00:00:00.423,950] <dbg> bt_mesh_access: bt_mesh_comp_provision: addr 0x0001 elem_count 1
[00:00:00.423,980] <dbg> bt_mesh_access: bt_mesh_comp_provision: addr 0x0001 mod_count 3 vnd_mod_count 0
[00:00:00.424,011] <dbg> bt_mesh_net: net_set: Provisioned with primary address 0x0001
[00:00:00.424,041] <dbg> bt_mesh_net: net_set: Recovered DevKey 51817e780a2cc42c789570d988a7ebb5
[00:00:00.424,377] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 ef 25 32 55 dc  cf 31 d5 d8 67 6f 26 66 |....%2U. .1..go&f
                                           a9 bb 3e 00 00 00 00 00  00 00 00 00 00 00 00 00 |..>..... ........
                                           00 00 00                                         |...              
[00:00:00.426,177] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 e7 2e 7c 63 70 11 c5  7e 4e 43 9f 88 56 d4 27 |...|cp.. ~NC..V.'
                                           c0 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........ ........
                                           00                                               |.                
[00:00:00.434,356] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 e7 2e 7c 63 70 11 c5  7e 4e 43 9f 88 56 d4 27 |...|cp.. ~NC..V.'
                                           c0 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........ ........
                                           00                                               |.                
[00:00:00.434,814] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 ef 25 32 55 dc  cf 31 d5 d8 67 6f 26 66 |....%2U. .1..go&f
                                           a9 bb 3e 00 00 00 00 00  00 00 00 00 00 00 00 00 |..>..... ........
                                           00 00 00                                         |...              
[00:00:00.435,058] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 01 01 dd dd 00 00  00 00 00 00 00 00 00 00 |........ ........
                                           00 00 00 00 51 81 7e 78  0a 2c c4 2c 78 95 70 d9 |....Q.~x .,.,x.p.
                                           88 a7 eb b5                                      |....             
[00:00:00.435,577] <dbg> bt_mesh_settings: bt_mesh_settings_set: val
                                           00 00 00 00 00 01 00                             |.......          
[00:00:00.436,157] <inf> bt_hci_core: Identity: C8:72:BA:38:85:E4 (random)
[00:00:00.436,187] <inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x106b, manufacturer 0x0059
[00:00:00.436,218] <inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x106b
Using stored CDB
[00:00:00.439,514] <inf> bt_mesh_main: Primary Element: 0x0001
[00:00:00.439,514] <dbg> bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
Using stored settings
Waiting for unprovisioned beacon...
Device dc234e37efea60006164657777393261 detected, press button 1 to provision.
Provisioning dc234e37efea60006164657777393261
Waiting for node to be added...
[00:00:10.173,248] <dbg> bt_mesh_settings: bt_mesh_settings_store_schedule: Waiting 0 ms vs rem 0 ms
[00:00:10.173,339] <dbg> bt_mesh_settings: store_pending: 
Added node 0x0002
Configuring node 0x0002...
[00:00:10.174,560] <dbg> bt_mesh_access: bt_mesh_access_send: net_idx 0x0000 app_idx 0xfffd dst 0x0002
[00:00:10.174,621] <dbg> bt_mesh_access: bt_mesh_access_send: len 20: 00000000ef253255dccf31d5d8676f2666a9bb3e
[00:00:10.174,652] <dbg> bt_mesh_transport: bt_mesh_trans_send: net_idx 0x0000 app_idx 0xfffd dst 0x0002
[00:00:10.174,743] <dbg> bt_mesh_transport: bt_mesh_trans_send: len 20: 00000000ef253255dccf31d5d8676f2666a9bb3e
[00:00:10.174,835] <dbg> bt_mesh_transport: send_seg: src 0x0001 dst 0x0002 app_idx 0xfffd aszmic 0 sdu_len 24
[00:00:10.174,865] <dbg> bt_mesh_transport: send_seg: SeqZero 0x007f (segs: 2)
[00:00:10.174,926] <dbg> bt_mesh_transport: send_seg: seg 0: e5b09f9d4bba08329d19181a
[00:00:10.174,987] <dbg> bt_mesh_transport: send_seg: seg 1: f7e359505f2e60485f8c3520
[00:00:10.174,987] <dbg> bt_mesh_transport: seg_tx_send_unacked: SeqZero: 0x007f Attempts: 3
[00:00:10.175,018] <dbg> bt_mesh_transport: seg_tx_send_unacked: Sending 0/1
[00:00:10.175,048] <dbg> bt_mesh_net: bt_mesh_net_send: src 0x0001 dst 0x0002 len 16 headroom 9 tailroom 4
[00:00:10.175,079] <dbg> bt_mesh_net: bt_mesh_net_send: Payload len 16: 8001fc01e5b09f9d4bba08329d19181a
[00:00:10.175,109] <dbg> bt_mesh_net: bt_mesh_net_send: Seq 0x00007f
[00:00:10.175,140] <dbg> bt_mesh_net: net_header_encode: src 0x0001 dst 0x0002 ctl 0 seq 0x00007f
[00:00:10.175,140] <dbg> bt_mesh_settings: bt_mesh_settings_store_schedule: Waiting 0 ms vs rem 0 ms
[00:00:10.180,328] <err> os: ***** MPU FAULT *****
[00:00:10.180,328] <err> os:   Stacking error (context area might be not valid)
[00:00:10.180,328] <err> os:   Data Access Violation
[00:00:10.180,328] <err> os:   MMFAR Address: 0x20006578
[00:00:10.180,358] <err> os: r0/a1:  0x6181c202  r1/a2:  0x783a4e46  r2/a3:  0x0c8b61c2
[00:00:10.180,389] <err> os: r3/a4:  0x00000007 r12/ip:  0x0000000e r14/lr:  0x20006728
[00:00:10.180,389] <err> os:  xpsr:  0x21000000
[00:00:10.180,389] <err> os: Faulting instruction address (r15/pc): 0x00038688
[00:00:10.180,450] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:00:10.180,480] <err> os: Current thread: 0x20003280 (BT MESH WQ)
[00:00:10.412,597] <err> os: Halting system

Thanks for any help.


Kind regards,

Viktor

  • Hi Viktor,

    I have recently purchased an nRF52 development kit which I would like to use for my project. Currently, I would like to provision external devices using this board; however, I can't get the provisioning process to work with the default, unmodified example from the nRF Connect SDK.

    I assume you are then doing something on the cli that makes this happen. Could you show me what?

    What sample is this?

    Does increasing the stack size help? E.g.

    CONFIG_MAIN_STACK_SIZE

    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE

    Regards,

    Elfving

  • Hello Elfving,

    I am flashing the application using VS Code and the nRF Connect toolchain, as explained in the official video series.

    https://github.com/zephyrproject-rtos/zephyr/tree/main/samples/bluetooth/mesh_provisioner - this is the sample I am using.

    I tried setting the options as shown below, but the program still ends in the same kernel panic.

    CONFIG_MAIN_STACK_SIZE=4096
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

    Regards,

    Viktor
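
A note on the options tried above: the fault log reports "Current thread: 0x20003280 (BT MESH WQ)", so the overflowing stack seems to belong to the mesh workqueue thread rather than to the main thread or the system workqueue. That would explain why raising CONFIG_MAIN_STACK_SIZE and CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE changes nothing. A minimal prj.conf sketch, assuming this SDK version sizes the dedicated mesh workqueue through CONFIG_BT_MESH_WORKQ_STACK_SIZE (an assumption that nobody in this thread confirms; check the option name in menuconfig or the Kconfig reference first), with 2048 as an arbitrary example value:

```
# Assumed option name; confirm that it exists in your SDK version.
# 2048 is an illustrative value, not a measured requirement.
CONFIG_BT_MESH_WORKQ_STACK_SIZE=2048
```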

  • I think there is some miscommunication. I am having a problem with the provisioner while provisioning other devices. The sample I am flashing should serve as a provisioner for other devices, shouldn't it?

  • Ah you are right, sorry about that. I am not familiar with all the samples that come directly from Zephyr, like this one. I assumed it worked a bit like the mesh shell.

    When you say you are using the latest SDK and toolchain, you mean 2.9? Just want to make sure you're not using the main branch.

    What are you provisioning? Do you see the same thing when trying to provision a default sample?

    Regards,

    Elfving

  • I'm using the 2.9 tag SDK, by latest I meant the latest installable version through vscode.

    Sorry, I actually forgot to mention that I'm provisioning a non-nRF device, a light to be exact, that uses a Telink chip. According to the manual, the light implements the full Bluetooth Mesh spec. I assumed that I could use the DK from Nordic to provision and communicate with Bluetooth chips from other brands, but correct me if I'm wrong. Thanks.

    Regards,

    Viktor

  • susenka said:

    Sorry, I actually forgot to mention that I'm provisioning a non-nRF device, a light to be exact, that uses a Telink chip. According to the manual, the light implements the full Bluetooth Mesh spec. I assumed that I could use the DK from Nordic to provision and communicate with Bluetooth chips from other brands, but correct me if I'm wrong. Thanks.

    Yeah, that shouldn't be a problem, given that the Telink chip follows the spec, which I assume it does. Though I have seen some issues like this in the past. You could check if you are able to provision the Telink chip from the nRF Mesh app.

    Are you seeing the same issue with other samples running on the Telink chip? That is, if you are able to flash other software on it.

    Regards,

    Elfving

  • You could check if you are able to provision the Telink chip from the nRF Mesh app.

    I have looked at the problem you linked to, and it seems to have some similarities. I'm seemingly able to provision the light using the nRF Connect app, but there's a catch. According to the manual of the light I'm trying to provision, there is a 30-second so-called configuration phase. If the device is not configured within this time window, it will flash several times and then become unresponsive.

    However, I can get basic configuration data using the nRF Mesh application, which I cannot do using the development kit because the provisioning process never advances to the configuration data retrieval stage.

    Are you seeing the same issue with other samples running on the Telink chip? That is, if you are able to flash other software on it.

    I cannot flash anything on the Telink chip since it is already consumer-packaged in the light itself. It does seem to provide a UART interface that I could in theory use, but I would like to avoid the need to modify the firmware on the light itself.

    Regards,

    Viktor

  • susenka said:

    I cannot flash anything on the Telink chip since it is already consumer-packaged in the light itself. It does seem to provide a UART interface that I could in theory use, but I would like to avoid the need to modify the firmware on the light itself.

    Hehe that is understandable. I just assumed it was some sort of development kit. 

    susenka said:
    However, I can get basic configuration data using the nRF Mesh application, which I cannot do using the development kit because the provisioning process never advances to the configuration data retrieval stage.

    Ah, so that works. That is interesting. 

    Could you try increasing CONFIG_BT_RX_STACK_SIZE as well?

    I should mention that you typically do not use an embedded device as a provisioner in a typical setting, since that role is so important to the network, which I believe is why Nordic does not provide a sample for this in NCS. Though we do have an open-source smartphone application for it. I see that we did have a sample for this in the old nRF5 SDK which mentions this in the documentation: "Because of the limited amount of memory available in embedded processors, using a standalone provisioner limits the amount of nodes that can be provisioned. This limit corresponds to the maximum size of the Bluetooth mesh network." I guess for the same reason this Zephyr provisioner might be a bit limited. Of course an attempted provisioning shouldn't result in a stack overflow, but I just wanted that mentioned.

    So if your end goal is just to provision this chip, I think the smartphone app is the best option.

    Regards,

    Elfving

  • Could you try increasing CONFIG_BT_RX_STACK_SIZE as well?

    This does not seem to have made any difference; it is still the same issue.

    I should mention that you typically do not use an embedded device as a provisioner in a typical setting, since that role is so important to the network, which I believe is why Nordic does not provide a sample for this in NCS. Though we do have an open-source smartphone application for it. I see that we did have a sample for this in the old nRF5 SDK which mentions this in the documentation: "Because of the limited amount of memory available in embedded processors, using a standalone provisioner limits the amount of nodes that can be provisioned. This limit corresponds to the maximum size of the Bluetooth mesh network." I guess for the same reason this Zephyr provisioner might be a bit limited. Of course an attempted provisioning shouldn't result in a stack overflow, but I just wanted that mentioned.

    So if your end goal is just to provision this chip, I think the smartphone app is the best option.

    Okay, can you suggest the best approach for my use case? I need to be able to provision about a thousand lights in our building. As you said, this can and probably should be done using a mobile device. The tricky part is that I also need to be able to control the lights over the internet, so I thought that many of these embedded devices would act as a sort of "mesh to internet gateway". Is there a better way to approach this?

    Kind regards,

    Viktor

  • I just recalled that I had a previous case in which some provisioner sample, either our old one or this one from Zephyr, assumed that each node had just one element. If that is not the case, then of course addressing will run into an issue. Maybe something similar is happening here as well.

    susenka said:
    I need to be able to provision about a thousand lights in our building.

    Ah I see. The provisioner is an extremely important and powerful role for the network, so I would generally recommend a host device similar to a smartphone (or multiple) for this. I guess there are scenarios in which using an embedded device is warranted, like if you need to work on the network over LTE and need a gateway, like you mention. Though in that case I would make this from scratch and not base it too much on the provisioning samples (as they are not that great as a starting point for this), and add external flash for all the data that needs to be stored.

    Our nRF Mesh app is mainly a development tool, not an end-customer product. So I wouldn't use that for a proper network like this either. Though if your goal is to make a tool that does this job, then we do have libraries for Android and iOS that you can use.

    But if you want something that just works for this network, I am not sure if we can provide what you want. I bet there are tools on the market that already have this gateway functionality.

    Regards,

    Elfving 
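
On the single-element assumption mentioned in the post above: in Bluetooth Mesh each element of a node takes one consecutive unicast address, so a provisioner that reserves only one address per node can hand out overlapping ranges once a node has more elements. A minimal sketch of that arithmetic in plain C; next_free_addr and addr_ranges_overlap are hypothetical helpers for illustration, not part of the Zephyr API or of the sample:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: first unicast address that is free after a node
 * whose primary element sits at node_addr and which has num_elem elements.
 */
static uint16_t next_free_addr(uint16_t node_addr, uint8_t num_elem)
{
	assert(num_elem > 0); /* every node has at least a primary element */
	return (uint16_t)(node_addr + num_elem);
}

/* Hypothetical helper: true if the address ranges of two nodes collide. */
static bool addr_ranges_overlap(uint16_t a_addr, uint8_t a_elems,
				uint16_t b_addr, uint8_t b_elems)
{
	return a_addr < next_free_addr(b_addr, b_elems) &&
	       b_addr < next_free_addr(a_addr, a_elems);
}
```

For a three-element node at 0x0002, next_free_addr(0x0002, 3) gives 0x0005. A provisioner that assumed a single element would place the next node at 0x0003, and addr_ranges_overlap(0x0002, 3, 0x0003, 1) reports that collision.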

  • I know that nRF Mesh is for development only, so I would eventually write my own application to handle the provisioning. The thing is, I have tried many applications like nRF Mesh and all of them failed in the same way. Is it possible that the device manufacturer has some mechanism that prevents provisioning when not using their application or SDK? Just to note the current state: with the application I'm able to

    1) set app & dev keys
    2) get composition data
    3) bind app key to all models

    But all models (except configuration) are unresponsive, so I cannot turn the lights on or off, for example. After 30 seconds the device disconnects. Maybe there is some kind of proprietary check that prevents the device from working with a provisioner that follows the generic spec?

    If this is beyond the scope of this forum or you cannot help me with this, no worries! Thanks.

    Regards,
    Viktor
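
For steps 2 and 3 in the list above: the composition data is what tells a provisioner how many elements a node has and which models each one carries, so it is worth checking that every element was actually walked before binding keys. A minimal sketch in plain C that counts the elements in Composition Data Page 0. It assumes the standard layout (a 10-byte header of CID, PID, VID, CRPL and Features, then per element a 2-byte location, NumS, NumV, 2-byte SIG model IDs and 4-byte vendor model IDs) and assumes the buffer starts after the page number byte of the status message. comp_page0_elem_count is a hypothetical helper, not a Zephyr function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMP_PAGE0_HDR_LEN 10 /* CID, PID, VID, CRPL, Features: 2 bytes each */
#define COMP_ELEM_HDR_LEN  4  /* Loc (2 bytes), NumS (1 byte), NumV (1 byte) */

/* Hypothetical helper: returns the number of elements described by a
 * Composition Data Page 0 buffer, or -1 if the buffer is truncated.
 */
static int comp_page0_elem_count(const uint8_t *data, size_t len)
{
	size_t off = COMP_PAGE0_HDR_LEN;
	int count = 0;

	assert(data != NULL);

	if (len < COMP_PAGE0_HDR_LEN) {
		return -1;
	}

	while (off < len) {
		if (len - off < COMP_ELEM_HDR_LEN) {
			return -1;
		}

		uint8_t num_sig = data[off + 2];
		uint8_t num_vnd = data[off + 3];
		size_t models_len = (size_t)num_sig * 2 + (size_t)num_vnd * 4;

		off += COMP_ELEM_HDR_LEN;
		if (len - off < models_len) {
			return -1;
		}

		off += models_len; /* skip this element's model IDs */
		count++;
	}

	return count;
}
```

If the count is greater than one, the node occupies that many consecutive unicast addresses, and models on the secondary elements are addressed at the element address rather than at the node's primary address.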

  • susenka said:
    Is it possible that the device manufacturer has some mechanism that prevents provisioning when not using their application or SDK?

    I guess that is possible. I don't think we do this on our app, but when SoC manufacturers provide apps like this for free I suppose that is one way they can make sure that they end up making money in the end. What Nordic typically does is just to provide high quality, and assume that people will end up going for their products in the end. 

    I am not sure about what other products for taking care of a large mesh network are available in the market, but I would recommend contacting your local RSM about this. They might know something.

    susenka said:
    But all models (except configuration) are unresponsive

    If this happens with SoCs other than Nordic's, i.e. Telink chips, then it is of course easy to think that their BLE stack is at fault, though this generally shouldn't happen.

    Regards,

    Elfving
