
Time for configuring node increases after deleting from mesh network

Hi guys,

I have the same issue that coca1989 had 2 years ago. Did anybody find a solution?

I am using the health model and a simple message model. Client and provisioner are running on one device (demo board). I can provision and configure up to 5 server nodes (dongles). I get health events from each connected server (every 10 seconds), and I can send and receive small messages on all server nodes.

Now I would like to remove nodes from the mesh network and reconnect them (reprovisioning and reconfiguration). These are the steps I am doing (a code sketch follows the list):

  1. config_client_server_bind() and config_client_server_set() to the server node I would like to remove from the network
  2. config_client_node_reset()
  3. the server gets the node reset event (CONFIG_SERVER_EVT_NODE_RESET) from the client and performs node_reset() with mesh_stack_config_clear() and mesh_stack_device_reset()
  4. on the client, the transaction ends with CONFIG_CLIENT_EVENT_TYPE_CANCELLED and I do dsm_devkey_delete()
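
For illustration, a minimal sketch of how I sequence these calls on the client (the handle names are mine, stored at provisioning time; ERROR_CHECK is the error macro used in the SDK examples):

    #include "config_client.h"
    #include "device_state_manager.h"

    /* Handles stored when the server node was provisioned. */
    static dsm_handle_t m_server_devkey_handle;
    static dsm_handle_t m_server_addr_handle;

    static void node_remove_start(void)
    {
        /* Steps 1 + 2: point the config client at the target server and
         * send the Config Node Reset message. */
        ERROR_CHECK(config_client_server_bind(m_server_devkey_handle));
        ERROR_CHECK(config_client_server_set(m_server_devkey_handle,
                                             m_server_addr_handle));
        ERROR_CHECK(config_client_node_reset());
    }

    /* Step 4: the reset node no longer acknowledges, so the transaction is
     * cancelled on the client; clean up its device key there.
     * (This callback is the one registered via config_client_init().) */
    static void config_client_event_cb(config_client_event_type_t event_type,
                                       const config_client_event_t * p_event,
                                       uint16_t length)
    {
        if (event_type == CONFIG_CLIENT_EVENT_TYPE_CANCELLED)
        {
            ERROR_CHECK(dsm_devkey_delete(m_server_devkey_handle));
        }
    }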

After removing the server node, I can reprovision and reconfigure the node successfully (I get health events and can send/receive messages). But the configuration takes longer than the first time, and repeating this process (removing the node and reconnecting) increases the configuration time each time.

Here is a time table:

First configuration: 2-3 seconds
Second configuration (after removing node from mesh): 10-11 seconds
Third configuration (after removing node from mesh): 20-30 seconds
Fourth configuration (after removing node from mesh): 45-50 seconds
Fifth configuration (after removing node from mesh): >80 seconds

This is reproducible. Rebooting the client/provisioner device after removing a server node reduces the configuration time back to 2-3 seconds, but then I get no health events and no messages.

During reconfiguration (after removing the server from the network) I am getting SAR mesh events on the server node. At the first configuration (fresh device) I don't get these SAR events.

I guess I have to delete more on the client side? Maybe the simple message or health model is still active on the old address handles?

  • Hi, I tried to look at what is happening on the server node after provisioning, during configuration. This is how it looks with a fresh device:

    <t: 16480>, my_provisionee.c, 289, Provsionee stopped.
    <t: 172720>, access.c, 246, RX: [aop: 0x8008]
    <t: 172743>, my_common.c, 163, RSSI: -55
    <t: 176098>, access.c, 246, RX: [aop: 0x8008]
    <t: 176121>, my_common.c, 163, RSSI: -54
    <t: 180618>, access.c, 246, RX: [aop: 0x0000] --> configuration continues
    <t: 180626>, config_server.c, 623, dsm_appkey_add(appkey_handle:0 appkey_index:0)
    <t: 180636>, my_server.c, 61, Config_server_evt_cb (TYPE: 0)
    <t: 180639>, my_common.c, 163, RSSI: -55
    <t: 182332>, access.c, 246, RX: [aop: 0x803D]
    <t: 182335>, config_server.c, 2414, Access Info:
    element_index=0 model_id = 2-FFFF model_handle=1
    <t: 182347>, my_server.c, 61, Config_server_evt_cb (TYPE: 22)
    <t: 182351>, my_common.c, 163, RSSI: -54
    <t: 183109>, access.c, 246, RX: [aop: 0x803D]
    <t: 183112>, config_server.c, 2414, Access Info:
    element_index=0 model_id = 0-91C model_handle=2
    <t: 183124>, my_server.c, 61, Config_server_evt_cb (TYPE: 22)
    <t: 183127>, my_common.c, 163, RSSI: -54
    <t: 184562>, access.c, 246, RX: [aop: 0x0003]
    <t: 184579>, my_server.c, 61, Config_server_evt_cb (TYPE: 2)
    <t: 184582>, my_common.c, 163, RSSI: -55
    <t: 188517>, access.c, 246, RX: [aop: 0x0003]
    <t: 188534>, my_server.c, 61, Config_server_evt_cb (TYPE: 2)
    <t: 188537>, my_common.c, 163, RSSI: -55
    <t: 191548>, access.c, 246, RX: [aop: 0x801B]

    You can see here that aop 0x8008 (get_composition_data) is sent only twice, and the configuration process starts almost immediately.

    This is how it looks if I unprovision/delete the node from the mesh (as I described in my ticket):

    <t: 1029333>, my_provisionee.c, 289, Provsionee stopped.
    <t: 1031812>, access.c, 246, RX: [aop: 0x8008]
    <t: 1031835>, my_common.c, 163, RSSI: -54
    <t: 1044671>, access.c, 246, RX: [aop: 0x8008]
    <t: 1044676>, my_common.c, 163, RSSI: -54
    <t: 1057725>, access.c, 246, RX: [aop: 0x8008]
    <t: 1057730>, my_common.c, 163, RSSI: -54
    <t: 1084273>, access.c, 246, RX: [aop: 0x8008]
    <t: 1084277>, my_common.c, 163, RSSI: -54
    <t: 1136351>, access.c, 246, RX: [aop: 0x8008]
    <t: 1136355>, my_common.c, 163, RSSI: -55
    <t: 1146584>, my_common.c, 201, SAR FAILED: token 1, reason 1
    <t: 1239180>, nrf_mesh_dfu.c, 904, ERROR: No CMD handler!
    <t: 1241178>, access.c, 246, RX: [aop: 0x8008]
    <t: 1241201>, my_common.c, 163, RSSI: -54
    <t: 1355949>, my_common.c, 201, SAR FAILED: token 6, reason 1
    <t: 1451464>, access.c, 246, RX: [aop: 0x8008]
    <t: 1451487>, my_common.c, 163, RSSI: -54
    <t: 1566236>, my_common.c, 201, SAR FAILED: token 7, reason 1
    <t: 1568045>, nrf_mesh_dfu.c, 904, ERROR: No CMD handler!
    <t: 1860592>, my_common.c, 234, default:4
    <t: 1870591>, access.c, 246, RX: [aop: 0x8008]
    <t: 1870614>, my_common.c, 163, RSSI: -54
    <t: 1985362>, my_common.c, 201, SAR FAILED: token 8, reason 1

    ... -> configuration continues; see the time table above.

    You can see here that the server node receives aop 0x8008 too, but far more than twice. The number of aop 0x8008 (get_composition_data) messages increases each time the node is deleted from the mesh network again.

    What can cause this issue/behavior?

  • I'm not sure why this happens; I will need more time to figure out what is causing this behavior. Do you see the same if you use the nRF Mesh app (Android/iOS) to provision/configure and reset the nodes?

  • Could you try reproducing this on the latest Mesh SDK v4.2 as well?

  • Hello Mttrinh,

    Now I am trying to migrate from Mesh SDK 3.1.0 to 4.2. As I wrote above, I am using SoftDevice S140, but v4.2 has only S130 and S132 in its external folder.

    My app gets stuck here:
    SVCALL(SD_SOFTDEVICE_ENABLE, uint32_t, sd_softdevice_enable(nrf_clock_lf_cfg_t const * p_clock_lf_cfg, nrf_fault_handler_t fault_handler));

    What should I do?

  • I guess the problem is that I am using S140 v6.1.0, but Mesh SDK v4.2 works only with S140 v7.0.1.

    This is a problem: I am using nRF5 SDK v15.2.0, which supports only S140 v6.1.0. I tried to migrate to nRF5 SDK v17.0.0 (which supports S140 v7.0.1), but the problems start already with the simple nrf_serial.c (no such file found).

  • I upgraded my project to Mesh SDK v4.2, but this didn't help. The problem still exists: I see the same increase in configuration time and the same SAR FAILED events.

  • I have asked our developers about this; I will update you when I have something.

  • Hi Mttrinh,

    Above I wrote that rebooting the client/provisioner reduces the configuration time back to 2-3 seconds, but the health and simple message models won't work.


    Now I have managed to get the health and simple message models running correctly after a reboot. The problem was a wrong appkey.

    After rebooting, the appkey was not loaded from flash; it stayed zero. I thought dsm_appkey_get_all() would handle it, but I was wrong. Now I am using dsm_tx_secmat_get() to load the appkey from flash. With the correct appkey, all mesh models work fine after the reboot. I am still wondering why there is no dsm_appkey_get() function; that would make my life easier.
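
    In case it helps others, this is roughly what I do now (a sketch under my assumptions; m_subnet_handle/m_appkey_handle are my own names, restored at boot):

    #include <string.h>
    #include "device_state_manager.h"
    #include "nrf_mesh.h"

    static dsm_handle_t m_subnet_handle;  /* e.g. dsm_net_key_index_to_subnet_handle(0) */
    static dsm_handle_t m_appkey_handle;  /* restored via dsm_appkey_get_all() */
    static uint8_t      m_appkey[NRF_MESH_KEY_SIZE];

    static void appkey_restore_from_flash(void)
    {
        nrf_mesh_secmat_t secmat;

        /* Resolves the network + application security material for the
         * stored handles; the appkey itself lives in the app secmat. */
        ERROR_CHECK(dsm_tx_secmat_get(m_subnet_handle, m_appkey_handle, &secmat));
        memcpy(m_appkey, secmat.p_app->key, NRF_MESH_KEY_SIZE);
    }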

    The increasing configuration time for removed server nodes still exists if I do not reboot the client/provisioner. I guess I have to refresh something in the mesh stack on the client side.

  • Hi,

    Great that the health and simple message model issue worked out. A question from our developer:

    Are you using the same unicast address for the node every time you reprovision it?

    If that is the case, then what is happening here is that the provisioner's replay list filters out the node's incoming responses until its sequence number is higher than the last known sequence number. That is why you see increasing time to fully configure the node: after reprovisioning, the node's sequence numbers are reset to zero.

    The only reasonable way to work around this issue is to not use the same node address when reprovisioning the node. Otherwise, you will have to reset the provisioner or clear the replay list with an internal API (both options carry a risk of replay attacks, so they should be chosen wisely).
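
    Conceptually, the replay protection check behaves like this (a simplified sketch, not the SDK's actual implementation):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical replay-list entry: the highest sequence number
     * accepted so far from one unicast source address. */
    typedef struct
    {
        uint16_t src;
        uint32_t seqnum;
    } replay_entry_t;

    /* Accept a message only if its sequence number is strictly higher
     * than the cached one for its source address. A freshly
     * reprovisioned node restarts at seqnum 0, so everything it sends
     * is dropped until its sequence number climbs past the cached
     * value -- which is why each reconfiguration round takes longer. */
    bool replay_check(replay_entry_t * p_entry, uint32_t seqnum)
    {
        if (seqnum <= p_entry->seqnum)
        {
            return false; /* filtered as a potential replay */
        }
        p_entry->seqnum = seqnum; /* accept and advance the cache */
        return true;
    }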

  • Hi,

    thanks for the fast response.

    Yes, I do reuse the unicast addresses, because I thought I could reuse them after deleting the node from the network. I plan to use a maximum of 50 server nodes, starting at address 500.

    What is the maximum buffer size of this replay list?
    I guess there will be a limit; what should I do if this limit is reached?

    Do I understand the workaround correctly?

    1. For provisioning I use an increasing unicast address (1 to limit) at start_provisioning()/nrf_mesh_prov_provision() in nrf_mesh_prov_provisioning_data_t.

    2. On NRF_MESH_PROV_EVT_COMPLETE I don't use the unicast address; instead I set my node address (500-549) with dsm_address_publish_add().

    3. Proceed with node configuration using the node address (500-549).

    Is this correct?

    Best regards,

    Jeff

  • 1. The replay list size is controlled by the `REPLAY_CACHE_ENTRIES` define. Define this macro with a suitable size in your application's `nrf_mesh_config_app.h` file. The size is limited by the free RAM your application has (see the sketch after this list).

    2. Beware of a pitfall when frequently un-provisioning and re-provisioning nodes with changing unicast addresses: once the entry for a certain source address is added to the replay list, the entry is not removed until the device is reset OR the device undergoes an IV index update cycle. Once the replay list is full, the device will stop accepting messages from new addresses that are not present in the replay list entries.

    3. Once you unprovision the node, remember to remove its address from the DSM; otherwise you will end up filling the available address space in the DSM module as well.

    4. Address selection for re-provisioning: a node can have multiple elements, so when you are re-provisioning a node, you have to exclude all the addresses that were present on that node. Irrespective of whether your node has one or more elements, you have to ensure that the elements on the node do not receive duplicate addresses (i.e. an address that is already in use on some active node in the network).

    5. The procedure would be (sketched in code below):
    a. Unprovision the node.
    b. Remove the address added for publication from the DSM using the DSM API. Refer to the `config_server.c` module's model publication set handlers to see how this is done.
    c. Use a fresh address for the node such that the address of the primary (and secondary, if any) element on the node is not already used on any other node in the network.
    d. Add this address to the DSM and use it for publishing.
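
    A short sketch of points 1 and 5b/5d (the handle and address names are placeholders; see `config_server.c` for the full publication handling):

    /* nrf_mesh_config_app.h: override the replay list size (RAM permitting). */
    #define REPLAY_CACHE_ENTRIES 40

    /* Provisioner application: */
    #include "device_state_manager.h"

    static dsm_handle_t m_publish_addr_handle; /* stored when the address was added */

    static void node_address_recycle(uint16_t fresh_unicast_addr)
    {
        /* 5b: remove the old publication address from the DSM... */
        ERROR_CHECK(dsm_address_publish_remove(m_publish_addr_handle));

        /* 5c/5d: ...and add a fresh address, not in use anywhere else in
         * the network, before configuring publication for the node. */
        ERROR_CHECK(dsm_address_publish_add(fresh_unicast_addr, &m_publish_addr_handle));
    }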
