
Time for configuring node increases after deleting from mesh network

Hi guys,

I have the same issue as coca1989 had 2 years ago. Did anybody find a solution?

I am using the health model and a simple message model. The client and provisioner run on one device (demo board). I can provision and configure up to 5 server nodes (dongles). I get health events from each connected server (every 10 seconds), and I can send and receive small messages on all server nodes.

Now I would like to remove nodes from the mesh network and reconnect them (re-provisioning and reconfiguration). These are the steps I am doing:

  1. config_client_server_bind() and config_client_server_set() to the server node I would like to remove from the network
  2. config_client_node_reset()
  3. the server gets the node reset event (CONFIG_SERVER_EVT_NODE_RESET) from the client and performs node_reset() with mesh_stack_config_clear() and mesh_stack_device_reset()
  4. the server responds to the client with CONFIG_CLIENT_EVENT_TYPE_CANCELLED, and I do dsm_devkey_delete()
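For reference, the client-side part of this sequence can be sketched roughly like this. The SDK calls are replaced by one-line stubs so the sketch compiles off-target; the real functions (and their exact signatures) come from config_client.h and device_state_manager.h in the nRF5 SDK for Mesh, and in the real event-driven flow step 4 happens in the config client event callback, not inline:

```c
#include <stdint.h>

/* Off-target stand-ins for the nRF Mesh SDK calls named in the steps
 * above; these are NOT the real signatures, just placeholders. */
#define NRF_SUCCESS 0u
typedef uint16_t dsm_handle_t;

static uint32_t config_client_server_bind(dsm_handle_t devkey) { (void)devkey; return NRF_SUCCESS; }
static uint32_t config_client_server_set(dsm_handle_t devkey, dsm_handle_t addr) { (void)devkey; (void)addr; return NRF_SUCCESS; }
static uint32_t config_client_node_reset(void) { return NRF_SUCCESS; }
static uint32_t dsm_devkey_delete(dsm_handle_t devkey) { (void)devkey; return NRF_SUCCESS; }

/* Steps 1, 2 and 4 as seen from the client; step 3 (node_reset() with
 * mesh_stack_config_clear() + mesh_stack_device_reset()) runs on the
 * server when it receives CONFIG_SERVER_EVT_NODE_RESET. */
uint32_t remove_server_node(dsm_handle_t devkey_handle, dsm_handle_t addr_handle)
{
    uint32_t status;

    /* 1. Point the config client at the node that should leave the network. */
    status = config_client_server_bind(devkey_handle);
    if (status != NRF_SUCCESS) { return status; }
    status = config_client_server_set(devkey_handle, addr_handle);
    if (status != NRF_SUCCESS) { return status; }

    /* 2. Ask the node to reset itself. */
    status = config_client_node_reset();
    if (status != NRF_SUCCESS) { return status; }

    /* 4. After CONFIG_CLIENT_EVENT_TYPE_CANCELLED arrives, drop the
     *    node's devkey from the local DSM. */
    return dsm_devkey_delete(devkey_handle);
}
```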

After removing the server node, I can re-provision and reconfigure the node successfully (getting health events and sending/receiving messages). But the configuration takes longer than the first time, and repeating this process (removing and reconnecting the node) increases the configuration time each time.

Here is a time table:

First configuration: 2-3 seconds
Second configuration (after removing node from mesh): 10-11 seconds
Third configuration (after removing node from mesh): 20-30 seconds
Fourth configuration (after removing node from mesh): 45-50 seconds
Fifth configuration (after removing node from mesh): >80 seconds

This is reproducible. Rebooting the client/provisioner device after removing a server node brings the configuration time back to 2-3 seconds, but then I no longer get health events or messages.

During reconfiguration (after removing the server from the network) I get SAR mesh events on the server node. At the first configuration (fresh device) I don't get these SAR events.

I guess I have to delete more on the client side? Maybe the simple message or health model is still active on the old address handles?

  • Hi, I tried to see what is happening on the server node after provisioning, during configuration. This is what it looks like with a fresh device:

    <t: 16480>, my_provisionee.c, 289, Provsionee stopped.
    <t: 172720>, access.c, 246, RX: [aop: 0x8008]
    <t: 172743>, my_common.c, 163, RSSI: -55
    <t: 176098>, access.c, 246, RX: [aop: 0x8008]
    <t: 176121>, my_common.c, 163, RSSI: -54
    <t: 180618>, access.c, 246, RX: [aop: 0x0000] --> configuration continues
    <t: 180626>, config_server.c, 623, dsm_appkey_add(appkey_handle:0 appkey_index:0)
    <t: 180636>, my_server.c, 61, Config_server_evt_cb (TYPE: 0)
    <t: 180639>, my_common.c, 163, RSSI: -55
    <t: 182332>, access.c, 246, RX: [aop: 0x803D]
    <t: 182335>, config_server.c, 2414, Access Info:
    element_index=0 model_id = 2-FFFF model_handle=1
    <t: 182347>, my_server.c, 61, Config_server_evt_cb (TYPE: 22)
    <t: 182351>, my_common.c, 163, RSSI: -54
    <t: 183109>, access.c, 246, RX: [aop: 0x803D]
    <t: 183112>, config_server.c, 2414, Access Info:
    element_index=0 model_id = 0-91C model_handle=2
    <t: 183124>, my_server.c, 61, Config_server_evt_cb (TYPE: 22)
    <t: 183127>, my_common.c, 163, RSSI: -54
    <t: 184562>, access.c, 246, RX: [aop: 0x0003]
    <t: 184579>, my_server.c, 61, Config_server_evt_cb (TYPE: 2)
    <t: 184582>, my_common.c, 163, RSSI: -55
    <t: 188517>, access.c, 246, RX: [aop: 0x0003]
    <t: 188534>, my_server.c, 61, Config_server_evt_cb (TYPE: 2)
    <t: 188537>, my_common.c, 163, RSSI: -55
    <t: 191548>, access.c, 246, RX: [aop: 0x801B]

    You can see here that aop 0x8008 (get_composition_data) is sent only twice and the configuration process starts almost immediately.

    This is what it looks like if I unprovision/delete the node from the mesh (as I described in my ticket):

    <t: 1029333>, my_provisionee.c, 289, Provsionee stopped.
    <t: 1031812>, access.c, 246, RX: [aop: 0x8008]
    <t: 1031835>, my_common.c, 163, RSSI: -54
    <t: 1044671>, access.c, 246, RX: [aop: 0x8008]
    <t: 1044676>, my_common.c, 163, RSSI: -54
    <t: 1057725>, access.c, 246, RX: [aop: 0x8008]
    <t: 1057730>, my_common.c, 163, RSSI: -54
    <t: 1084273>, access.c, 246, RX: [aop: 0x8008]
    <t: 1084277>, my_common.c, 163, RSSI: -54
    <t: 1136351>, access.c, 246, RX: [aop: 0x8008]
    <t: 1136355>, my_common.c, 163, RSSI: -55
    <t: 1146584>, my_common.c, 201, SAR FAILED: token 1, reason 1
    <t: 1239180>, nrf_mesh_dfu.c, 904, ERROR: No CMD handler!
    <t: 1241178>, access.c, 246, RX: [aop: 0x8008]
    <t: 1241201>, my_common.c, 163, RSSI: -54
    <t: 1355949>, my_common.c, 201, SAR FAILED: token 6, reason 1
    <t: 1451464>, access.c, 246, RX: [aop: 0x8008]
    <t: 1451487>, my_common.c, 163, RSSI: -54
    <t: 1566236>, my_common.c, 201, SAR FAILED: token 7, reason 1
    <t: 1568045>, nrf_mesh_dfu.c, 904, ERROR: No CMD handler!
    <t: 1860592>, my_common.c, 234, default:4
    <t: 1870591>, access.c, 246, RX: [aop: 0x8008]
    <t: 1870614>, my_common.c, 163, RSSI: -54
    <t: 1985362>, my_common.c, 201, SAR FAILED: token 8, reason 1

    ... -> configuration continues; for timings, see the time table above.

    You can see here that the server node receives aop 0x8008 too, but much more than twice. The number of aop 0x8008 (get_composition_data) messages increases each time the node is deleted from the mesh network again.

    What could cause this issue/behavior?

  • I'm not sure why this happens; I will need more time to figure out what is causing this behavior. Do you see the same if you use the nRF Mesh app (Android/iOS) to provision, configure, and reset the nodes?

  • About the increasing configuration time:
    I think you have misunderstood the point. The point is that you should not use the same address (e.g. address 500) when re-provisioning a node. You must use a different address each time you re-provision a node.

    About assert:

    JeffZ said:
    Looks like the publish for my address/address handle still exists even though I deleted it before rebooting after unprovisioning the server node. 

    Yes, this happens because you have removed the publish address from the DSM, but that does not reset the model publication settings. Therefore, the model publication settings still hold an address handle that no longer exists.

    So after doing:

      /* Invalidate the local copy, look up the model's current publish
       * address handle, then remove that handle from the DSM: */
      publish_addr_handle = DSM_HANDLE_INVALID;
      LIBBTD_ERROR_CHECK(access_model_publish_address_get(model_handle, &publish_addr_handle));
      LIBBTD_ERROR_CHECK(dsm_address_publish_remove(publish_addr_handle));

    you should use `access_model_publish_address_set()` to set the new publish address handle. Once model publication has been configured, it cannot be disabled (this is a shortcoming of the current `access_model_publish_address_set()` API), so once publication is configured, the publish address handle must remain valid and must exist in the DSM.

    Unless a new publish address handle is available and configured in the model, the old one should not be deleted.
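    So, concretely, the swap order is: add and set the new publish address handle first, and remove the old handle only afterwards, so the model never points at a handle that no longer exists. A standalone sketch of that ordering (the DSM/access functions here are simplified stand-ins with fake bookkeeping, not the real SDK implementations):

```c
#include <stdint.h>

#define NRF_SUCCESS 0u
#define NRF_ERROR_INVALID_STATE 8u  /* placeholder error code */
#define DSM_HANDLE_INVALID 0xFFFFu

typedef uint16_t dsm_handle_t;
typedef uint16_t access_model_handle_t;

/* Fake DSM state: 4 address slots and one model's publish handle. */
static dsm_handle_t m_publish_handle = DSM_HANDLE_INVALID;
static uint8_t m_addr_alloc[4];

static uint32_t dsm_address_publish_add(uint16_t raw_addr, dsm_handle_t *p_handle)
{
    *p_handle = (dsm_handle_t)(raw_addr % 4u);  /* toy slot allocation */
    m_addr_alloc[*p_handle] = 1;
    return NRF_SUCCESS;
}
static uint32_t dsm_address_publish_remove(dsm_handle_t handle)
{
    if (handle >= 4u || !m_addr_alloc[handle]) { return NRF_ERROR_INVALID_STATE; }
    m_addr_alloc[handle] = 0;
    return NRF_SUCCESS;
}
static uint32_t access_model_publish_address_set(access_model_handle_t model, dsm_handle_t handle)
{
    (void)model;
    if (handle >= 4u || !m_addr_alloc[handle]) { return NRF_ERROR_INVALID_STATE; }
    m_publish_handle = handle;
    return NRF_SUCCESS;
}

/* Swap the publish address: set the NEW handle on the model first,
 * remove the OLD one from the DSM afterwards. */
uint32_t publish_address_swap(access_model_handle_t model, uint16_t new_addr)
{
    dsm_handle_t old_handle = m_publish_handle;
    dsm_handle_t new_handle = DSM_HANDLE_INVALID;

    uint32_t status = dsm_address_publish_add(new_addr, &new_handle);
    if (status != NRF_SUCCESS) { return status; }

    status = access_model_publish_address_set(model, new_handle);
    if (status != NRF_SUCCESS) { return status; }

    if (old_handle != DSM_HANDLE_INVALID)
    {
        status = dsm_address_publish_remove(old_handle);
    }
    return status;
}
```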

  • Mittrinh, thanks for your response.

    I understood you well, but I am looking for a way to re-use addresses. Just using a new address is not acceptable. I don't want to restart the device (provisioner/client) every time after reaching REPLAY_CACHE_ENTRIES.

    Would this be possible?:

    1. First provisioning of fresh server node with address 500.
    2. Setting handles (let's call them 500.handles) and configuring models (health, small message, etc.) for node 500
    3. After a while deleting node 500.
    4. Re-provisioning node with 501
    5. Updating 500.handles with new address 501 (500.handles -> 501.handles)
    6. Reusing of address 500 for next node

    If this way is possible, what would be the best practice to do so?
    How can I free address 500 for the next node?

    Is always using a new address (uint16_t) and/or restarting the provisioner/client after reaching REPLAY_CACHE_ENTRIES your final answer?

    Thanks,
    Jeff

  • Today I tried the following:

    1. Provision fresh node 1 with address 500 and configure its models

    2. Delete node 1 (500) from the mesh (as I wrote above)

    3. Re-provision node 1 with address 501. Provisioning and configuration time < 3 seconds (very good)

    4. Delete node 1 (501) from the mesh

    5. Re-provision node 1 again with address 500. Provisioning and configuration time > 10 seconds

    6. Delete node 1 (500) from the mesh

    7. Re-provision node 1 again with address 501. Provisioning and configuration time > 20 seconds

    and so on...

    I can never reuse the address.

  • Hi,

    As you already noticed, the suggested method would still have the problem of increasing configuration time.

    From our developer:

    Does he want to create a qualified Bluetooth Mesh product?

    • If yes: according to Mesh Profile Specification section 3.10.7, he cannot reuse addresses until the IV Index update procedure completes in the network. The use case he is describing does not make sense from the perspective of the Mesh Profile Specification.
    • If not: he can reuse the addresses by calling the `replay_cache_clear()` API to clear the replay list on the provisioner without restarting. However, he may still have issues when the re-provisioned node tries to communicate with other nodes around it (because replay protection on those nodes will start filtering packets with old sequence numbers). Clearing the replay list creates a possibility of replay attacks.
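    For the non-qualified case, the workaround amounts to something like this. The stub here just simulates the SDK's replay list so the sketch runs off-target; on target, `replay_cache_clear()` is the SDK call named above and is invoked on the provisioner:

```c
/* Off-target stand-in for the SDK's replay cache: on target,
 * replay_cache_clear() is provided by the mesh stack and takes no
 * arguments. The entry counter below is purely illustrative. */
static unsigned m_replay_entries = 5;  /* pretend 5 SRC addresses are cached */
static void replay_cache_clear(void) { m_replay_entries = 0; }

/* Before re-provisioning a node with a previously used address, clear the
 * provisioner's replay list so old sequence numbers are accepted again.
 * NOTE: this trades away replay-attack protection and is not allowed for
 * a qualified product (Mesh Profile Specification section 3.10.7). */
void prepare_address_reuse(void)
{
    replay_cache_clear();
}
```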
  • Hi, thanks for response.

    Maybe I see light at the end of the tunnel...

    • replay_cache_clear() helps and works, but I have to wait about 20-30 seconds before re-provisioning the same node with the same address without the configuration time increasing
    • restarting the provisioner/client device helps too

    What would you suggest as a better way - restart or replay_cache_clear()?
    Will restart solve the problem of old sequence numbers with other nodes around?

    Why does the use case make no sense?
    Connecting, disconnecting, and reconnecting (with the same parameters) is basic in communications.


  • Hi,

    The problem of still having to wait 20-30 seconds after using replay_cache_clear():
    This is probably due to the network message cache filter. Try also calling `msg_cache_clear()` after `replay_cache_clear()`. This should help get rid of the extra wait time.
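    Combined, the workaround before re-provisioning with a reused address would then look roughly like this (again with off-target stubs standing in for the two SDK calls, which take no arguments on target):

```c
/* Off-target stubs for the two cache-clearing calls named above; the
 * entry counters are illustrative only. */
static unsigned m_replay_entries = 3;     /* fake replay list entries */
static unsigned m_msg_cache_entries = 7;  /* fake network msg cache entries */
static void replay_cache_clear(void) { m_replay_entries = 0; }
static void msg_cache_clear(void)    { m_msg_cache_entries = 0; }

/* Clear both caches before re-provisioning with a reused address: the
 * replay list (sequence-number filter) and the network message cache
 * (the likely source of the remaining 20-30 s wait). */
void caches_clear_for_address_reuse(void)
{
    replay_cache_clear();
    msg_cache_clear();
}
```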

    Why does the use case not make sense?
    As mentioned earlier, using the same address for a node during re-provisioning is not allowed by the Mesh Specification unless the network goes through an IV Update cycle. Resetting the replay list is a workaround to circumvent the issue. The Mesh Specification mandates that a node always filter out packets whose sequence numbers are less than or equal to the most recently seen sequence number from that SRC address.
