BLE Mesh: adding and removing nodes.

Hi,

I have a BLE Mesh network managed by an nRF52840 device (let's call it the central device). I can add and remove other nRF52840 nodes to and from the network (kind of) successfully. I say “kind of” because of the following issues:

1) If I keep adding and removing nodes, the RPL list on the central device eventually gets full. I found that the proper way to clear the list is by performing an “IV Update.” How can I trigger or force an IV Update?

2) If I turn off a node and then make the central device remove it from the network, when that node is turned back on, it doesn’t “know” it has been removed. When I try to re-provision it, the central device starts behaving erratically by re-provisioning nodes already in the network or failing to find the UUID of the node to be provisioned (it seems to search for a UUID that was never provided).

Also, the node that comes back online seems to still communicate with the central device, corrupting or interfering with its configuration database.

What is the correct procedure to remove nodes in this case, and how can a node be added back if it comes online again?

Best regards,
P

  • Hi Pablo,
    Could you describe how you remove the node from the network?
    As far as I know, there is no simple "remove node" feature in Bluetooth mesh.
    You need to send a Config Node Reset to the target node and then remove it from the configuration database. However, this doesn't guarantee that the node will actually reset itself and stop communicating with other mesh nodes.
    To make sure the node can no longer communicate with the mesh network, you need to perform a Key Refresh procedure (blacklisting the node). This means sending a new network key and new application keys to all other nodes in the network. The blacklisted node(s) will not receive the new keys.
    This is not a simple process, so you will have to find the best way of doing it.


    Could you describe your application in a bit more detail? Why do you want to add and remove nodes continuously? Could the node simply be reset with a button and added as a new node, without removing it first? You can simply remove the node's address from the RPL if it's full.

  • Hi Hung,

    To remove a node we follow the steps as described here.

    We send the Config Node Reset command to the target node and remove it from the configuration database.
    The part we do not perform is the Key Refresh procedure, since this would not be feasible for us. We plan to eventually manage thousands of units, making a network-wide key refresh process impractical.
    Is there any way to properly remove a node without performing a Key Refresh?

    We don’t intend to add and remove nodes continuously, but we need a reliable workaround for cases when a node appears to be offline and must be removed from the network as cleanly as possible.

    We cannot reset the node using a physical button, but we can perform a DFU OTA update that erases its configuration memory when the node is powered back on. Is this a correct workaround to reset a node?

    Finally, how can we remove a specific node from the RPL list? We couldn’t find a function that allows this for an individual entry.
    These are the RPL-related functions we found:

    typedef void (*bt_mesh_rpl_func_t)(struct bt_mesh_rpl *rpl,
                        void *user_data);

    void bt_mesh_rpl_reset(void);
    bool bt_mesh_rpl_check(struct bt_mesh_net_rx *rx, struct bt_mesh_rpl **match, bool bridge);
    void bt_mesh_rpl_clear(void);
    void bt_mesh_rpl_update(struct bt_mesh_rpl *rpl,
                struct bt_mesh_net_rx *rx);
    void bt_mesh_rpl_pending_store_all_nodes(void);

  • Hi Pablo, 


    There doesn't seem to be a function to erase a single RPL entry, so you are right that the RPL is expected to be cleared only by an IV Index update. I assume this is for security reasons: the RPL should not be easy to modify.
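
    For illustration only, the kind of custom helper the Mesh team mentions in point 2 further down could follow the pattern sketched here. This is a simplified host-side model, not the code in rpl.c: the struct layout, the array, and the names rpl_model_add() and rpl_model_remove() are all hypothetical, and a real implementation would presumably also have to delete the persisted copy of the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a replay protection list.
 * Assumption: this is NOT the layout or storage used by rpl.c. */
#define RPL_MODEL_SIZE  4u       /* tiny on purpose; real capacity is a Kconfig choice */
#define ADDR_UNASSIGNED 0x0000u  /* marks a free slot in this model */

struct rpl_model_entry {
    uint16_t src; /* unicast address of the sender */
    uint32_t seq; /* highest sequence number accepted from that sender */
};

static struct rpl_model_entry rpl_model[RPL_MODEL_SIZE];

/* Record or refresh the entry for src. Returns false when the list is full. */
static bool rpl_model_add(uint16_t src, uint32_t seq)
{
    struct rpl_model_entry *free_slot = NULL;

    assert(src != ADDR_UNASSIGNED);

    for (size_t i = 0; i < RPL_MODEL_SIZE; i++) {
        if (rpl_model[i].src == src) {
            if (seq > rpl_model[i].seq) {
                rpl_model[i].seq = seq;
            }
            return true;
        }
        if (rpl_model[i].src == ADDR_UNASSIGNED && free_slot == NULL) {
            free_slot = &rpl_model[i];
        }
    }

    if (free_slot == NULL) {
        return false; /* list full: a new sender cannot be tracked */
    }

    free_slot->src = src;
    free_slot->seq = seq;
    return true;
}

/* Hypothetical helper: free the slot held by src.
 * Returns true if an entry was found and cleared. */
static bool rpl_model_remove(uint16_t src)
{
    if (src == ADDR_UNASSIGNED) {
        return false;
    }

    for (size_t i = 0; i < RPL_MODEL_SIZE; i++) {
        if (rpl_model[i].src == src) {
            /* Zeroing the slot sets src back to ADDR_UNASSIGNED. */
            memset(&rpl_model[i], 0, sizeof(rpl_model[i]));
            return true;
        }
    }
    return false;
}
```

    Clearing the slot also discards the stored sequence number, which is exactly why old messages from the removed address could be replayed afterwards.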


    The question here is whether you really need to remove a node and add it back when it's turned off. Do you have a RAM constraint that prevents you from setting the RPL size to the size of the network? Each RPL entry can take 36-48 bytes, depending on the storage library. You may want to consider a chip with more RAM (for example, the nRF54LM20A with 512 kB of RAM). Note that it's mainly the gateway that needs to receive packets from all other nodes. Most normal nodes don't communicate with all other nodes in normal operation.
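
    As a rough worked example of that estimate, the small C helper below multiplies the number of RPL entries by the 36-48 bytes per entry quoted above. The function name and macros are made up for this sketch. In Zephyr the RPL capacity itself is set with CONFIG_BT_MESH_CRPL.

```c
#include <assert.h>
#include <stdint.h>

/* Per-entry cost quoted above; the exact value depends on the storage library. */
#define RPL_ENTRY_BYTES_MIN 36u
#define RPL_ENTRY_BYTES_MAX 48u

/* Estimated RAM needed for an RPL with the given number of entries. */
static uint32_t rpl_ram_bytes(uint32_t entries, uint32_t bytes_per_entry)
{
    /* Guard against 32-bit overflow for absurdly large inputs. */
    assert(bytes_per_entry == 0u || entries <= UINT32_MAX / bytes_per_entry);

    return entries * bytes_per_entry;
}
```

    For a gateway that must track 1000 nodes, this gives between 36,000 and 48,000 bytes of RAM for the RPL alone.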

    When a node sleeps for a long time and misses an IV Index update, it can recover the IV Index through the IV Index Recovery procedure.

    The process described here is the correct way of removing a node. However, it can't guarantee that the node is excluded from the network and that its keys are removed if the node chooses not to do so (or if it is powered off before it can do so).

    Regarding your questions, I got the following reply from our Mesh team: 

    1. Is this a proprietary use case? Meaning, are they in full control of their ecosystem and of which devices are onboarded there?
    2. If yes, removing nodes from the RPL is fine under the assumption that they know what they are doing. We don't have any API for this, but since the developer is assumed to be an expert, they can add their own API to rpl.c (in the nrf repo if they are using EMDS, or in the zephyr repo if not) to do this. The risk is a replay attack using messages from the nodes that were removed from the RPL, if an attacker knows how to exploit this opportunity.
    3. An alternative solution is to move the network to the IV Update in Progress state at the 96th hour, move it back to normal operation after 192 hours, and keep repeating this IV update cycle. Once the IV Index is updated (after 192 hours), the current RPL will be automatically reset. This requires that the rate of decommissioning nodes plus the rate of provisioning new nodes is not faster than what can fit into the RPL within a 192-hour window.
      1. An IV Index update can be started by any node on the network that possesses the primary network key, after it has operated in normal mode for at least 96 hours. How? (I have not tested this, but it should work. Please try it and let us know if something is not right.)
        1. Enable CONFIG_BT_MESH_IV_UPDATE_TEST=y
        2. Then create a timer in the application that counts 96 hours and, when it expires, call the "bt_mesh_iv_update()" API.
        3. Important: do not call the "bt_mesh_iv_update_test()" API. It disables the timing protections around the IV update procedure, so a rogue node in possession of the primary network key (through a trash-can attack or some other means) could trigger IV updates indiscriminately. Of course, if you want to test the whole behavior with shorter intervals, you can call this API for on-desk testing.
        4. Use this mechanism with caution.
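
    The timer logic in the steps above could be sketched as follows. This is an untested, host-side model of the 96-hour cycle rather than Zephyr application code: struct iv_cycle, iv_cycle_init(), iv_cycle_tick_hour() and fake_iv_toggle() are names invented for this example. On a real node the hourly tick would presumably come from a kernel timer or delayable work item, and the toggle callback would wrap bt_mesh_iv_update(), assuming it reports true when the IV Update in Progress state is entered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum time to stay in each IV update state, per the steps above. */
#define IV_PHASE_HOURS 96u

/* Callback that toggles the IV update state and returns true when the
 * node has entered "in progress". Assumption: on target this would wrap
 * bt_mesh_iv_update(). */
typedef bool (*iv_toggle_fn)(void);

struct iv_cycle {
    uint32_t hours_in_phase; /* hours since the last state change */
    bool in_progress;        /* last state reported by the toggle callback */
    iv_toggle_fn toggle;
};

static void iv_cycle_init(struct iv_cycle *c, iv_toggle_fn toggle)
{
    assert(c != NULL && toggle != NULL);

    c->hours_in_phase = 0u;
    c->in_progress = false;
    c->toggle = toggle;
}

/* Call once per hour. Returns true when the IV update state was toggled. */
static bool iv_cycle_tick_hour(struct iv_cycle *c)
{
    assert(c != NULL);

    c->hours_in_phase++;
    if (c->hours_in_phase < IV_PHASE_HOURS) {
        return false;
    }

    c->in_progress = c->toggle();
    c->hours_in_phase = 0u;
    return true;
}

/* Host-side stand-in for the real toggle, used only to exercise the model. */
static bool fake_in_progress;
static unsigned int fake_toggle_calls;

static bool fake_iv_toggle(void)
{
    fake_toggle_calls++;
    fake_in_progress = !fake_in_progress;
    return fake_in_progress;
}
```

    With this model, the first toggle happens at hour 96 (entering the in-progress state) and the second at hour 192 (returning to normal operation), after which the cycle repeats. The hour counter here lives only in RAM, so a real application would need to decide how to handle reboots.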