BLE Mesh nodes losing provisioning data — possible causes?

Hello everyone,

We hope this message finds you well. We'd really appreciate some guidance on an issue we've been encountering in the field.

We have deployed several hundred light nodes based on the nRF52833 in a BLE Mesh configuration. The firmware is substantially identical to the light example provided with NCS 2.7.0.

A small number of nodes have been returned from the field and appear to have lost their provisioning data. Since no one outside of our team has the ability to reset or remove these nodes, we're keen to understand what might have caused this.

One avenue we're currently exploring is whether this could be related to the NVS partition being too small — it's currently set to 32 KB. Could an undersized NVS lead to provisioning data loss?

We'd also be very grateful to hear if there are any other known reasons a node might lose its provisioning data.

Thank you very much for your time and any insights you can share.

Warm regards,
Elementi Tech SW Team

  • Hi,

    A small number of nodes have been returned from the field and appear to have lost their provisioning data.

    To work out what happened to the nodes, I am afraid we need some more information. How did you conclude that the provisioning data is lost: from log messages, node behavior, reading back the contents of flash, or something else? In other words, what state are the returned nodes in before you reset them? For example, are they in an empty, factory-reset state, ready to be provisioned into a new network? Do they still hold some provisioning data that no longer matches that of the network? Or are they in a corrupted state that the application or stack detects? Are you able to read out the flash contents and compare them with those of correctly provisioned devices?

    Please note that nodes may permanently lose contact with the network if they are away from the network for a prolonged period of time (missing several IV Update procedures), although that should not happen until at least 48 weeks have passed.
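
    As a rough check of where a figure like 48 weeks can come from, here is a minimal sketch of the arithmetic. The three constants are assumptions taken from the Bluetooth Mesh specification rather than from this thread: each IV Update state lasts at least 96 hours, one IV Index increment passes through two such states, and IV Index Recovery accepts an index at most 42 ahead of the last known one. The function names are hypothetical.

    ```c
    #include <stdint.h>

    /* Assumed values from the Bluetooth Mesh specification (not from this thread). */
    #define IV_STATE_MIN_HOURS      96u /* minimum time spent in each IV Update state */
    #define IV_STATES_PER_INCREMENT 2u  /* "in progress" + "normal" per IV Index step */
    #define IV_RECOVERY_MAX_AHEAD   42u /* largest IV Index jump a node can recover from */

    /* Shortest possible absence, in hours, before the network's IV Index
     * can be further ahead than the recovery limit allows. */
    static uint32_t min_hours_until_unrecoverable(void)
    {
        return IV_RECOVERY_MAX_AHEAD * IV_STATES_PER_INCREMENT * IV_STATE_MIN_HOURS;
    }

    /* Convert whole hours to whole weeks (rounding down). */
    static uint32_t hours_to_weeks(uint32_t hours)
    {
        return hours / (24u * 7u);
    }
    ```

    Under these assumptions, hours_to_weeks(min_hours_until_unrecoverable()) evaluates to 48, and only if the network runs IV Updates back to back at the minimum interval. In practice the interval is usually much longer.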

    Nodes may also lose contact with the network after a node reset and/or after the configuration database (CDB) is deleted. Either of these would have to be called deliberately from the application, or the node reset would have to be triggered through the configuration server (typically by the provisioner).

    Nodes can also stop receiving messages from new nodes on the network if their replay protection list (CONFIG_BT_MESH_CRPL) is too small. Conversely, other nodes on the network may start ignoring a node if the sequence numbers it sends restart from a previous value, or if their own replay protection lists have no space left for an entry for that node.
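
    Both failure modes can be illustrated with a deliberately simplified model of a replay protection list. This is a sketch for reasoning only, not the Zephyr implementation: the type and function names are hypothetical, RPL_SIZE merely stands in for CONFIG_BT_MESH_CRPL, and the model ignores the IV Index, which the real check also takes into account.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define RPL_SIZE 2 /* stand-in for CONFIG_BT_MESH_CRPL; tiny on purpose */

    struct rpl_entry {
        uint16_t src; /* unicast address of the sender; 0 marks an unused slot */
        uint32_t seq; /* highest sequence number accepted from this sender */
    };

    struct rpl {
        struct rpl_entry entries[RPL_SIZE];
    };

    /* Returns true if the message is accepted, false if it is dropped. */
    static bool rpl_accept(struct rpl *rpl, uint16_t src, uint32_t seq)
    {
        struct rpl_entry *free_slot = NULL;

        for (size_t i = 0; i < RPL_SIZE; i++) {
            struct rpl_entry *e = &rpl->entries[i];

            if (e->src == src) {
                if (seq <= e->seq) {
                    /* Sender reused or restarted its sequence numbers. */
                    return false;
                }
                e->seq = seq;
                return true;
            }
            if (e->src == 0 && free_slot == NULL) {
                free_slot = e; /* remember the first unused slot */
            }
        }

        if (free_slot == NULL) {
            /* List is full: a sender without an entry cannot be tracked. */
            return false;
        }
        free_slot->src = src;
        free_slot->seq = seq;
        return true;
    }
    ```

    In this model, a sender whose sequence number falls back to an earlier value (for example because it was not persisted across a reboot) is rejected until it exceeds the stored value again, and a third sender is rejected outright once both slots are taken.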

    Regards,
    Terje
