IV Index and LPN devices that leave mesh

Hello,

We're developing an LPN product that will leave the mesh and stay off it for up to 3 days (possibly more). The central server/gateway may be reset a few times before the LPN rejoins the mesh.

Is there a chance the IV index will prevent the nodes from communicating once reconnected to the mesh? 

I am looking at this thread and wondering how much is safe to change:
https://devzone.nordicsemi.com/f/nordic-q-a/80558/bt-mesh-iv-update-parameters-timers

Thank you

  • Hi,

    An IV Index update can only happen once every 192 hours (8 days). This means you should have no IV Index related issues if away from the network for 3 days. If the node's IV Index lags too far behind to catch up through a normal IV Index update, the IV Index Recovery procedure kicks in, which works up to around 40 IV Index updates behind. If the node is further behind than that, it is unable to rejoin the network and must be reprovisioned.

    If you experience issues after a week away from the network, then there might be some issues with the IV Index Recovery procedure. After around 10 months you risk enough IV Index updates to have passed for the node to have completely lost the network.
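
    To put rough numbers on the above, here is a minimal sketch of the arithmetic. The 192-hour figure is from this reply; the limit of 42 IV Index steps is my reading of the Bluetooth Mesh Profile specification (the "around 40" above), and the function names are made up for illustration.

```python
# Assumed figures: 192 h per IV Index update (from the reply above) and a
# recovery limit of 42 steps (my reading of the Mesh Profile specification).
MIN_HOURS_PER_IV_UPDATE = 192
MAX_RECOVERABLE_IV_GAP = 42


def worst_case_iv_lag(hours_away: float) -> int:
    """Upper bound on how many IV Index steps the network may have taken.

    The +1 covers an update that was already due when the node left.
    """
    return int(hours_away // MIN_HOURS_PER_IV_UPDATE) + 1


def within_recovery_window(hours_away: float) -> bool:
    """True if even the worst-case lag is still within the recovery limit."""
    return worst_case_iv_lag(hours_away) <= MAX_RECOVERABLE_IV_GAP


# Time it takes the network, at the fastest allowed rate, to advance 42 steps.
recovery_window_days = MAX_RECOVERABLE_IV_GAP * MIN_HOURS_PER_IV_UPDATE / 24
```

    Under these assumptions the window is 336 days, and a 6 month storage period (about 23 steps in the worst case) would still be inside it.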

    Regards,
    Terje

  • "If you experience issues after a week away from the network, then there might be some issues with the IV Index Recovery procedure."

    If this event happens, what can be done to recover the IV index? Is there a method of manually triggering it? I've noticed some units stop talking to us altogether after a couple of weeks separated, but some messages do manage to come back through (mainly those sent by the LPN, not the other way). The only fix I've found is unprovisioning and reprovisioning once it starts up the advertisement. This likely won't be acceptable to our users, so I'm hoping there's a safe way to send out the secure beacon, or to increase the rate it goes out such that the scenario is less likely to happen.

    From the link I posted they make a few suggestions about updating some defines.

    One is to lower NETWORK_MIN_IV_RECOVERY_INTERVAL_MINUTES, or even set it to zero for debugging purposes, so that a node won't wait for the timeout before running the IV Index Recovery procedure. (Will this prevent said scenario from happening?)

    Our units are battery powered and have seasons where they're in use and not in use. If the whole mesh was powered off for up to 6 months at a time, would this pose any issue? I'm thinking we need a "storage mode" that would prevent the IV updates from happening. 

    I think I need to clarify the scenario a bit better:

    "IV Index update can only happen once every 192 hours (8 days). This means you should have no IV Index related issues if away from the network for 3 days."

    What could happen with our device is that it goes out of the mesh for 3 days, comes back, *maybe* talks back and forth with the server (sending stuff like battery level back), and then goes out of the mesh.

    A schedule could look like this (and why the 8 days is concerning)

    Day 1: Out of the mesh
    Day 3: Back on mesh briefly, leaves again
    Day 6: Back on mesh briefly, leaves again
    --Day 8: IV update happens--
    Day 9: Back on mesh but can no longer communicate as IV is now out of date


  • Hi,

    Alex Ross said:
    I am a little hesitant to switch to nRF Connect SDK because of the project development timeline.

    I fully understand that rationale, especially if you are far into development (and with devices out there already). We are keeping some track of key values for the new SDK (memory requirements, power usage, etc.), although I do not currently have LPN power usage numbers at hand.

    Alex Ross said:
    I assume the phone app compatibility hasn't changed?

    The phone app is unchanged, as are the mesh specification, mesh models, etc. In fact nRF Connect SDK has slightly better model coverage (and is the place where all new functionality will be developed).

    Alex Ross said:
    Do you have any suggestions on how to create a scenario where the IV index is bad so I can see what's going wrong?

    For testing purposes you could lift the restriction of 192 hours between IV Index Updates, or maybe even hard code some IV Index "jump" to a new value, for instance set back the IVI for a node to emulate it being "behind".

    I am yet to hear back from the team, but will check with them and (hopefully) get back to you tomorrow.

    Please note Thursday is public holiday here in Norway, and many take Friday off as well for the long weekend.

    Regards,
    Terje

  • Hi,

    I noticed you had a second set of questions.

    The beacon is a collective effort of the network, where at any one place in the network topology one is expected to get, on average, one beacon every ten seconds. I.e. with two nodes, each node would send a beacon every 20 seconds, for a 10 second average. All nodes participate in this, as mandated by spec, and there is no setting controlling it.
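
    As a rough illustration of how that averaging works out, here is a sketch of the beacon interval calculation as I understand it from the Mesh Profile specification. The 20-second observation period and the 600-second upper bound are assumptions on my part, and the function name is made up.

```python
TARGET_INTERVAL_S = 10.0      # network-wide average from the reply above
OBSERVATION_PERIOD_S = 20.0   # assumed: twice the target interval
MAX_INTERVAL_S = 600.0        # assumed upper bound on the interval


def beacon_interval_s(observed_beacons: int) -> float:
    """Interval a node uses after hearing `observed_beacons` beacons from
    other nodes during the last observation period."""
    expected = OBSERVATION_PERIOD_S / TARGET_INTERVAL_S
    interval = OBSERVATION_PERIOD_S * (observed_beacons + 1) / expected
    # Clamp to the assumed limits.
    return min(max(interval, TARGET_INTERVAL_S), MAX_INTERVAL_S)
```

    With two nodes, each hears one beacon from the other per 20 seconds, so `beacon_interval_s(1)` gives the 20 seconds mentioned above.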

    Alex Ross said:
    I'm in a situation now where I'm sending out a replacement unit to a client and I'm realizing that there's no real guarantee that the two will talk now.

    It must be provisioned into the network, and as such "synced" with the IVI, yes. Does the system have provisioning support at the customer location, or is it "pre-provisioned" (so that provisioning is out-of-scope for the customer?)

    You will likely need some action from the application in order to recover the IVI. As mentioned in my previous reply, I will get back to you, hopefully before week-end (i.e. tomorrow.)

    Regards,
    Terje

  • Hi,

    I got some feedback from our mesh team.

    tesc said:
    If the LPN is not in a friendship, it will try to initiate one. If there have been IV Index updates, this will fail repeatedly, at which point the LPN should check Secure Network Beacons to see if an IV Index Recovery is needed. Some action may be needed from the application.

    The IV Index Recovery will be triggered if receiving a secure beacon with an IV Index higher than the current one. Beacons are sent through the network every 10 seconds (on average), which means there is some waiting time, and the node must be in a scanning state in order to receive the beacon. The recovery procedure itself is instantaneous (it updates a few variables in memory.)

    It does, however, depend on a couple of prerequisites: it can only happen after a timeout of 192 hours, and only if the node is not in the middle of an IVI Update. In order for the LPN to know that 192 hours have passed, it must have been on for that amount of time since the timer was last reset. Timer status is stored periodically, so the total "on time" might be divided into several sessions.
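
    The two prerequisites can be sketched roughly like this. It is only an illustration of the logic described above, not the SDK implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass

# The 192-hour timeout mentioned above, in minutes.
MIN_RECOVERY_INTERVAL_MIN = 192 * 60


@dataclass
class IvTimerState:
    """Hypothetical persisted state; not an SDK structure."""
    minutes_since_reset: int = 0
    iv_update_in_progress: bool = False

    def add_session(self, minutes_on: int) -> None:
        # On-time accumulates across power cycles because it is persisted.
        self.minutes_since_reset += minutes_on

    def recovery_allowed(self) -> bool:
        return (not self.iv_update_in_progress
                and self.minutes_since_reset >= MIN_RECOVERY_INTERVAL_MIN)


state = IvTimerState()
state.add_session(100 * 60)            # first session: 100 hours on
allowed_after_first = state.recovery_allowed()
state.add_session(92 * 60)             # second session: 92 more hours
allowed_after_second = state.recovery_allowed()
```

    A device that is only powered for short bursts would, under this logic, take a long time to accumulate the required on-time.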

    tesc said:
    Similarly if the LPN is in a friendship, but has not polled through an IV Index Update, some action may be needed from the application.

    Staying in a friendship beyond one IV Index update is not possible, since the maximum PollTimeout corresponds to 96 hours. In other words: in order to keep the friendship, the LPN must complete at least one successful poll per IV Index. I did the calculations wrong when checking the max PollTimeout for that previous answer, and found an erroneous max PollTimeout that was longer than 96 hours. The correct number is 96 hours max.
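
    For reference, the 96-hour figure can be checked with a couple of lines. The maximum PollTimeout value of 0x34BBFF in units of 100 ms is my recollection of the Mesh Profile specification, so treat it as an assumption; the helper name is made up.

```python
POLL_TIMEOUT_MAX = 0x34BBFF        # assumed maximum PollTimeout value
POLL_TIMEOUT_UNITS_PER_S = 10      # assumed unit: 100 ms steps
MIN_HOURS_PER_IV_UPDATE = 192      # from earlier in this thread

max_poll_timeout_hours = POLL_TIMEOUT_MAX / POLL_TIMEOUT_UNITS_PER_S / 3600


def friendship_can_survive(hours_between_polls: float) -> bool:
    """True if the gap between polls is shorter than the max PollTimeout."""
    return hours_between_polls < max_poll_timeout_hours
```

    The result is just under 96 hours, i.e. half of the 192 hours between IV Index updates.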

    Regards,
    Terje

  • So I actually have some units now having the issue I've seen before.

    Essentially I have a switch that, when triggered, makes the client send a notification over mesh to the server. This works; I can see the message come in on the server. However, messages sent down to the client are being ignored.

    How can I be sure I'm sending out a secure beacon? Shouldn't the fact that I can receive a message from the client mean that the IV indexes are good? 

    The client is sealed and potted so there's not a lot of physical debugging I can do I'm afraid. (I really need to make a DFU backdoor) 

  • Hi,

    Alex Ross said:
    How can I be sure I'm sending out a secure beacon?

    I highly doubt there could be any issue there, but you could check by sniffing the network traffic (e.g. with nRF Sniffer) or by provisioning a DK into the network with a mesh application that logs network activity.

    Alex Ross said:
    Essentially I have a switch that, when triggered, makes the client send a notification over mesh to the server. This works; I can see the message come in on the server. However, messages sent down to the client are being ignored.

    This sounds a bit weird. How are the messages addressed? Are the proper appkeys bound to the models and used for the publication? I agree, if messages go one way this is most likely not an IV Index issue.

    For confirmation of my assumptions, the LPN is what you refer to as the client, and the gateway is what you refer to as the server? What mesh models are you using?

    Regards,
    Terje

  • "I highly doubt there could be any issue there, but you could check by sniffing the network traffic (e.g. with nRF Sniffer) or by provisioning a DK into the network with a mesh application that logs network activity."

    I'll try both of these things. I can add another client and output what it sees. Can you tell me what function triggers the secure update so I can add a log output?


    "
    This sounds a bit weird. How are the messages addressed? Are the proper appkeys bound to the models and used for the publication? I agree, if messages go one way this is most likely not an IV Index issue.

    For confirmation of my assumptions, the LPN is what you refer to as the client, and the gateway is what you refer to as the server? What mesh models are you using?"

    Correct, client = LPN.

    In the nRF Mesh app I can refresh TTL, and I've verified that they've subscribed to the correct groups and publish to the correct group. I even see it add itself as a friend to the server; however, nothing I do seems to make its way down to it.

    I'll try out the nRF Sniffer and see what I can find. My other thought was to override the IV index to something higher and see if I can trigger an IV update in test mode that way.

    The model is based on the Generic OnOff model but modified for an 8 byte message.

    It's worked pretty well for a while now, it just seems occasionally something really weird happens and the messages get rejected in some fashion. 



  • Okay so now I'm getting somewhere.

    I overrode a client's IV index to be 42 (within). This caused my server to update to IV index 42 as well. Now I can see all of my nodes again (friendship succeeded, I see them all); however, even the client can't send a message to the server anymore. When they're both fresh there's no problem, but once the IV update happens the two are unable to send messages to each other. (I have verified with a log output message that the IV index is correct and matches.)

    The other funny part is that once I have the IV index "hacked" I can no longer get the TTL information from the app either. That is to say:

    -Start with default IV on server, TTL can be loaded in nRF Mesh App
    -Provision, things look good, TTL refresh still works
    -"Hack" IV to match locked up friends
    -Now the server is friends with the locked up clients, however button presses aren't working
    -TTL loading doesn't work either
    -It seems as though it's stuck in NET_STATE_IV_UPDATE_IN_PROGRESS mode as well 


    Basically I need a way to change the IV index in the way the stack expects, such that the TTL refresh in the app still works. I also have the client in the same configuration and, again, I can't load the TTL. It almost seems as though the app doesn't work with any non-zero IV index.

    Does the nRF Mesh app for iOS have its own IV index?

    Any suggestions? 

  • I'm pretty convinced me screwing with the IV index is messing with the sequence number and now nothing gets through anywhere 
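
    To illustrate what I mean, here is a toy model of a replay cache. This is purely hypothetical and not how the SDK necessarily does it; the class name and the idea that the sender's sequence number restarts at 0 after the IV jump are my assumptions.

```python
class ReplayCache:
    """Toy replay protection list, keyed by source address (hypothetical)."""

    def __init__(self, track_iv_index: bool):
        self.track_iv_index = track_iv_index
        self.last_seen = {}  # src address -> (iv_index, seq)

    def accept(self, src: int, iv_index: int, seq: int) -> bool:
        # A cache that ignores the IV Index compares sequence numbers only.
        new = (iv_index, seq) if self.track_iv_index else (0, seq)
        old = self.last_seen.get(src)
        if old is not None and new <= old:
            return False  # looks like a replay, drop it
        self.last_seen[src] = new
        return True


with_iv = ReplayCache(track_iv_index=True)
seq_only = ReplayCache(track_iv_index=False)
for cache in (with_iv, seq_only):
    cache.accept(src=0x0005, iv_index=0, seq=5000)   # before the IV jump

# After the jump to IV index 42 the sender starts again at seq 0 (assumed).
ok_with_iv = with_iv.accept(src=0x0005, iv_index=42, seq=0)
ok_seq_only = seq_only.accept(src=0x0005, iv_index=42, seq=0)
```

    In this model the cache that only remembers sequence numbers keeps rejecting the sender until its sequence number passes the old value, which would look a lot like what I'm seeing.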

  • Hi,

    Alex Ross said:
    I overrode a client's IV index to be 42 (within). This caused my server to update to IV index 42 as well

    This is expected behavior: the client sends a secure beacon, the server receives it, notices that it is behind, and therefore performs IVI Recovery.

    Alex Ross said:
    however, even the client can't send a message to the server anymore

    Where does it fail? Is it in the sending, or is it that the server never seems to receive anything? What are the sequence number and IV Index bits of those packets?

    Do you have minimal examples for reproducing this on a DK?

    Based on your descriptions, one explanation could be that the sequence number is not being reset, or that the replay protection is not working properly (not reset) when doing IVI Recovery. From what I recall, the IV Index is fully stored neither in the packets sent nor in the replay list; only the least significant bit is used (since the node knows which IV Indexes are possible for the current IV Index and IV Index update state). If there's a bug somewhere, that sounds like a possible place.
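
    A minimal sketch of how a receiver could map that single bit back to a full IV Index, assuming it accepts only its current IV Index and the one before it. The function name is made up and this is not taken from the SDK source.

```python
from typing import Optional


def resolve_rx_iv_index(current_iv_index: int, ivi_bit: int) -> Optional[int]:
    """Map the 1-bit IVI field of a received packet to a full IV Index.

    Assumes the receiver accepts only `current_iv_index` and
    `current_iv_index - 1`; returns None if neither candidate fits.
    """
    if (current_iv_index & 1) == ivi_bit:
        return current_iv_index
    if current_iv_index == 0:
        return None  # there is no IV Index before 0
    return current_iv_index - 1
```

    A packet sent by a node that is two or more steps behind resolves to the wrong IV Index here, so it would fail authentication rather than being decoded with the sender's real IV Index.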

    Alex Ross said:
    Does the nRF Mesh app for iOS have its own IV index?

    I do not quite understand what you mean by that. The nRF Mesh apps (both Android and iOS) follow the Bluetooth mesh specification, which means they should also be able to participate in IV Index Update and IV Index Recovery. I will double check if the phone apps do indeed have IVI Recovery functionality.

    You mention in a separate thread (nRF Mesh App stops workings after IV update) that you think this is a sequence number issue. I agree. Can you share the logs (or other information) from your testing on this?

    Regards,
    Terje
