BT Mesh IV Update Parameters / Timers

We have five nRF52840 DevKits (Mesh SDK 5.0.0) running to collect data. The devices don't have a stable power supply and regularly turn on and off. (We are only interested in the data when the supply is there)

In general the setup works well. However after some seemingly random time of a few days/weeks one or multiple sensors randomly stop transmitting. Then it takes a few hours up to a few days until the nodes by themselves are able to transmit again. To me, that sounds like the IV update is not working as required.

The nodes should be able to initiate an IV update whenever they're close to exhausting their sequence numbers.

What we have modified to allow updates while ignoring the Mesh Spec times:

  • We increased the flash block size number to reduce strain on the flash:

//nrf_mesh_config_core.h:
#define NETWORK_SEQNUM_FLASH_BLOCK_SIZE 65536      //was 8192ul

A node transmits its data at 10 Hz, so a SEQNUM block is used up after 65536 / 10 Hz ≈ 6554 s ≈ 1.8 h, or earlier if the device reboots.

  • We raised the IV update start threshold so that the IV index is incremented less often. The mesh profile spec, "3.10.5 IV Update procedure", states that an IV update must be triggered at least 96 h before a node runs out of sequence numbers. Here the headroom is 975000 / 10 Hz = 97500 s ≈ 27 h, assuming the node stays powered the whole time. What happens when a node runs out of sequence numbers before it has updated the IV index?

//nrf_mesh_config_core.h
#define NETWORK_SEQNUM_IV_UPDATE_START_THRESHOLD (NETWORK_SEQNUM_MAX - 975000) //was (NETWORK_SEQNUM_MAX / 2)
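
To make the headroom arithmetic easier to check, here is a small sketch. The helper names are hypothetical and not part of the Mesh SDK, and it assumes one sequence number is consumed per transmitted message:

```c
#include <assert.h>
#include <stdint.h>

/* Whole hours of transmission left once the IV update threshold is crossed.
 * headroom_seqnums is the distance from the threshold to the end of the
 * sequence number space (hypothetical helper, not an SDK function). */
uint32_t headroom_hours(uint32_t headroom_seqnums, uint32_t tx_rate_hz)
{
    assert(tx_rate_hz > 0);
    return headroom_seqnums / tx_rate_hz / 3600u;
}

/* Sequence numbers needed to keep transmitting for the given number of hours. */
uint32_t seqnums_for_hours(uint32_t hours, uint32_t tx_rate_hz)
{
    return hours * 3600u * tx_rate_hz;
}
```

With these, headroom_hours(975000, 10) gives 27, while seqnums_for_hours(96, 10) gives 3456000, so a headroom of roughly 3.5 million sequence numbers would be needed to cover 96 h at 10 Hz.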

  • We overrode the intervals specified in net_state.c. Because our devices typically stay on for a maximum of 12 h at a time, we never reach the required 144 h/192 h timer values. We therefore reduced the maximum IV update interval to 1 minute and the recovery interval to 5 minutes.

//net_state.c
#define NETWORK_MAX_IV_UPDATE_INTERVAL_MINUTES   (1)  //was (144*60), 144h 
#define NETWORK_MIN_IV_RECOVERY_INTERVAL_MINUTES (5)  //was (192*60), 192h 

Are those timer values stored in flash at some stage or do they restart from 0 after a reboot?

Edit: I just found out that a timer value does appear to be saved periodically; net_state.c defines:

#define IV_UPDATE_TIMEOUT_PERIODIC_SAVE_MINUTES (30)

What exactly is the IV update timeout?

Could you please help me understand the 144h limit described in the mesh profile specification "3.10.5 IV Update procedure":

"After at least 96 hours and before 144 hours of operating in IV Update in Progress state, the node shall transition back to the IV Normal Operation state and not change the IV Index."

We effectively bypass the NETWORK_MAX_IV_UPDATE_INTERVAL_MINUTES limit by allowing an update after 1 minute of being powered on.

What would be an appropriate setting for the NETWORK_MIN_IV_RECOVERY_INTERVAL_MINUTES?

  • Hey Domij! Sorry about the delay.

    Could you tell me more about this project? Are there more nodes in the network? With the nodes being offline that much it might be that mesh isn't the best solution for you.

    In general the setup works well. However after some seemingly random time of a few days/weeks one or multiple sensors randomly stop transmitting. Then it takes a few hours up to a few days until the nodes by themselves are able to transmit again. To me, that sounds like the IV update is not working as required.

    In what way would you say it isn't working as required? An IV update can take between 96 and 144 hours, and I would assume closer to 144 if the nodes are turning off and on frequently. So to me it seems that it might be working the way it is meant to.

    • We increased the flash block size number to reduce strain on the flash:

    In general that reduces how often you write to flash, but it also brings the underlying problem forward (in your case considerably). Increasing the block size is a double-edged sword: a higher value gives you a longer time until you need to write to flash, but a bigger jump on reboot. After a reboot the node doesn't know where it left off within the block, so it starts at the next block, which is presumably what your nodes are doing on every reboot. This is a problem in and of itself, and increasing the block size makes that jump larger and the need for an IV update arrive sooner.
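
    As a rough worst-case sketch of that effect, assuming every reboot costs one whole block and that the sequence number space is 24 bits wide (the function name is hypothetical, not an SDK call):

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole blocks (and so worst-case reboots) that fit below a limit.
 * Assumes each reboot skips to the start of the next block, as described
 * above; this is a simplified model, not the stack's actual bookkeeping. */
uint32_t blocks_until(uint32_t seqnum_limit, uint32_t block_size)
{
    assert(block_size > 0);
    return seqnum_limit / block_size;
}
```

    Under these assumptions, blocks_until(0x1000000, 65536) is 256 and blocks_until(0x1000000, 8192) is 2048, so with the larger block size about 256 reboots would be enough to walk through the whole sequence number space even if almost nothing was transmitted.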

    What happens when a node runs out of sequence numbers when it has not yet updated the IV?

    I am a little uncertain about what concretely happens in our stack, but the node won't be able to send messages until the IV update is finished.

    We overwrote the intervals specified in the net_state.c. Because our devices typically only stay on for maximum 12h at a time, we never reach the required 144h/192h timer values.

    Are those timer values stored in flash at some stage or do they restart from 0 after a reboot?

    I don't think they are stored in flash. If they aren't, then this sounds like a problem. I will have to get back to you on that. I should also mention that when you change definitions outside the application code, you move away from the Bluetooth spec.

    What exactly is the IV update timeout?

    This seems to be the way the IV update progress is stored, so this timer at least is being saved.
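
    To illustrate what a periodic save would mean for nodes that power-cycle often, here is a simplified model. It is an assumption about the behavior, not confirmed stack code: the counter is persisted only when it reaches a multiple of the save period and is restored from that value on boot. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SAVE_PERIOD_MINUTES 30u /* mirrors IV_UPDATE_TIMEOUT_PERIODIC_SAVE_MINUTES */

/* Counter value restored after a reboot, assuming only multiples of the
 * save period were ever written to flash (assumed model). */
uint32_t restored_timeout_minutes(uint32_t minutes_before_reboot)
{
    return (minutes_before_reboot / SAVE_PERIOD_MINUTES) * SAVE_PERIOD_MINUTES;
}

/* Counted minutes after a number of identical power cycles, each ending
 * in a reboot that loses whatever was not yet saved. */
uint32_t counted_minutes_after_cycles(uint32_t on_minutes_per_cycle, uint32_t cycles)
{
    uint32_t counted = 0;
    for (uint32_t i = 0; i < cycles; i++) {
        counted = restored_timeout_minutes(counted + on_minutes_per_cycle);
    }
    return counted;
}
```

    In this model, eight 12-hour sessions add up to 5760 minutes (96 h), whereas sessions shorter than 30 minutes would never advance the counter at all.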

    Could you please help me understand the 144h limit described in the mesh profile specification "3.10.5 IV Update procedure":

    It is the maximum time for a node to be in the IV Update state. I think they justified it as being 96*1.5, not to match a specific number but because it statistically turns out to work well.

    What would be an appropriate setting for the NETWORK_MIN_IV_RECOVERY_INTERVAL_MINUTES?

    The standard 30 minutes sounds good, though for your situation I'd understand if you wanted it to be shorter.

    Best regards,

    Elfving
