We are using mesh SDK v3.2.0, nRF SDK 15.3, nRF52840 DK and SES as our IDE.
We had a mesh network with around 5 devices and after a while 2 of them just stopped sending data to the network, while being perfectly able to receive from it.
After debugging one of those devices, we found that the condition m_net_state.seqnum < m_net_state.seqnum_max_available in the net_state_seqnum_alloc function inside net_state.c was evaluating to false, so net_state_seqnum_alloc was returning NRF_ERROR_FORBIDDEN without any visible error.
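To make the failure mode concrete, here is a minimal sketch of that check and of a caller that counts the failure instead of dropping it. It is not the SDK's code: every name prefixed with sketch_ or SKETCH_ is hypothetical, the error values are placeholders, and only the comparison itself is taken from what we saw in net_state.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SDK error codes; the values are placeholders. */
#define SKETCH_SUCCESS         0u
#define SKETCH_ERROR_FORBIDDEN 1u

/* Hypothetical model of the two fields used by the check in net_state.c. */
typedef struct {
    uint32_t seqnum;               /* next sequence number to hand out */
    uint32_t seqnum_max_available; /* assumed: upper bound currently allowed */
} sketch_net_state_t;

/* Hands out a sequence number only while seqnum < seqnum_max_available. */
uint32_t sketch_seqnum_alloc(sketch_net_state_t *p_state, uint32_t *p_seqnum)
{
    assert(p_state != NULL && p_seqnum != NULL);
    if (p_state->seqnum < p_state->seqnum_max_available) {
        *p_seqnum = p_state->seqnum++;
        return SKETCH_SUCCESS;
    }
    return SKETCH_ERROR_FORBIDDEN;
}

/* Hypothetical caller: counts allocation failures so they are not silent. */
uint32_t sketch_send(sketch_net_state_t *p_state, uint32_t *p_failures)
{
    uint32_t seqnum;
    uint32_t status = sketch_seqnum_alloc(p_state, &seqnum);
    if (status != SKETCH_SUCCESS) {
        (*p_failures)++; /* a real application could log or raise an event here */
    }
    return status;
}
```

With seqnum one below seqnum_max_available, the first call succeeds and every later call fails, which matches what we observed: the device keeps receiving but can no longer send.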
After searching around the devzone, we found the iv_index and the iv_update procedure. Basically, the iv_index extends the sequence number of the mesh messages leaving a sender device, so that exhausting the sequence number range does not stop the device from sending. Under certain conditions, any device on the network can trigger an iv_update procedure, which increments the iv_index on the whole mesh network, after which the sequence number is reset to 0.
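As we understand it, the pair (iv_index, sequence number) behaves like one larger counter. The sketch below only illustrates that idea; the sketch_ names are hypothetical, and it deliberately ignores the stages and timing rules of the real iv_update procedure. The only fixed fact it relies on is that a mesh sequence number is 24 bits wide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mesh sequence numbers are 24 bits wide. */
#define SKETCH_SEQNUM_MAX 0xFFFFFFu

/* Hypothetical pair identifying how "fresh" a message from one sender is. */
typedef struct {
    uint32_t iv_index;
    uint32_t seqnum;
} sketch_msg_id_t;

/* a is newer than b if its iv_index is higher, or, at the same iv_index,
 * if its sequence number is higher. */
bool sketch_is_newer(sketch_msg_id_t a, sketch_msg_id_t b)
{
    if (a.iv_index != b.iv_index) {
        return a.iv_index > b.iv_index;
    }
    return a.seqnum > b.seqnum;
}

/* Simplified effect of a completed iv_update: the index goes up by one
 * and the sequence number starts again from 0. */
void sketch_iv_update(sketch_msg_id_t *p_id)
{
    p_id->iv_index++;
    p_id->seqnum = 0u;
}
```

In this model a message sent with sequence number 0 after the update still counts as newer than one sent with sequence number 0xFFFFFF before it, which is why the reset does not break ordering.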
From here, we also understood that the sequence number is only written to the flash of each device every 8192 increments, and each time the device power cycles, it "jumps" 8192 sequence numbers ahead of the stored one. This makes sense, as it avoids wearing out the flash.
This leads to an easy-to-understand problem: power cycling devices frequently exhausts the allowed sequence numbers very quickly. We think that is what happened to the failing devices over 3 or 4 days, about a month ago. We needed to power cycle them once every couple of minutes during work hours (we have since changed that), so roughly 1000 times in total.
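A rough back-of-the-envelope check of this, assuming every boot consumes one full block of 8192 sequence numbers and the device sends nothing in between (the function name is hypothetical, and real traffic only makes the number smaller):

```c
#include <assert.h>
#include <stdint.h>

/* Size of the 24-bit mesh sequence number space. */
#define SKETCH_SEQNUM_SPACE (1u << 24)
/* Block size we understood from the forum: one flash write per 8192 numbers. */
#define SKETCH_FLASH_BLOCK_SIZE 8192u

/* Number of power cycles that use up the whole sequence number space,
 * assuming each boot skips ahead by exactly one block. */
uint32_t sketch_boots_until_exhausted(uint32_t seqnum_space, uint32_t block_size)
{
    assert(block_size != 0u);
    return seqnum_space / block_size;
}
```

Calling sketch_boots_until_exhausted(SKETCH_SEQNUM_SPACE, SKETCH_FLASH_BLOCK_SIZE) gives 2048, so under these assumptions our roughly 1000 reboots alone would have used about half of the sequence numbers, before counting any messages the devices actually sent.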
We understand the importance of the iv_index. What we don't understand is why a device that sees its sequence number approaching the limit does not trigger an iv_update procedure, which would avoid reaching the maximum allowed sequence number right off the bat after rebooting. To me at least, it does not make sense that a signal-level device drawing less than 10 watts or so runs into software problems from rebooting too frequently. I feel like we're missing something, or we misunderstood something from reading posts on this forum and the Bluetooth documentation. Shouldn't an iv_update be issued if the device reboots and loads a high enough sequence number from flash memory?
Thank you,
Rúben Marques