I have a critical problem with a mesh network.
After running for 2 days, some devices in the mesh network cannot publish messages and fail with error 15 (forbidden).
I searched DevZone and found that some people have had the same problem.
I think the IV index update procedure or the sequence number is the cause.
But I only send 1 ping message per hour, and sometimes respond to keep-alive messages from gateways.
I cannot figure out why this problem happens.
How can I recover a device when the publication forbidden error occurs?
I noticed this from an earlier post:
huybk213 said: When the forbidden error occurred, I restarted the device, but publication never succeeded again.
If reset is done many times, either manually or automatically…
The sequence number is on the message as sent from the originator. The number of relays through the network is not related to sequence number, as the sequence number stays the same when relayed.
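To illustrate the point about relays, here is a minimal sketch in C. The `mesh_pkt_t` struct and `relay_packet()` function are hypothetical names made up for this example and are not part of the nRF5 SDK for Mesh. It assumes the standard Bluetooth mesh rule that only packets with a TTL of 2 or more are relayed, with the TTL decremented by one on each hop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the network-layer fields discussed here. */
typedef struct {
    uint16_t src;  /* address of the originator */
    uint32_t seq;  /* 24-bit sequence number, set once by the originator */
    uint8_t  ttl;  /* time to live, decremented on every relay hop */
} mesh_pkt_t;

/* Returns true and fills *out when the packet should be relayed.
 * Only the TTL changes; src and seq are copied untouched. */
bool relay_packet(const mesh_pkt_t *in, mesh_pkt_t *out)
{
    if (in->ttl < 2u) {
        return false;              /* TTL 0 or 1: never relayed */
    }
    *out = *in;                    /* same src, same seq */
    out->ttl = (uint8_t)(in->ttl - 1u);
    return true;
}
```

In this model a packet sent with TTL 60 still carries the originator's sequence number on every hop, so a large TTL does not use up sequence numbers any faster.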
Could you check if the problem occurs with SDK v4.2?
Are you rebooting your devices frequently?
Hi, I will check with SDK v4.2. In my code, I only reset the device when an assertion occurs. Last week, other workers installed the devices at the customer site and ran some tests (for example, triggering a ping request or pressing a button to send a mesh message). Maybe during the setup process they rebooted the devices, unplugged the solar battery, etc., but that happened only a few times. Thanks,
If a reset is done many times, either manually or automatically on error, it can quickly lead to the sequence numbers getting exhausted. This is because the sequence number does a "jump" upwards on every reset. If you reset multiple times in a row, you will quickly run out of new numbers. By default, the jump is set to 8192, but this can be changed by overriding the NETWORK_SEQNUM_FLASH_BLOCK_SIZE definition from nrf_mesh_config_core.h. Note that there is a tradeoff between running out of sequence numbers and wearing out flash, and the "correct" way to solve related issues is usually not to change the value. Usually the solution is to reset less often.
So why this "jump"?
Because the only way to keep the "last sequence number used" across a reset is to store it in flash. But storing the latest sequence number in flash every time means one flash write for every packet sent, and that would quickly wear out the flash. So instead the "real" sequence number is held in RAM, while a whole series is "reserved" by writing a bigger number to flash. That way, on reset, it is safe to start from the number in flash, at the expense of a huge increase in the sequence numbers used. And when the "real" number in RAM gets close to the number stored in flash, a new (higher) number is written to flash again, so that the number in flash is always a safe starting point after reset.
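The reservation scheme described above can be sketched roughly like this. This is a simplified model and not the actual SDK implementation: `m_flash_reserved_end`, `seq_boot()` and `seq_alloc()` are hypothetical names, flash is simulated with a plain variable, and the new block is reserved only when the current one is fully used rather than slightly ahead of time. The block size mirrors the default of 8192 mentioned above, and the maximum reflects the 24-bit width of Bluetooth mesh sequence numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQNUM_MAX        0xFFFFFFu /* sequence numbers are 24 bits wide */
#define SEQNUM_BLOCK_SIZE 8192u     /* mirrors the default block size */

/* Simulated flash (hypothetical): every sequence number below this
 * value may already have been used. */
static uint32_t m_flash_reserved_end;
static uint32_t m_flash_writes;

typedef struct {
    uint32_t next; /* "real" sequence number, held in RAM only */
} seq_state_t;

/* One flash write reserves a whole block of sequence numbers. */
static void flash_reserve_block(void)
{
    uint32_t end = m_flash_reserved_end + SEQNUM_BLOCK_SIZE;
    if (end > SEQNUM_MAX + 1u) {
        end = SEQNUM_MAX + 1u;
    }
    m_flash_reserved_end = end;
    m_flash_writes++;
}

/* Called after every reset: RAM is lost, so skip the whole reserved
 * block and start from the value in flash. This is the "jump". */
void seq_boot(seq_state_t *s)
{
    s->next = m_flash_reserved_end;
    flash_reserve_block();
}

/* Returns false when no sequence numbers are left (publish forbidden). */
bool seq_alloc(seq_state_t *s, uint32_t *seq)
{
    if (s->next > SEQNUM_MAX) {
        return false;
    }
    if (s->next == m_flash_reserved_end) {
        flash_reserve_block(); /* block used up: reserve the next one */
    }
    assert(s->next < m_flash_reserved_end);
    *seq = s->next++;
    return true;
}
```

In this model, sending two messages and then resetting makes the next message use sequence number 8192 instead of 2, while sending 8192 messages without a reset costs only one extra flash write.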
In any case, it means that after x number of resets, the device cannot send data on the mesh network until the next IV update.
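As a rough worked example of what "x" could be: sequence numbers are 24 bits wide, so there are 2^24 = 16777216 of them per IV index. Assuming the worst case where every reset consumes one full block, the small helpers below (hypothetical names, not SDK functions) estimate how many resets or how many days of normal traffic it takes to run out.

```c
#include <stdint.h>

#define SEQNUM_SPACE 0x1000000u /* 2^24 sequence numbers per IV index */

/* Worst case: every reset burns one full block of sequence numbers. */
uint32_t resets_until_exhausted(uint32_t block_size)
{
    if (block_size == 0u) {
        return 0u; /* invalid configuration, avoid dividing by zero */
    }
    return SEQNUM_SPACE / block_size;
}

/* Ignoring resets: days of traffic before the numbers run out. */
uint32_t days_until_exhausted(uint32_t msgs_per_day)
{
    if (msgs_per_day == 0u) {
        return UINT32_MAX; /* nothing sent, never exhausted */
    }
    return SEQNUM_SPACE / msgs_per_day;
}
```

With the default block size of 8192 this gives 2048 resets, while one ping per hour (24 messages per day) on its own would take about 699050 days. So under these assumptions, a device that runs out after a couple of days is far more likely to be resetting repeatedly than to be sending too much.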
Dear Tesc, thank you for your answer. I will review my code again and think about your suggestion. Does the sequence number increase every time a message is relayed? For example, if I set the TTL to 60, will the sequence number increase every time the TTL decreases, until it reaches 0? If the sequence number reaches its maximum, how can I recover the device? Because the mesh devices are at the customer site, I cannot provision them again. Maybe the device must wait until the next IV update. How can I manually trigger an IV index update for the network? Thank you in advance!
By the way, did you experience the same issues with nRF5 SDK for Mesh v4.2?