Increasing leader re-allocation interval Thread

Hi All,

I'm working from the Thread CLI example, and I have a network of multiple Thread devices that works just fine (they see each other, can send and receive messages, etc.).

When I start the first node, it becomes the leader like it is supposed to, and any node started thereafter joins the network without becoming a leader. So far so good.

However, when I then turn off the leader, it often takes tens of seconds (sometimes more than a minute) for a new leader to be selected among the remaining nodes. I understand that this might be a deliberate implementation choice. However, since I would like to use the leader role to synchronize some specific tasks I have for the nodes, I always need exactly one leader present. Furthermore, my Thread devices are portable, so losing a leader is definitely possible in my use case.

Is it possible to increase the rate at which the network will elect a new leader once a leader drops out?
I have searched the documentation for this, but I can't find any information on this.

Thanks!

  • Hello,

    First, I should say that OpenThread (or Zigbee, for that matter) is not ideal for moving nodes. The reason is that the network is self-maintained, but this maintenance is fairly slow (as you have already experienced). These networks route messages, meaning that all messages to a node go through that node's parent. If the nodes keep moving and dropping out, the network will constantly try to recover itself, establishing new parent-child connections and selecting new leaders, all while using non-optimal (or non-working) network routes. When this happens, your network may split into several partitions that can't communicate with each other.

    I suggest that you at least consider Bluetooth Mesh, which is a flooding network. In Bluetooth Mesh the nodes have no sense of the distance between nodes, or of their positioning, so you don't get these kinds of issues.

    That being said, the reason this takes a long time in OpenThread is that it takes time to discover that the leader is "dead". The nodes don't send "I'm alive" messages several times per second, and they want to give the leader a chance to return before they decide that it is lost (to avoid having to select a new leader, and hence potentially make large changes to the network hierarchy/structure).
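
To make the trade-off concrete, here is a minimal sketch in C of the kind of timeout logic described above. It is a simplified model, not OpenThread's implementation: the `leader_watch_t` type and the `leader_watch_*` functions are hypothetical names, and the 120-second timeout in the usage note is only an illustrative value. A router records when it last heard from the leader and declares the leader lost only after the timeout has passed in silence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one router's view of the leader (not OpenThread code). */
typedef struct {
    uint32_t timeout_s;    /* silence tolerated before the leader counts as lost */
    uint32_t last_heard_s; /* time at which the leader was last heard from */
} leader_watch_t;

/* Start watching; the leader is treated as heard at now_s. */
void leader_watch_init(leader_watch_t *w, uint32_t timeout_s, uint32_t now_s)
{
    assert(w != NULL);
    w->timeout_s    = timeout_s;
    w->last_heard_s = now_s;
}

/* Call whenever a message from the leader arrives. */
void leader_watch_heard(leader_watch_t *w, uint32_t now_s)
{
    assert(w != NULL);
    w->last_heard_s = now_s;
}

/* True once the leader has been silent for at least timeout_s seconds. */
bool leader_watch_is_lost(const leader_watch_t *w, uint32_t now_s)
{
    assert(w != NULL);
    return (now_s - w->last_heard_s) >= w->timeout_s;
}
```

For example, with a 120-second timeout and the leader last heard at t = 30 s, `leader_watch_is_lost` stays false at t = 149 s and becomes true at t = 150 s. A shorter timeout detects a dead leader sooner, but it also makes the network more likely to re-elect when the leader was only briefly out of range.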

    You can see what may happen to a network if nodes keep falling out here:

    https://openthread.io/guides/thread-primer/node-roles-and-types

    It is possible to change this behavior, though, using the API found in thread_ftd.h:

    /**
     * Set the ROUTER_DOWNGRADE_THRESHOLD parameter used in the Leader role.
     *
     * @note This API is reserved for testing and demo purposes only. Changing settings with
     * this API will render a production application non-compliant with the Thread Specification.
     *
     * @param[in]  aInstance   A pointer to an OpenThread instance.
     * @param[in]  aThreshold  The ROUTER_DOWNGRADE_THRESHOLD value.
     *
     * @sa otThreadGetRouterDowngradeThreshold
     *
     */
    void otThreadSetRouterDowngradeThreshold(otInstance *aInstance, uint8_t aThreshold);
    

    Please also see the note at the very bottom of this OpenThread guide:

    "Within two minutes..."
    
    You may have noticed this mentioned a few times throughout the Codelab, when waiting for a device to change 
    states. 120 seconds is the default value of a Thread Network parameter called ROUTER_SELECTION_JITTER. 
    Devices changing from End Device to Router or vice versa delay the change for a random period between 0 and 
    the ROUTER_SELECTION_JITTER (in seconds). See the Thread Specification for more information on Thread 
    Network parameters.
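
The delay described in the quoted note can be sketched as follows. This is an illustration only, assuming a uniform random delay in whole seconds; `role_change_delay_s` is a hypothetical helper, not an OpenThread function, and it uses the C library's `rand()` rather than whatever random source the stack itself uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Default from the quoted note: 120 seconds. */
#define ROUTER_SELECTION_JITTER_S 120u

/* Pick a delay in whole seconds in the range [0, jitter_s].
 * The modulo introduces a slight bias, which is acceptable for a sketch. */
uint32_t role_change_delay_s(uint8_t jitter_s)
{
    uint32_t span  = (uint32_t)jitter_s + 1u; /* number of possible values */
    uint32_t delay = (uint32_t)rand() % span;

    assert(delay <= jitter_s);                /* sanity check on the range */
    return delay;
}
```

Calling `role_change_delay_s(ROUTER_SELECTION_JITTER_S)` after seeding with `srand()` yields a value between 0 and 120, which is why a role change can take up to two minutes with the default setting.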

    Best regards,

    Edvin
