Experience and configuration for a large Thread network using nRF52840

I'm planning to deploy a Thread network with 200-400 units and seeking best practices and real-world experience from the community. While the Thread spec allows up to 32 routers with 512 children each, I haven't found detailed guidance on large-scale configurations.

Current Findings:

  • I've tested with ~70 nrf52840 dongles and identified the need for startup jitter to prevent router overload during bulk device connections
  • Questions: What jitter values work best for large networks, and how can a good value be calculated? What's the recommended wait time for router promotion?
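One way to reason about the jitter question is to size the startup window from the device count and the attach rate the routers appear to tolerate. The sketch below is only an illustration of that arithmetic: the function names and the 2 attaches/s figure are hypothetical and do not come from the Thread specification or from Nordic, and a uniform random delay only bounds the average attach rate, not the worst-case burst.

```python
import random


def startup_jitter_window_s(num_devices: int, max_attaches_per_s: float) -> float:
    """Window (seconds) over which start-ups are spread so that the
    average attach rate stays at or below max_attaches_per_s."""
    if num_devices <= 0 or max_attaches_per_s <= 0:
        raise ValueError("num_devices and max_attaches_per_s must be positive")
    return num_devices / max_attaches_per_s


def pick_startup_delay_s(window_s: float, rng: random.Random) -> float:
    """Delay one device waits before enabling Thread, drawn uniformly
    from [0, window_s]."""
    return rng.uniform(0.0, window_s)


if __name__ == "__main__":
    # Hypothetical numbers: 400 devices, 2 attaches per second tolerated.
    window = startup_jitter_window_s(400, 2.0)
    delay = pick_startup_delay_s(window, random.Random())
    print(f"window = {window:.0f} s, this device waits {delay:.1f} s")
```

The tolerated attach rate has to be measured on the actual hardware, for example by repeating the 70-dongle test with different windows.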

Additional Concerns:

  • Optimal role configurations when physical coverage requires many routers
  • Downstream performance with dynamic child IDs, router changes, and sleeping devices
Parents
  • Hi Alexander, 

    We don't have such data. Startup jitter is used as required by the specification.

    The reboot time of a Thread network consisting of that many devices is known to be long. However, the Thread community is working on making it faster and more reliable.

    I'm afraid we don't have any research on this topic to share.

    Regards,
    Amanda H.

  • Thank you for the initial response. We've conducted internal testing (70+ device scale) and would like to refine our questions based on preliminary findings. We're hoping to better understand the nrf52840's practical capabilities and known constraints when deployed at larger scales.

    Our Testing Context:

    We've tested startup patterns, router promotion timing, and child attachment scaling
    We're designing for 200-400 devices across segmented Thread networks
    We're seeing both successes and specific failure modes that we'd like to address

    Router Child Capacity & CPU Pressure:

    We observe CPU and buffer pressure when a single router handles burst attach storms. Are there published specifications or recommended limits for:
    Maximum concurrent child attachment rate per router?
    Recommended child count per router for sustained stability (vs. theoretical 512)?
    Buffer/queue sizing recommendations during bulk device joins?

    Gateway/Coordinator Performance at Scale:

    What are the known limitations of ot-br-posix (or nrf gateway solutions) when managing multiple datasets/segments with 100+ devices per segment?
    Any known issues with downstream address resolution or device discovery after parent changes?

    Configuration Tuning for Large Networks:

    Are there undocumented or application-note-only parameters we should be tuning for larger deployments? (e.g., CHILD_TIMEOUT, ROUTER_UPGRADE_THRESHOLD, route refresh intervals)
    What jitter and startup delay values does Nordic recommend in practice?

    Known Limitations & Roadmap:

    Are there known issues with sleepy end-device (SED) downstream behavior, address stability, or topology flapping that we should architect around?
    Can Nordic share any internal scaling test data or reference architectures?

    What would help most:

    Configuration examples or application notes for deployments >100 devices
    Honest acknowledgment of known limitations so we can design appropriately
    Recommendations on segmentation thresholds or failover strategies

Children
  • Alex0123456 said:
    Are there published specifications or recommended limits for:
    Maximum concurrent child attachment rate per router?
    Recommended child count per router for sustained stability (vs. theoretical 512)?
    Buffer/queue sizing recommendations during bulk device joins?

    In the nRF Connect SDK, the OpenThread samples use prebuilt, certified OpenThread libraries in which the maximum number of children is set to 32. We don't have the data you are asking for. The only buffer-related configuration documented in the available NCS sources is the message pool: the message buffer size and the number of message buffers in the pool can be configured with the CONFIG_OPENTHREAD_MESSAGE_BUFFER_SIZE and CONFIG_OPENTHREAD_NUM_MESSAGE_BUFFERS Kconfig options, respectively. By default, the message buffer size is set to 128, and the number of message buffers is set to 96 for a Full Thread Device and 64 for a Minimal Thread Device. Also check the OpenThread memory requirements and this post.

    Alex0123456 said:
    What are the known limitations of ot-br-posix (or nrf gateway solutions) when managing multiple datasets/segments with 100+ devices per segment?
    Any known issues with downstream address resolution or device discovery after parent changes?

    The nRF Connect SDK does not provide a complete Thread Border Router solution. For development, we recommend using the OpenThread Border Router (OTBR) released by Google, which is compatible with Nordic devices. See Thread tools. A typical Border Router consists of an NCP/RCP application (e.g., on an nRF52840) paired with a Linux-based host. Thread supports multiple border routers operating simultaneously, providing redundant paths into and out of a network to ensure resilience. See Thread overview.

    Please consult the OpenThread GitHub issues and documentation directly for ot-br-posix-specific limitations.

    Alex0123456 said:

    Configuration Tuning for Large Networks:

    Are there undocumented or application-note-only parameters we should be tuning for larger deployments? (e.g., CHILD_TIMEOUT, ROUTER_UPGRADE_THRESHOLD, route refresh intervals)
    What jitter and startup delay values does Nordic recommend in practice?

    The following configurable parameters are relevant to large network tuning:

    Child timeout / supervision (set these greater than the SED poll period to avoid unintended wakeups):

    • CONFIG_OPENTHREAD_MLE_CHILD_TIMEOUT
    • CONFIG_OPENTHREAD_CHILD_SUPERVISION_CHECK_TIMEOUT
    • CONFIG_OPENTHREAD_CHILD_SUPERVISION_INTERVAL

    See Child timeouts config
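The rule above (timeouts greater than the SED poll period) can be checked before flashing a fleet. This is a minimal sketch with a hypothetical helper name and hypothetical values; it assumes the three timeout options are expressed in seconds and the poll period in milliseconds, which should be confirmed against the Kconfig help text for the NCS version in use.

```python
def timeouts_not_above_poll_period(poll_period_ms: int, timeouts_s: dict) -> list:
    """Return the names of timeout options that are NOT greater than
    the SED poll period (all compared in seconds)."""
    poll_period_s = poll_period_ms / 1000.0
    return [name for name, value_s in timeouts_s.items() if value_s <= poll_period_s]


if __name__ == "__main__":
    # Hypothetical values for illustration only, not recommended settings.
    candidate = {
        "CONFIG_OPENTHREAD_MLE_CHILD_TIMEOUT": 240,
        "CONFIG_OPENTHREAD_CHILD_SUPERVISION_CHECK_TIMEOUT": 190,
        "CONFIG_OPENTHREAD_CHILD_SUPERVISION_INTERVAL": 129,
    }
    # A device that polls every 200 s would violate two of the three options.
    print(timeouts_not_above_poll_period(200_000, candidate))
```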

    Message pool (potentially relevant to buffer pressure during bulk joins):

    • CONFIG_OPENTHREAD_MESSAGE_BUFFER_SIZE (default: 128)
    • CONFIG_OPENTHREAD_NUM_MESSAGE_BUFFERS (default: 96 for FTD, 64 for MTD)

    See Message pool config
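When experimenting with a larger pool for bulk joins, it helps to see what each setting costs in RAM. The sketch below simply multiplies buffer size by buffer count using the defaults quoted above; the function name is hypothetical, the buffer size is assumed to be in bytes, and any overhead outside the pool itself is ignored.

```python
def message_pool_bytes(buffer_size: int, num_buffers: int) -> int:
    """Approximate RAM taken by the OpenThread message pool:
    CONFIG_OPENTHREAD_MESSAGE_BUFFER_SIZE * CONFIG_OPENTHREAD_NUM_MESSAGE_BUFFERS."""
    if buffer_size <= 0 or num_buffers <= 0:
        raise ValueError("buffer_size and num_buffers must be positive")
    return buffer_size * num_buffers


if __name__ == "__main__":
    print("FTD default:", message_pool_bytes(128, 96), "bytes")  # 12288
    print("MTD default:", message_pool_bytes(128, 64), "bytes")  # 8192
    # Hypothetical trial value: doubling the FTD buffer count.
    print("FTD, 192 buffers:", message_pool_bytes(128, 192), "bytes")  # 24576
```

Whether a larger pool actually relieves the buffer pressure seen during attach storms would need to be verified by measurement.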

    For additional configuration options, see Configuring Thread in the nRF Connect SDK.

    Alex0123456 said:
    Are there known issues with sleepy end-device (SED) downstream behavior, address stability, or topology flapping that we should architect around?
    Can Nordic share any internal scaling test data or reference architectures?

    You can check the Known issues page for your NCS version. 
