NCS Mesh: Relay function delays response messages

We are facing the problem that when the relay feature is enabled on a mesh device (running nRF Connect SDK firmware), the device's response is delayed on the order of seconds.

To reproduce the problem, we used the light example project from NCS v1.8.0 and generated additional traffic using another device, which transmits unsegmented packets every 200 ms. In between that traffic, TTL packets are transmitted, and the responses to them get delayed.

If the relay feature is off, everything seems to work as expected.

  • urieder said:
    We have compared various NCS versions. The newer the version, the better the results, but even in NCS v2.2.0 there are still some delays.

    Thank you for sharing the results for the different versions.

    One more thing that came up when discussing your results just now was the Publish Retransmit Count (typically set to 1 retransmit, meaning each message's contents are sent a total of two times, i.e. as two separate messages). It could be that a buffer containing outbound packets fills up due to a high retransmit count, causing longer and longer delays until the buffer is full, at which point you get a constant delay (and see some packet loss).

    Can you check how large this number is configured to be in your setup, and reduce it if it's too high?

    Kind regards,
    Andreas

  • Hi,

    As mentioned at the end of my last report: we typically use 1 retransmission, sometimes 2, but not more. From a theoretical point of view, 1 retransmission (40 ms delay) or 2 retransmissions (20 ms delay) should be fine and should not cause any buffer to fill up.

  • Just to be clear, Publish Retransmit Count (4.2.2.6 in the Mesh Profile spec) is not the same as Network Retransmit Count (4.2.19.1 in the Mesh Profile spec) and Relay Retransmit Count (4.2.20.1 in the Mesh Profile spec). Publish Retransmit Count is on the access layer, and the other two retransmit counts are on the network/transport layer.

    Network Retransmit Count decides the number of times you send a message. If the count is 1, the sender sends 2 messages. If the Relay Retransmit Count is 1, the relay node relays the message two times, sending 2 messages from the relay node to the receiver. With a Publish Retransmit Count of 1, you repeat the entire procedure one more time, effectively doubling the number of messages sent.
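
As a rough sketch of the arithmetic described above, the three counts can be multiplied out as follows. The function name and its parameters are hypothetical and only illustrate the counting; it assumes a single relay hop and unsegmented messages, and it is not taken from the NCS code base.

```python
def transmissions(network_count, relay_count, publish_count):
    """Count on-air transmissions for one published message.

    Assumption: one relay hop; each count N means N extra
    transmissions on top of the original one.
    """
    # Publish retransmit repeats the whole procedure (access layer).
    publications = publish_count + 1
    # Each publication is sent (network_count + 1) times by the sender.
    from_sender = publications * (network_count + 1)
    # Each publication is relayed (relay_count + 1) times by the relay.
    from_relay = publications * (relay_count + 1)
    return from_sender, from_relay


sender, relay = transmissions(network_count=1, relay_count=1, publish_count=1)
print(sender, relay)
```

With all three counts set to 1, this gives 4 transmissions from the sender and 4 from the relay node, compared with 2 and 2 when the Publish Retransmit Count is 0.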

    If you have a higher Publish Retransmit Count than you expect, you will be pushing, if not exceeding, the limit of what the throughput can handle, and you will most likely flood the buffer. There is also a throughput limit on how many messages you can send per 10 s sliding window, which is 100 network PDUs per window (Mesh Profile spec v1.0.1, sec 2.3.9.4).

    Edit: In addition, per 3.7.4.1: "Due to limited bandwidth available that is shared among all nodes and other Bluetooth devices, it is important to observe the volume of traffic a node is originating. A node should originate less than 100 Lower Transport PDUs in a moving 10-second window." This means that if you have a 100 ms publish interval and the Publish Retransmit Count is larger than 0, the node is capped at sending one message every 200 ms (since with one retransmit every message results in two Lower Transport PDUs), or it must not send a publish retransmit at all.
    Edit end.
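
The limit quoted above can be turned into a minimum publish interval with a small calculation. This is only a sketch under stated assumptions: the constant and function names are hypothetical, each publication is assumed to fit in one unsegmented Lower Transport PDU, and the result is approximate because the spec asks for fewer than 100 PDUs per window, not exactly 100.

```python
WINDOW_MS = 10_000          # moving 10-second window from the quoted spec text
MAX_PDUS_PER_WINDOW = 100   # originated Lower Transport PDUs per window


def min_publish_interval_ms(publish_retransmit_count):
    """Approximate shortest publish interval that stays within the budget.

    Assumption: every publication and every publish retransmission
    costs exactly one Lower Transport PDU.
    """
    pdus_per_publication = publish_retransmit_count + 1
    return WINDOW_MS * pdus_per_publication // MAX_PDUS_PER_WINDOW


for count in (0, 1, 2):
    print(count, min_publish_interval_ms(count))
```

For a Publish Retransmit Count of 0 this gives 100 ms, and for a count of 1 it gives the 200 ms mentioned above.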

    There are options to increase the throughput by using advertising extension features such as extended advertising and multiple advertising sets, but we will have to look into how to do that for a setup that does not use relaying, if that ever becomes a requirement. For a solution that uses relaying, the following should allow you to increase the throughput with the two extension features mentioned:

    Enable CONFIG_BT_EXT_ADV

    Enable CONFIG_BT_MESH_ADV_EXT

    Increase CONFIG_BT_EXT_ADV_MAX_ADV_SETS

    Increase CONFIG_BT_MESH_RELAY_ADV_SETS

    Kind regards,
    Andreas

  • Thanks Andreas, we will try the features you mentioned.

    Nevertheless, it seems that the behavior of NCS 2.2.0 is a lot better.

    Unfortunately, we have already qualified the stack based on NCS 1.8.0.

    Will Nordic provide a fix for NCS 1.8.0 that results in behavior similar to NCS 2.2.0?

    How can such problems and fixes be handled, given that customers who have already qualified an older NCS version are not allowed to use a newer NCS version without a new qualification? Is there a solution or a way of working that handles such issues without requiring re-qualification for a new NCS version? Or do I have an incorrect understanding of NCS usage?

    Kind regards, Ulf
