
BLE Mesh Unicast vs Multicast Message packet RX Latency and Loss

Hi all,

I am using the latest Mesh SDK (3.2.0) and nRF5 SDK (15.3.0) on an nRF52840 SoC.

I developed a message model to send 32-byte payloads (which I assume would require 3 PDUs with a 32-bit TransMIC, according to section 3.7.3 of the Mesh Profile Specification by the Bluetooth SIG).
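The PDU count assumed above can be checked with a little arithmetic. The following is a minimal sketch, assuming the lower transport limits of the Mesh Profile Specification (up to 15 octets of upper transport PDU in an unsegmented access message, 12 octets per segment otherwise); `mesh_pdu_count` is a hypothetical helper, not an SDK function.

```c
#include <assert.h>
#include <stddef.h>

/* Lower transport limits from the Mesh Profile Specification:
 * an unsegmented access message carries at most 15 octets of upper
 * transport PDU; each segment of a segmented message carries 12. */
#define UNSEG_MAX_OCTETS 15u
#define SEGMENT_OCTETS   12u

/* Hypothetical helper (not an SDK function): number of lower transport
 * PDUs needed for an access payload of access_len octets (opcode plus
 * parameters) protected by a TransMIC of mic_len octets (4 or 8). */
static size_t mesh_pdu_count(size_t access_len, size_t mic_len)
{
    assert(mic_len == 4u || mic_len == 8u);

    size_t upper_len = access_len + mic_len;
    if (upper_len <= UNSEG_MAX_OCTETS) {
        return 1u; /* fits in a single unsegmented PDU */
    }
    /* Round up to whole 12-octet segments. */
    return (upper_len + SEGMENT_OCTETS - 1u) / SEGMENT_OCTETS;
}
```

With a 32-bit TransMIC, `mesh_pdu_count(32, 4)` gives 3 if the 32 bytes already include the opcode. If the 32 bytes are parameters only and a 3-byte vendor opcode is added on top, `mesh_pdu_count(35, 4)` gives 4.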

I am getting weird behavior with my vendor models:

  • If I publish messages from the server using unicast messages, the client receives all messages sent, even if I send them within a 200 ms interval of each other.
  • However, if I send messages using multicast, the client doesn't receive all messages. In particular, the number of lost packets is deterministic and a function of the time interval between sent messages. That is, if the server sends messages every 500 ms, the client receives every sixth (+/-2) message. If I increase the interval to 1000 ms, I receive every third message (+/-1), and if I send messages every 3000 ms, I receive all messages.

Why is that?! I need to solve this fast, so any assistance would be very appreciated.

This message service is meant to communicate with the highest throughput possible in multicast in order to be scalable to larger networks.

So my real question is: how can I get the highest possible throughput in a multicast transmission (leaving aside acknowledged messages; I will have a routine to retrieve missing packets in a larger transaction)?

Thank you so much for your prompt assistance and pointers!

Regards

//E

  • Hello,

    Yes.

    A multicast transmission is a one-to-many transmission. In the case of BT mesh, that is accomplished when one node publishes to a group address or virtual address. In my case, the lag and packet loss happen when I publish to a group address (0xFFFF - all nodes, for instance, or 0xC000 - a group of nodes).

    Thanks!

    //EAn

  • When you send packets that are longer than 11 bytes, they will be sent as segmented packets.

    If you send the packets to a unicast address, the receiver has to ACK all the segments, and any that aren't received have to be sent again. If you send the packets to a group address, there will be no ACK, so the packets have to be sent multiple times to compensate.

    The number of retries is set by TRANSPORT_SAR_TX_RETRIES_DEFAULT (4) and can be adjusted with NRF_MESH_OPT_TRS_SAR_TX_RETRIES in nrf_mesh_opt.h. You can also adjust the timing in nrf_mesh_opt.h.

    To get the highest possible throughput, you should send each message after receiving NRF_MESH_EVT_TX_COMPLETE.

  • Hello Mttrinh,

    Yes, it's precisely because it is a segmented message that I figured that would be one of the reasons for the difference I mentioned, and that's why I pointed that out.

    Thank you for the very informative reply.

    To try a quick fix, I changed TRANSPORT_SAR_TX_RETRIES_DEFAULT from 4 to 3 and TRANSPORT_SAR_TX_RETRY_BASE_TIMEOUT_DEFAULT_US from 500 ms to 150 ms. Both are defined in transport.h.

    This way I managed to lose only 1 packet in 108, which is much better, considering that my time interval between packet transmissions was 700 ms and not the 3000 ms I had earlier.

    Now, as I said, changing the #defines in the transport.h header was a quick fix. The ideal solution would be to use the function uint32_t transport_opt_set(nrf_mesh_opt_id_t id, const nrf_mesh_opt_t * const p_opt) with the NRF_MESH_OPT_TRS_SAR_TX_RETRIES and NRF_MESH_OPT_TRS_SAR_TX_RETRY_TIMEOUT_BASE option IDs respectively. My only problem is that I don't know how to use p_opt. Reading the nrf_mesh_opt_t type definition:

    typedef struct
    {
        /** Length of opt field (for future compatibility). */
        uint32_t len;
        /** Option to set/get. */
        union
        {
            /** Unsigned 32-bit value. */
            uint32_t val;
            /** Byte array. */
            uint8_t * p_array;
        } opt;
    } nrf_mesh_opt_t;

    I don't know how to fill the structure. The only argument that apparently is used in the transport_opt_set function is p_opt->opt.val. I'm guessing (I haven't tried yet) that I could get away with creating a nrf_mesh_opt_t options_var; and then just assigning options_var.opt.val = (uint32_t) value; without touching the rest of it. Is this correct, or is there a better way?
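One way to fill the structure for a scalar option is sketched below. The struct is copied from the definition quoted above so that the sketch compiles outside the SDK; `make_u32_opt` is a hypothetical helper, and setting `len` to the size of the value is an assumption based only on the field's comment, not on confirmed SDK behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in copied from the definition quoted in the post, so the sketch
 * compiles outside the SDK. In a real project, include nrf_mesh_opt.h. */
typedef struct
{
    uint32_t len;          /* Length of opt field. */
    union
    {
        uint32_t  val;     /* Unsigned 32-bit value. */
        uint8_t * p_array; /* Byte array. */
    } opt;
} nrf_mesh_opt_t;

/* Hypothetical helper (not part of the SDK): build an option that
 * carries a single unsigned 32-bit value. */
static nrf_mesh_opt_t make_u32_opt(uint32_t value)
{
    nrf_mesh_opt_t opt;

    /* Zero everything first so no field is left uninitialized. */
    memset(&opt, 0, sizeof(opt));

    /* Assumption: len describes the size of the value being set. */
    opt.len = (uint32_t)sizeof(opt.opt.val);
    opt.opt.val = value;

    assert(opt.opt.val == value);
    return opt;
}
```

The result would then be passed by address, for example `nrf_mesh_opt_t retries = make_u32_opt(3);` followed by `transport_opt_set(NRF_MESH_OPT_TRS_SAR_TX_RETRIES, &retries);`. Judging by the `_US` suffix of the default define, the timeout option probably expects microseconds, so 150 ms would be 150000; that unit is worth checking against the header.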

    Concerning the max throughput: I created a semaphore flag to signal when the next TX is good to go. This flag is set to TRUE (good to go) when NRF_MESH_EVT_TX_COMPLETE is received, and right before the next packet is published it is set to FALSE (don't TX), but in such a way that the publication is still called.

    Now, theoretically everything should be fine, but it seems that NRF_MESH_EVT_TX_COMPLETE isn't always raised, because I get the first two TX packets, then one or three after some 50 packets, and then the last packets. To me this seems to indicate that I need to leave some breathing time between transmissions. What could be causing this, and what would be the right way to guarantee max throughput in multicast?
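The flag scheme described above can be sketched as follows. This is a simulation under stated assumptions, not SDK code: `tx_gate_t`, `publish_fn_t` and `fake_publish` are hypothetical names, and a return value of 0 is taken to mean success, as with NRF_SUCCESS. The one design choice shown is that the gate closes only after the publish call has succeeded, so a rejected publish cannot leave the sender waiting for a completion event that will never arrive. Whether that is what caused the stall described here is not confirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not SDK symbols. */
typedef uint32_t (*publish_fn_t)(const uint8_t * p_data, uint16_t len);

typedef struct
{
    publish_fn_t  publish;  /* wraps the model's real publish call */
    volatile bool tx_ready; /* set from the TX-complete event handler */
    uint32_t      sent;     /* messages accepted by the stack */
} tx_gate_t;

static void tx_gate_init(tx_gate_t * p_gate, publish_fn_t publish)
{
    assert(p_gate != NULL && publish != NULL);
    p_gate->publish  = publish;
    p_gate->tx_ready = true;
    p_gate->sent     = 0u;
}

/* Call this from the mesh event handler on NRF_MESH_EVT_TX_COMPLETE. */
static void tx_gate_on_tx_complete(tx_gate_t * p_gate)
{
    p_gate->tx_ready = true;
}

/* Returns true if the message was handed to the stack. */
static bool tx_gate_try_send(tx_gate_t * p_gate, const uint8_t * p_data, uint16_t len)
{
    if (!p_gate->tx_ready) {
        return false; /* previous message still in flight */
    }
    if (p_gate->publish(p_data, len) != 0u) {
        return false; /* rejected: keep the gate open and retry later */
    }
    p_gate->tx_ready = false; /* close only after a successful publish */
    p_gate->sent++;
    return true;
}

/* Simulated publish: rejects while a (pretend) SAR session is active. */
static bool fake_busy;
static uint32_t fake_publish(const uint8_t * p_data, uint16_t len)
{
    (void)p_data;
    (void)len;
    return fake_busy ? 1u : 0u;
}
```

In a real application `tx_gate_try_send` would be called from the main loop or a timer, with `publish` wrapping the model's own publish function.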

    Thank you very much,

    //EA

  • Hi,

    Sorry for the delayed response.

    I talked to one of our mesh developers about this and here is his response:

    There are two things here:
    1. Sending a segmented message triggers SAR retransmissions until the originator's transport layer receives the full BlockAck.
    2. Sending a segmented message to a group address triggers compulsory retransmissions that do not depend on BlockAcks (since the sender does not know whether all the nodes belonging to the group received all segments of the SAR in the first set of transmissions).

    Therefore, when the customer uses a group address as the DST address, he sees a deterministic packet loss. The packets are not actually lost: if publishing is done at an interval shorter than the time it takes to finish all retransmissions, the originator's transport layer simply rejects the message. The higher-layer API cannot trigger a new SAR to the same destination (unicast or multicast) unless the previous one has either finished completely or been cancelled.

    If the customer needs a higher throughput for multicast messages, the best strategy would be to not use SAR at all. That means each multicast message must fit in a single segment. To increase reliability, a message can be sent multiple times by configuring a higher Network Transmit Count (say, 2 or 3). One needs to be careful with multicast retransmissions, as doing too many of them will cause network interference for other mesh traffic.

    Due to the flooding nature of the Bluetooth Mesh protocol, some packets may get lost if there is high interference, and a higher-level application protocol should be designed to deal with this situation.
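The rejection behavior described above matches the numbers in the original post. The sketch below is a rough model, not SDK logic: it assumes the transport layer stays busy for a fixed time per multicast SAR session, and the 3000 ms figure used in the note is inferred only from the observation that a 3000 ms interval delivered every message. `accepted_one_in` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (not SDK code): while a SAR session to a
 * destination is active for busy_ms, new publications to it are
 * rejected. When publishing every interval_ms, roughly one message
 * in N is accepted; this returns N. */
static uint32_t accepted_one_in(uint32_t busy_ms, uint32_t interval_ms)
{
    assert(interval_ms > 0u);

    if (busy_ms <= interval_ms) {
        return 1u; /* session always finishes before the next publish */
    }
    /* Round up: the first publish at or after the busy period ends wins. */
    return (busy_ms + interval_ms - 1u) / interval_ms;
}
```

With `busy_ms` = 3000, an interval of 500 ms gives one in 6 and 1000 ms gives one in 3, in line with the reported "every sixth" and "every third" message.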
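A minimal sketch of the "no SAR" approach: split the application payload into chunks that each fit in an unsegmented message, and prefix every chunk with an index so that a higher-level protocol can detect and re-request missing pieces. The 11-octet access payload limit and the 3-octet vendor opcode follow the Mesh Profile Specification; the one-octet index header and the names `chunk_count` and `chunk_build` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNSEG_ACCESS_MAX  11u /* access payload limit, unsegmented, 4-octet TransMIC */
#define VENDOR_OPCODE_LEN  3u /* vendor opcodes occupy 3 octets */
#define CHUNK_HEADER_LEN   1u /* hypothetical: one octet of chunk index */
#define CHUNK_DATA_MAX (UNSEG_ACCESS_MAX - VENDOR_OPCODE_LEN - CHUNK_HEADER_LEN) /* 7 */

/* Number of unsegmented messages needed for total_len octets of data. */
static size_t chunk_count(size_t total_len)
{
    return (total_len + CHUNK_DATA_MAX - 1u) / CHUNK_DATA_MAX;
}

/* Writes chunk `index` of `p_src` into `p_out` as [index][data...].
 * p_out must hold at least CHUNK_HEADER_LEN + CHUNK_DATA_MAX octets.
 * Returns the number of octets written, or 0 if index is out of range. */
static size_t chunk_build(const uint8_t * p_src, size_t total_len,
                          uint8_t index, uint8_t * p_out)
{
    assert(p_src != NULL && p_out != NULL);

    size_t offset = (size_t)index * CHUNK_DATA_MAX;
    if (offset >= total_len) {
        return 0u;
    }

    size_t n = total_len - offset;
    if (n > CHUNK_DATA_MAX) {
        n = CHUNK_DATA_MAX; /* last chunk may be shorter */
    }

    p_out[0] = index;
    memcpy(&p_out[CHUNK_HEADER_LEN], &p_src[offset], n);
    return CHUNK_HEADER_LEN + n;
}
```

For the 32-byte payload discussed in this thread, `chunk_count(32)` is 5, so five unsegmented messages replace one segmented message. Each chunk would be published as the parameters of the vendor opcode.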

  • Hi Mttrinh

    I saw exactly the same behaviour that EAn initially described, working with the current Mesh SDK v4.0.0.

    Section 3.5.3.3 of the Mesh Profile Specification v1.0.1 says the following regarding sending messages to non-unicast addresses:

    • We should not expect Segment ACKs when sending messages to non-unicast addresses (as discussed).
    • It is recommended to send lower transport PDUs multiple times, with small random delays between repetitions.

    From those statements I concluded that it is not correct to wait for segment ACKs after sending to non-unicast addresses. To me, the segment transmission retries looked like a result of the transport layer of the SDK not differentiating between destination address types and always waiting for segment ACKs. Of course, we should add redundancy when sending to non-unicast addresses, as the standard recommends, but with "small random delays" between the repetitions.

    After reading the discussion in this DevZone case, I would like to raise the following questions:

    1. Can we be sure that we are not mixing up "retries" (required when segment ACKs are missing) with "retransmissions" (recommended when sending to non-unicast addresses)?
    2. If we are retransmitting after the constant retry timeout of 500 ms (the initial setting in the SDK), can we really speak of "small random delays" as stated in section 3.5.3.3?

    My workaround looks like this: I differentiate between message destination addresses in the transport layer. If the destination is non-unicast, I set "artificial" segment ACKs directly after successfully passing the segment to the function "sar_segment_send(..)". The function segmented_packet_tx had to be modified too, to finish the SAR session as soon as all segments are out.

    What does the Nordic team think about this? I can send you this solution in more detail if you say that this is the correct way of solving it.

    Regards
