I am using the latest Mesh SDK (3.2.0) and SDK (15.3.0) on an nRF52840 SoC.
I developed a message model to send 32-byte payloads (which I assume would require 3 PDUs with a 32-bit TransMIC, according to section 3.7.3 of the Mesh Profile Specification by the Bluetooth SIG).
I am getting a weird behavior with my vendor models, which is:
Why is that?! I need to solve this fast, so any assistance would be very appreciated.
This message service is meant to communicate with the highest throughput possible in multicast in order to be scalable to larger networks.
So my real question is: How can I have the highest possible throughput in a multicast transmission (discarding the acknowledged messages, I will have a routine to retrieve missing packets in a larger transaction)?
Thank you so much for your prompt assistance and pointers!
What do you mean by multicast transmission? Do you mean publishing to a group address?
A multicast transmission is a one-to-many transmission. In BT mesh, that is accomplished when one node publishes to a group address or a virtual address. In my case the lag and packet loss happen when I publish to a group address (0xFFFF for all nodes, for instance, or 0xC000 for a group of nodes).
When you send packets that are longer than 11 bytes, they will be sent as segmented packets.
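As a rough check of the 11-byte limit and the 3-PDU estimate from the question, the PDU count can be sketched as below. It assumes a 32-bit TransMIC, 12 bytes of upper transport data per segment, and that the length passed in is the full access payload (opcode plus parameters). The function name mesh_pdu_count is made up for this illustration and is not part of the SDK.

```c
#include <stdint.h>

/* Sizes in bytes, assuming a 32-bit TransMIC. */
#define UNSEG_ACCESS_MAX  11u  /* largest access payload in one unsegmented PDU */
#define SEG_CHUNK_LEN     12u  /* upper transport bytes carried by each segment */
#define TRANSMIC_LEN       4u  /* 32-bit TransMIC appended to the access payload */

/* Hypothetical helper: number of lower transport PDUs needed for an
 * access payload (opcode + parameters) of access_len bytes. */
static uint32_t mesh_pdu_count(uint32_t access_len)
{
    if (access_len <= UNSEG_ACCESS_MAX)
    {
        return 1u;  /* fits in a single unsegmented message */
    }
    /* Round (payload + TransMIC) up to a whole number of segments. */
    return (access_len + TRANSMIC_LEN + SEG_CHUNK_LEN - 1u) / SEG_CHUNK_LEN;
}
```

With these assumptions, mesh_pdu_count(32) gives 3, matching the estimate in the question. If the 32 bytes are parameters only and the 3-byte vendor opcode comes on top, mesh_pdu_count(35) gives 4.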
If you send the packets to a unicast address, the receiver has to ACK all the packets, and any that aren't received have to be sent again. If you send the packets to a group address, there will be no ACK, so the packets have to be sent multiple times to compensate.
The number of retries defaults to TRANSPORT_SAR_TX_RETRIES_DEFAULT (4) and can be adjusted with the NRF_MESH_OPT_TRS_SAR_TX_RETRIES option in nrf_mesh_opt.h. You can also adjust the timing through the options in nrf_mesh_opt.h.
To get the highest possible throughput, you should send each message only after receiving NRF_MESH_EVT_TX_COMPLETE.
Yes, it's precisely because it is a segmented message that I figured that would be one of the reasons for the difference I mentioned, and that's why I pointed that out.
Thank you for the very informative reply.
As a quick fix, I changed TRANSPORT_SAR_TX_RETRIES_DEFAULT from 4 to 3 and TRANSPORT_SAR_TX_RETRY_BASE_TIMEOUT_DEFAULT_US from 500 ms to 150 ms. Both are defined in transport.h.
This way I lost only 1 packet in 108, which is much better, considering that my interval between packet transmissions was 700 ms and not the 3000 ms I had earlier...
Now, as I said, it was a quick fix to change the #defines in the transport.h header. The ideal solution would be to use the function: uint32_t transport_opt_set(nrf_mesh_opt_id_t id, const nrf_mesh_opt_t * const p_opt) , with the NRF_MESH_OPT_TRS_SAR_TX_RETRIES option ID and NRF_MESH_OPT_TRS_SAR_TX_RETRY_TIMEOUT_BASE option ID respectively. My only problem with this is that I don't know how to use the p_opt. Reading the nrf_mesh_opt_t type definition:
typedef struct
{
    /** Length of opt field (for future compatibility). */
    uint32_t len;
    /** Option to set/get. */
    union
    {
        /** Unsigned 32-bit value. */
        uint32_t val;
        /** Byte array. */
        uint8_t * p_array;
    } opt;
} nrf_mesh_opt_t;
I don't know how to fill the structure. The only member that the transport_opt_set function apparently uses is p_opt->opt.val, so I'm guessing (I haven't tried it yet) that I could get away with declaring nrf_mesh_opt_t options_var; and then just assigning options_var.opt.val = (uint32_t) value; without touching the rest of it. Is this correct, or is there a better way?
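One way to fill the option structure is sketched below. So that it compiles on its own, it uses local stand-ins: demo_opt_t mirrors the layout described above, and demo_opt_id_t, demo_opt_set and the m_* variables are hypothetical placeholders, not SDK symbols. Real firmware would include nrf_mesh_opt.h and use nrf_mesh_opt_t with the NRF_MESH_OPT_TRS_* IDs instead. Zeroing the structure and setting len to the size of the 32-bit value is an assumption about good practice, not something the SDK is known to require.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for nrf_mesh_opt_t (assumed layout, declared locally). */
typedef struct
{
    uint32_t len;            /* length of the opt field */
    union
    {
        uint32_t  val;       /* unsigned 32-bit value */
        uint8_t * p_array;   /* byte array */
    } opt;
} demo_opt_t;

/* Hypothetical option IDs standing in for the NRF_MESH_OPT_TRS_* values. */
typedef enum
{
    DEMO_OPT_SAR_TX_RETRIES,
    DEMO_OPT_SAR_TX_RETRY_TIMEOUT_BASE
} demo_opt_id_t;

static uint32_t m_retries    = 4u;       /* stand-in for the stack's retry count */
static uint32_t m_timeout_us = 500000u;  /* stand-in for the base timeout, in us */

/* Hypothetical setter that, like the behaviour described above, reads only opt.val. */
static uint32_t demo_opt_set(demo_opt_id_t id, const demo_opt_t * p_opt)
{
    if (p_opt == NULL)
    {
        return 1u;  /* generic error code for this sketch */
    }
    switch (id)
    {
        case DEMO_OPT_SAR_TX_RETRIES:            m_retries    = p_opt->opt.val; return 0u;
        case DEMO_OPT_SAR_TX_RETRY_TIMEOUT_BASE: m_timeout_us = p_opt->opt.val; return 0u;
        default:                                 return 1u;
    }
}

/* Build a fully initialised option that carries one 32-bit value. */
static demo_opt_t demo_opt_u32(uint32_t value)
{
    demo_opt_t o;
    memset(&o, 0, sizeof(o));     /* no stray bytes in the unused union member */
    o.len     = sizeof(o.opt.val);
    o.opt.val = value;
    return o;
}

/* Apply both SAR settings; returns 0 on success. */
static uint32_t demo_apply_sar_settings(uint32_t retries, uint32_t timeout_us)
{
    demo_opt_t opt    = demo_opt_u32(retries);
    uint32_t   status = demo_opt_set(DEMO_OPT_SAR_TX_RETRIES, &opt);
    if (status != 0u)
    {
        return status;
    }
    opt = demo_opt_u32(timeout_us);
    return demo_opt_set(DEMO_OPT_SAR_TX_RETRY_TIMEOUT_BASE, &opt);
}
```

For the values tried above this would be demo_apply_sar_settings(3, 150000), since the timeout define is expressed in microseconds. Setting options at run time this way keeps transport.h unmodified, which makes SDK upgrades easier.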
Concerning the max throughput: I created a semaphore flag to signal when the next TX is good to go. The flag is set to TRUE (good to go) when NRF_MESH_EVT_TX_COMPLETE is received, and it is set to FALSE (don't TX) right before the next packet is published, in such a way that the publication itself still goes ahead.
In theory everything should be fine, but it seems that NRF_MESH_EVT_TX_COMPLETE isn't always raised: I get the first two TX packets, then one or three after some 50 packets, and then the last packets... To me this seems to indicate that I need to give the stack some time to breathe between transmissions. What could be causing this, and what would be the right way to guarantee max throughput in multicast?
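A minimal sketch of such a TX gate is shown below, with one addition: if the publish call returns an error, the flag is reopened, on the assumption that a rejected message never produces a completion event and would otherwise leave the gate closed. All names here (demo_publish_fn_t, demo_on_tx_complete, demo_try_publish and the two stand-in publish functions) are hypothetical; in real firmware the publish callback would wrap the model's publish call and demo_on_tx_complete would be called from the mesh event handler on NRF_MESH_EVT_TX_COMPLETE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical publish callback: returns 0 when the stack accepts the message. */
typedef uint32_t (*demo_publish_fn_t)(const uint8_t * p_data, uint16_t len);

/* Gate flag: true means the next transmission may start. */
static volatile bool m_tx_ready = true;

/* Call this when the TX-complete event for the previous message arrives. */
static void demo_on_tx_complete(void)
{
    m_tx_ready = true;
}

/* Try to publish one message; returns true if it was handed to the stack. */
static bool demo_try_publish(demo_publish_fn_t publish,
                             const uint8_t *   p_data,
                             uint16_t          len)
{
    if (!m_tx_ready)
    {
        return false;          /* previous message still in flight */
    }
    m_tx_ready = false;        /* close the gate before publishing */
    if (publish(p_data, len) != 0u)
    {
        /* Rejected by the stack: assume no completion event will follow,
         * so reopen the gate and let the caller retry later. */
        m_tx_ready = true;
        return false;
    }
    return true;
}

/* Stand-in publish functions used only to exercise the gate. */
static uint32_t demo_publish_ok(const uint8_t * p_data, uint16_t len)
{
    (void) p_data; (void) len;
    return 0u;                 /* message accepted */
}

static uint32_t demo_publish_busy(const uint8_t * p_data, uint16_t len)
{
    (void) p_data; (void) len;
    return 1u;                 /* message rejected */
}
```

Checking the return value matters here: a gate that is closed on every attempt but only reopened by the completion event can stall after a single rejected publish.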
Thank you very much,
Sorry for the delayed response.
I talked to one of our mesh developers about this and here is his response:
There are two things here:
1. Sending a segmented message triggers the SAR re-transmissions until originator's transport layer receives the full BlockAck.
2. Sending a segmented message to the group address triggers compulsory retransmissions that do not depend on the BlockAcks (since sender does not know if all the nodes belonging to the group could have received all segments of the SAR in first set of transmissions).
Therefore, when the customer uses a group address as the DST address, he sees a deterministic packet loss. The packets are not actually lost: if publishing is done at an interval shorter than the time it takes to finish all retransmissions, the originator's transport layer simply rejects the message. The higher layer API cannot trigger a new SAR to the same destination (unicast or multicast) unless the previous one has either finished completely or been cancelled.
If the customer needs a higher throughput for multicast messages, the best strategy would be to not use SAR at all. That means each multicast message must fit in a single segment. To increase reliability, a message can be sent multiple times by configuring a higher Network Transmit Count (say, 2 or 3). One needs to be careful with multicast retransmissions, as doing too many of them will cause network interference for other mesh traffic.
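To illustrate the "no SAR" approach, here is a sketch that splits a larger payload into chunks that each fit an unsegmented message. It assumes an 11-byte access payload limit and a 3-byte vendor opcode, which leaves 8 bytes of parameters per message. The 1-byte chunk index and the demo_* names are hypothetical framing invented for this example, not part of the SDK or the specification.

```c
#include <stdint.h>
#include <string.h>

#define UNSEG_ACCESS_MAX   11u  /* access payload limit of an unsegmented message */
#define VENDOR_OPCODE_LEN   3u  /* vendor opcodes occupy 3 bytes */
#define CHUNK_HEADER_LEN    1u  /* hypothetical 1-byte chunk index */
#define CHUNK_DATA_MAX     (UNSEG_ACCESS_MAX - VENDOR_OPCODE_LEN - CHUNK_HEADER_LEN)

/* Number of unsegmented messages needed to carry total_len bytes. */
static uint32_t demo_chunk_count(uint32_t total_len)
{
    return (total_len + CHUNK_DATA_MAX - 1u) / CHUNK_DATA_MAX;
}

/* Write the parameters of chunk `index` into p_out, which must hold at
 * least CHUNK_HEADER_LEN + CHUNK_DATA_MAX bytes. Returns the parameter
 * length, or 0 if the index is out of range. */
static uint32_t demo_build_chunk(const uint8_t * p_src,
                                 uint32_t        total_len,
                                 uint32_t        index,
                                 uint8_t *       p_out)
{
    if (index > 255u)
    {
        return 0u;                       /* index must fit the 1-byte header */
    }
    uint32_t offset = index * CHUNK_DATA_MAX;
    if (offset >= total_len)
    {
        return 0u;                       /* past the end of the payload */
    }
    uint32_t n = total_len - offset;
    if (n > CHUNK_DATA_MAX)
    {
        n = CHUNK_DATA_MAX;              /* full chunk */
    }
    p_out[0] = (uint8_t) index;          /* lets the receiver spot missing chunks */
    memcpy(&p_out[1], &p_src[offset], n);
    return n + CHUNK_HEADER_LEN;
}
```

Under these assumptions a 32-byte payload becomes 5 single-segment messages of at most 7 data bytes each. The chunk index also gives the receiver what it needs for a retrieval routine like the one mentioned in the question.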
Due to the flooding nature of the Bluetooth Mesh protocol, some packets may get lost if there is high interference, and the higher-level application protocol should be designed to deal with this situation.