
Need help improving Bluetooth Mesh throughput

I am currently using two nRF52832 SDK boards to exchange text messages with each other via BLE Mesh.

I based my development on the Light Switch example included with Mesh SDK 1.0.0.

One board is based on the Light Switch client and the other on the Light Switch server.

I added two opcodes to the original _SET, _GET, _SET_UNRELIABLE and _STATUS: one called "SEND_COMMAND" and the other "SEND_RESPONSE". I avoided building a custom model for now so that I could avoid any issues with provisioning. The main purpose is to test the throughput and latency to see whether it is usable for a new product that we are developing.
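For reference, a sketch of what two extra opcodes might look like on air. Bluetooth Mesh vendor opcodes are 3 octets: the first has its two top bits set (0b11xxxxxx) and the other two carry the 16-bit company identifier, low octet first. The values 0xC5/0xC6, the macro names and the helpers below are hypothetical, chosen on the assumption that the simple on/off model already uses 0xC1 to 0xC4; 0x0059 is Nordic's Bluetooth SIG company ID. Because an unsegmented access message holds at most 11 octets (opcode plus parameters, with a 32-bit TransMIC), a vendor opcode leaves 8 octets for parameters.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical values, assumed to follow the simple on/off model's 0xC1..0xC4. */
#define OPCODE_SEND_COMMAND    0xC5u
#define OPCODE_SEND_RESPONSE   0xC6u
#define COMPANY_ID_NORDIC      0x0059u  /* Bluetooth SIG company identifier */

/* Largest access payload (opcode + parameters) in one unsegmented
 * message when a 32-bit TransMIC is used. */
#define UNSEGMENTED_ACCESS_MAX 11u
#define VENDOR_OPCODE_LEN      3u

/* Encode a 3-octet vendor opcode: 0b11xxxxxx (low six bits taken from
 * 'opcode'), then the company ID, least significant octet first.
 * Returns the number of octets written. */
static size_t encode_vendor_opcode(uint8_t opcode, uint16_t company_id, uint8_t out[3])
{
    out[0] = (uint8_t)(0xC0u | (opcode & 0x3Fu));
    out[1] = (uint8_t)(company_id & 0xFFu);
    out[2] = (uint8_t)(company_id >> 8);
    return VENDOR_OPCODE_LEN;
}

/* Parameter octets left before the message has to be segmented. */
static size_t vendor_param_budget(void)
{
    return UNSEGMENTED_ACCESS_MAX - VENDOR_OPCODE_LEN;
}
```

With these assumptions, SEND_COMMAND would go out as the octets C5 59 00 followed by at most 8 parameter octets if the message is to stay unsegmented.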

The client board is programmed to send a text string of about 15 characters. The opcode "SEND_COMMAND" is placed in message.opcode, and a data structure containing the text string and its length is attached as the payload. The access_model_publish() function is then called to send out the packet. The command is published to a group address so that all the server boards receive it. (Currently I only have 1 server connected, but the plan is for all servers subscribed to the group to receive the command in future.)
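The payload layout isn't shown in the post, so the sketch below makes an assumption: the text is sent on its own, without a separate length field, since the receiver can take the parameter length from the received message. The type and helper names are hypothetical. It also checks whether opcode plus parameters still fit in one unsegmented message: with a 3-octet vendor opcode, a 15-character string gives 18 octets, which exceeds the 11-octet limit and would therefore be segmented.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VENDOR_OPCODE_LEN      3u   /* 3-octet vendor opcode */
#define UNSEGMENTED_ACCESS_MAX 11u  /* opcode + parameters, 32-bit TransMIC */

/* Hypothetical command payload: raw text bytes, no length prefix. */
typedef struct {
    uint8_t  buf[64];
    uint16_t len;
} command_payload_t;

/* Copy text (without its terminating NUL) into the payload.
 * Returns false if it does not fit. */
static bool pack_command(command_payload_t *p, const char *text)
{
    size_t n = strlen(text);
    if (n > sizeof p->buf) {
        return false;
    }
    memcpy(p->buf, text, n);
    p->len = (uint16_t)n;
    return true;
}

/* True if opcode + parameters exceed one unsegmented message. */
static bool needs_segmentation(uint16_t param_len)
{
    return (size_t)VENDOR_OPCODE_LEN + param_len > UNSEGMENTED_ACCESS_MAX;
}
```

In the application, p.buf and p.len would be what gets handed to the publish call as the message buffer and length.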

On receiving the packet, the server board decodes it and sends back a response with the opcode "SEND_RESPONSE" and a response string of about 20 characters. The server also sends its response using the access_model_publish() function.

So far the application code on both the client and the server works well when the client is triggered once a second to send the command string, and it receives the response string from the server most of the time.

However, when I try to increase the rate at which commands are sent, the communication seems to choke. The client can barely send and receive more than 2 command/response pairs per second before it starts to choke and loses communication. Once the client stops sending commands, some of the earlier packets can be seen going through, but they can take a few seconds to complete.

Has anyone experienced the same kind of throughput issue? Are there any parameters I can adjust to increase the throughput? I read somewhere on this forum that a node may be able to send messages at up to 24 Hz (I assume this means the client could send up to 24 packets per second), but currently I cannot send and receive faster than 2 packets per second.
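Whatever rate the stack can actually sustain, it helps to keep the application from publishing faster than that, so that requests do not pile up in a transmit queue (one plausible reading of the delayed packets still trickling out after the client stops). A minimal pacing sketch, assuming the application has a millisecond clock; the type and functions are hypothetical and not Mesh SDK APIs:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical publish pacer: at most one publish per min_interval_ms. */
typedef struct {
    uint32_t min_interval_ms;
    uint32_t last_ms;
    bool     has_sent;
} publish_pacer_t;

static void pacer_init(publish_pacer_t *p, uint32_t rate_hz)
{
    /* Integer division: 24 Hz gives a 41 ms minimum interval. */
    p->min_interval_ms = (rate_hz > 0u) ? 1000u / rate_hz : UINT32_MAX;
    p->last_ms = 0;
    p->has_sent = false;
}

/* Returns true (and records the send) if a publish is allowed at now_ms.
 * Unsigned subtraction keeps the comparison valid across a 32-bit wrap. */
static bool pacer_try_send(publish_pacer_t *p, uint32_t now_ms)
{
    if (p->has_sent && (uint32_t)(now_ms - p->last_ms) < p->min_interval_ms) {
        return false;
    }
    p->last_ms = now_ms;
    p->has_sent = true;
    return true;
}
```

When pacer_try_send() returns false, the caller would defer the command rather than call the publish function, keeping the offered load at or below the configured rate.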

I hope someone who has been through the same experience can point me to possible solutions.

Thank you!

Susheel Nuguru over 7 years ago

Hi aleccontrol,

Sorry for the delayed answer. Could you please upload the source files so that we can reproduce this issue?

aleccontrol over 7 years ago in reply to Susheel Nuguru

simple_on_off.zip

Thanks for your reply.

I have attached the modified simple_on_off_client and simple_on_off_server files used for testing.

Client

To test the command/response exchange, add the following code fragment to main.c in the simple_on_off_client project:

   extern uint32_t send_command_client(simple_on_off_client_t * p_client, uint8_t * command, uint16_t length);

   uint8_t command[] = "Test";   /* 4 characters; the terminating NUL is not sent */
   status = send_command_client(&m_clients[GROUP_CLIENT_INDEX], command, 4);

Note: I have changed the length to 4 (it was 9 when first posted), since the command string now contains only 4 bytes.

Server

The simple_on_off_server has been modified so that it simply copies the message it receives and echoes it back to the client as a response packet.
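In the server's opcode handler, the echo amounts to a bounded copy of the received parameters into a response buffer, which is then published with the SEND_RESPONSE opcode. The uploaded files are not reproduced here, so this is not the actual handler; a minimal sketch of the copy step with a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Copy the received command parameters into the response buffer.
 * Returns the number of bytes to publish, truncated to out_cap so an
 * oversized command cannot overrun the response buffer. */
static uint16_t build_echo_response(const uint8_t *in, uint16_t in_len,
                                    uint8_t *out, uint16_t out_cap)
{
    uint16_t n = (in_len < out_cap) ? in_len : out_cap;
    if (n > 0u) {
        memcpy(out, in, n);
    }
    return n;
}
```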

The client runs the function handle_response_cb() when it receives the response from the server, and you can log the received response there. In our program we actually send the message out over the UART, which requires linking with the app_uart files, so you can simply comment out the call to com1_strout().

We would be grateful if you could share your findings and suggested solutions.

leonwj over 7 years ago in reply to aleccontrol
Hello aleccontrol,

Apologies for the belated response to your latest comment...

A few points I wanted to make about the packet loss statistics I presented:

1. The stats came from the model you uploaded, used without any changes, so we would need to review the model before attributing any loss to the underlying mesh.

2. We triggered the cmd/resp exchange from the heartbeat callback, which may not be the most appropriate place for it. For example, with the heartbeat period set to 100 ms on each server, when each server sends its heartbeat the client immediately publishes a "Test" command to the group (i.e. 3 servers), each of which then sends back a response. So every 1/10th of a second we have:

a. 3 heartbeat messages (server(s) -> client)

b. 1 command message published to the group (client -> server(s))

c. 3 response messages sent to the client (server(s) -> client)

Given that there shouldn't be any relaying (the nodes are close together, and the message cache should prevent repeats), that amounts to 70 messages per second across the mesh. Also, we haven't done any timeline analysis to determine whether there is a specific point in our basic test where packet loss starts, or whether the loss was uniform.
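The 70 messages per second follows from the list above: each 100 ms period has one heartbeat and one response per server plus one group command. A small check with hypothetical helper names; the second helper gives a rough count of advertising packets on air, assuming each transmission goes out on all three primary advertising channels, with the number of transmissions per message (set by the network transmit configuration) left as a parameter:

```c
#include <stdint.h>

/* Messages per second in the heartbeat-triggered test, assuming no relaying:
 * per period, each server sends a heartbeat and a response, and the client
 * publishes one group command. */
static uint32_t mesh_messages_per_second(uint32_t servers, uint32_t period_ms)
{
    uint32_t per_period = servers   /* heartbeats */
                        + 1u        /* group command */
                        + servers;  /* responses */
    return per_period * 1000u / period_ms;
}

/* Rough advertising packets per second, assuming every transmission of a
 * message is sent on all three primary advertising channels. */
static uint32_t adv_packets_per_second(uint32_t messages_per_s,
                                       uint32_t transmissions_per_message)
{
    return messages_per_s * transmissions_per_message * 3u;
}
```

With 3 servers and a 100 ms period this gives 70 messages per second, and 210 advertising packets per second even with a single transmission per message.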

I do recall reading a vendor-produced whitepaper in which about 900 mesh devices were deployed over a 2,000 sq m area, with a target latency of 300 ms for message delivery, under low (~150 bps), medium (~1 kbps) and high (~3 kbps) traffic configurations. I grabbed a screenshot of the results at the time (see below), which showed that at least 99.1% of messages were successfully delivered within 300 ms under high traffic (Enhanced configuration) in both sparse and dense deployments. As I recall, Enhanced meant that source message retransmissions and advertising randomization were used, whereas Baseline was a standard mesh implementation. Also note that this wasn't Nordic's mesh stack! (If you are interested, let me know and I'll see if I can dig out the link to the whitepaper.)

I guess the general takeaways from the above would be:

- we shouldn't base any firm design decisions on the packet loss figures I provided (since the configuration was not optimized or validated)
- the High traffic / Dense deployment / Baseline figure (69.2%, from the table) reflects a similar level of degradation to what we saw in our basic test
- Enhanced optimizations can be implemented to increase mesh reliability and throughput
- the results reflect fairly well on Bluetooth mesh compared with results I've seen for Zigbee, Z-Wave, etc. with lower device numbers (the caveat being that this is only one specific configuration under test)

I hope the above helps.

Regards,

leonwj over 7 years ago in reply to leonwj

Additionally (for yourself and anyone else interested in the throughput expectations of Bluetooth Mesh), you might want to listen to the following podcast/webinar, in which Szymon Slupik, Chair of the Bluetooth SIG Mesh Working Group, explains why he believes Bluetooth Mesh meets the required performance levels (among other benefits) for both industrial and home use.

(Please also note that I'm not affiliated with Szymon or Mr. Beacon in any way; I have simply followed the mesh specification process since FRD 0.7.)

Regards,

aleccontrol over 7 years ago in reply to leonwj

Hi Leonwj,

I appreciate you taking the time to reply and providing some useful statistics. I agree that your test case was pretty intense, since 3 servers trigger the client to send a command every 100 ms, so the airwaves are probably quite congested.

In our case we are just sending the command from the client (the rate of sending commands is entirely controlled by the client), and the client then waits to receive the response from a single server. Only when the client has received a response from the server does it send another command. This is a common method in industrial control, where the master (client) sends a command to a slave (server) and then waits for the slave to return the response before continuing.
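The stop-and-wait pattern described here can be kept in a small piece of client state: publish only when nothing is outstanding, clear the flag in the response handler, and give up after the time-out. Recording the round-trip time in the same place also yields the per-exchange latency being asked about. A sketch with hypothetical names, assuming the caller supplies a millisecond clock:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stop-and-wait state for one outstanding command. */
typedef struct {
    bool     waiting;
    uint32_t sent_at_ms;
    uint32_t timeout_ms;
    uint32_t completed;
    uint32_t timeouts;
} cmd_channel_t;

/* A new command may be published only when none is outstanding. */
static bool cmd_can_send(const cmd_channel_t *c)
{
    return !c->waiting;
}

/* Call right after publishing a command. */
static void cmd_on_sent(cmd_channel_t *c, uint32_t now_ms)
{
    c->waiting = true;
    c->sent_at_ms = now_ms;
}

/* Call from the response handler. Returns false for a late or unexpected
 * response (nothing outstanding); otherwise stores the round-trip time. */
static bool cmd_on_response(cmd_channel_t *c, uint32_t now_ms, uint32_t *rtt_ms)
{
    if (!c->waiting) {
        return false;
    }
    c->waiting = false;
    c->completed++;
    *rtt_ms = now_ms - c->sent_at_ms;
    return true;
}

/* Call periodically; abandons the outstanding command after timeout_ms. */
static bool cmd_poll_timeout(cmd_channel_t *c, uint32_t now_ms)
{
    if (c->waiting && (uint32_t)(now_ms - c->sent_at_ms) >= c->timeout_ms) {
        c->waiting = false;
        c->timeouts++;
        return true;
    }
    return false;
}
```

One limitation: without a sequence number in the payload, a response that arrives just after a time-out could be matched to the next command, so carrying a small counter in both command and response may be worth considering.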

Would you be able to test such a setup and determine the expected latency for a complete command/response exchange? We would like to know how many command/response exchanges can realistically be accomplished over BLE Mesh, to determine whether we could use it as a communication medium.

What we experienced is that the client cannot seem to sustain continuous command/response exchanges with the server without frequent errors. Every few commands, one sent by the client gets no response from the server, so the client blocks until the time-out (set at 2 seconds). In our test case we are sending just a few bytes (below the 11-byte threshold at which SAR, segmentation and reassembly, kicks in), but in real life the client and server will be sending and receiving many more bytes per exchange, so it is important to keep latency and packet loss to a minimum.
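To see how quickly larger exchanges add up, the number of lower transport PDUs (each sent as its own advertising packet) can be worked out from the access payload size: an unsegmented PDU carries up to 15 octets of upper transport PDU (11 octets of access payload plus a 4-octet TransMIC), while each segment carries 12 octets, up to 32 segments. A sketch assuming the 32-bit TransMIC throughout (segmented messages may also use a 64-bit one); the function name is hypothetical:

```c
#include <stdint.h>

#define TRANSMIC_LEN           4u   /* 32-bit TransMIC */
#define UNSEG_UPPER_MAX       15u   /* upper transport octets, unsegmented */
#define SEG_UPPER_PER_SEGMENT 12u   /* upper transport octets per segment */
#define MAX_SEGMENTS          32u

/* Lower transport PDUs needed for an access payload of access_len octets
 * (opcode + parameters). Returns 1 if it fits unsegmented, or 0 if it is
 * too large for a single access message. */
static uint32_t lower_transport_pdus(uint32_t access_len)
{
    uint32_t upper_len = access_len + TRANSMIC_LEN;
    if (upper_len <= UNSEG_UPPER_MAX) {
        return 1u;
    }
    uint32_t segs = (upper_len + SEG_UPPER_PER_SEGMENT - 1u) / SEG_UPPER_PER_SEGMENT;
    return (segs <= MAX_SEGMENTS) ? segs : 0u;
}
```

For example, a 100-byte response with a 3-octet vendor opcode (103 octets) would need 9 segments. Segmented messages sent to a unicast address are also acknowledged by the receiver's lower transport layer, which adds further traffic.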

In our application the client may occasionally need to retrieve, say, 2000 bytes of data from a server. What would be the best way to do this in the shortest possible time? With traditional UART communication, the client sends a command packet telling the server to return, say, up to 100 bytes per response packet, so it takes about 20 successful exchanges to retrieve 2000 bytes. How long would you expect it to take to retrieve this amount of data over the mesh?
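A first-order estimate for the 2000-byte case is the number of exchanges times the average round-trip time per exchange. The round-trip time is an input to be measured, not a known figure; as an illustration only, the roughly 2 exchanges per second observed earlier in this thread (about 500 ms each) would put 20 exchanges at about 10 s. Hypothetical helper names:

```c
#include <stdint.h>

/* Number of request/response exchanges needed to fetch total_bytes
 * in chunks of at most chunk_bytes (rounded up). */
static uint32_t exchanges_needed(uint32_t total_bytes, uint32_t chunk_bytes)
{
    return (total_bytes + chunk_bytes - 1u) / chunk_bytes;
}

/* First-order transfer time, given a measured average round trip. */
static uint32_t transfer_time_ms(uint32_t total_bytes, uint32_t chunk_bytes,
                                 uint32_t round_trip_ms)
{
    return exchanges_needed(total_bytes, chunk_bytes) * round_trip_ms;
}
```

Since a 100-byte response has to be segmented, its round trip will probably be longer than for the short "Test" exchange, so the round-trip value should be measured with realistic chunk sizes; larger chunks mean fewer exchanges but more segments per exchange.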

Thank you very much!

Hadi Deknache over 7 years ago in reply to leonwj

Hi, this might be an off-topic question, but we were testing this and wanted to log something similar to what is done above, with timestamps at start and stop. How did you achieve this when running SEGGER J-Link RTT Viewer?
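One possible approach, assuming an nRF52 RTC instance is running at 32.768 kHz (prescaler 0): read its COUNTER at start and stop, convert the difference to milliseconds, and print the result over RTT, for example with SEGGER_RTT_printf(). The RTC counter is 24 bits wide, so the difference must be masked to survive a wrap (which happens every 512 s at this frequency). The helpers below are hypothetical and only do the arithmetic; reading the counter is left to the application:

```c
#include <stdint.h>

#define RTC_FREQ_HZ      32768u
#define RTC_COUNTER_MASK 0x00FFFFFFu  /* nRF52 RTC COUNTER is 24 bits */

/* Ticks elapsed from start to stop, correct across one counter wrap. */
static uint32_t rtc_elapsed_ticks(uint32_t start, uint32_t stop)
{
    return (stop - start) & RTC_COUNTER_MASK;
}

/* Convert ticks to milliseconds; the 64-bit intermediate avoids overflow. */
static uint32_t rtc_ticks_to_ms(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000u) / RTC_FREQ_HZ);
}
```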

Thanks in advance :)

mike over 7 years ago in reply to leonwj

Hello, you're saying you had no issues with the attached code, so naturally I'm wondering where the difference lies =) Do you call send_command_client() from main.c, or how have you set it up? (Newbie here.) I suppose you built it on top of the light switch example?

leonwj over 7 years ago in reply to mike

Hello,

Yes, as I recall, the code was built on top of the light switch example in v1.0.1 of the Mesh SDK. A few changes were made to get it up and running, but the custom model was left intact.

Regards,

mike over 7 years ago in reply to leonwj

Is there any need to change anything in the node configuration or node setup on the provisioner side?