
Need help to improve Bluetooth Mesh throughput

I am currently using two nRF52832 development boards to exchange text messages over BLE Mesh.

I based my development on the Light Switch example included with Mesh SDK 1.0.0.

One of the boards is based on the Light Switch Client and the other on the Light Switch Server.

I added two opcodes to the original _SET, _GET, _SET_UNRELIABLE and _STATUS: "SEND_COMMAND" and "SEND_RESPONSE". I avoided building a custom model for now so that I could avoid any issues with provisioning. The main purpose is to test throughput and latency to see whether it is usable for a new product that we are developing.

The client board is programmed to send a text string of about 15 characters. The opcode "SEND_COMMAND" is set in message.opcode, together with a data structure that contains the text string and its length. The access_model_publish() function is then called to send out the packet. The command is published to a group address so that all the server boards will receive it (currently I only have 1 server connected, but the plan is for all servers subscribed to the group to receive the command in future).

Upon receiving the packet, the server board decodes it and sends a response. The response is sent back to the client with the opcode "SEND_RESPONSE" together with a response string of about 20 characters. The server also sends the response using the access_model_publish() function.
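For payloads of this size it may be worth checking whether the messages end up segmented. In the Mesh Profile specification an unsegmented access message carries at most 11 bytes of opcode plus parameters, and anything longer is split into segments of up to 12 bytes each (including a 4-byte TransMIC at the end). The sketch below only does that arithmetic. The 3-byte vendor opcode and the 2-byte length field are assumptions about the modified model, and `mesh_segments_needed()` is a hypothetical helper, not an SDK function.

```c
#include <assert.h>
#include <stddef.h>

/* Limits from the Mesh Profile specification (transport layer). */
#define UNSEG_ACCESS_MAX 11u /* opcode + parameters in one unsegmented message */
#define SEGMENT_SIZE     12u /* payload bytes carried by each segment */
#define TRANSMIC_SIZE     4u /* 32-bit TransMIC appended before segmentation */
#define MAX_SEGMENTS     32u /* upper bound on segments per message */

/* Hypothetical helper: number of transport PDUs needed for one access
 * message. Returns 0 if the message does not fit at all. */
static unsigned mesh_segments_needed(size_t opcode_len, size_t params_len)
{
    size_t access_len = opcode_len + params_len;

    assert(opcode_len >= 1 && opcode_len <= 3);

    if (access_len <= UNSEG_ACCESS_MAX) {
        return 1; /* fits in a single unsegmented message */
    }

    /* Segmented: round (payload + TransMIC) up to whole segments. */
    size_t total = access_len + TRANSMIC_SIZE;
    size_t segments = (total + SEGMENT_SIZE - 1) / SEGMENT_SIZE;

    return (segments <= MAX_SEGMENTS) ? (unsigned)segments : 0;
}
```

Under these assumptions, a 15-character command with a 2-byte length field needs 2 segments and a 20-character response needs 3, while a 4-byte string such as "Test" fits in a single unsegmented message. Each extra segment is another packet on air, so longer strings cost more than their byte count suggests.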

So far the application code on both the client and the server works well when the client is triggered once a second to send the command text string, and it receives the response text string from the server most of the time.

However, when I try to increase the frequency of sending the command, the communication seems to choke. The client can barely send and receive more than 2 command/response packets per second before it starts to choke and lose communication. Once the client stops sending the command, some of the previous packets can be seen going through, but this can take a few seconds to complete.

Has anyone experienced the same kind of throughput issue? Are there any parameters that I can adjust to increase the throughput? I read somewhere on this forum that a node may be able to send messages at up to 24 Hz (I assume this means the client could send up to 24 packets a second), but currently I cannot even send and receive faster than 2 packets a second.

Hopefully someone who has gone through the same experience can point me to possible solutions.

Thank you!

  • Hi Alecontrol,

    Sorry for the delayed answer. Can you please upload the source files so that we can reproduce this issue?

  • simple_on_off.zip

    Thanks for your reply.

    I have attached the modified files for the "simple_on_off_client", "simple_on_off_server" used for testing.

    Client


    To test the command/response, add the following code fragment to the main.c in the simple_on_off_client project:

        /* Provided by the modified client model (see the attached files) */
        extern uint32_t send_command_client(simple_on_off_client_t * p_client, uint8_t *command, uint16_t length);

        uint8_t command[] = "Test";
        /* Length is 4: the string bytes without the terminating NUL */
        status = send_command_client(&m_clients[GROUP_CLIENT_INDEX], command, 4);

    Note: I have edited the length to 4 (it was 9 when first posted), since the command string now only contains 4 bytes.

    Server

    The simple_on_off_server has been modified so that it simply copies the message it receives and echoes it back to the client as a response packet.

    The client runs the function "handle_response_cb()" when it receives the response from the server. You can log the response it received from the server. In our program we actually send the message out of the UART, which requires linking with the app_uart files, so you can simply comment out the call to "com1_strout()" in our program.

    Thank you in advance if you can share your findings and suggested solutions with us.

  • Hi,

    I am also trying to send text messages over BLE Mesh but have not found a solution yet.

    Can you provide your light switch main.c file? I downloaded the .zip file above and tried to send a simple test message, but it is not sent either. Please help me send a text message from client to server.

    Thanks..

  • hello aleccontrol,

    We have also set up a command/control model to test Bluetooth mesh throughput/reliability, but we have not experienced the issue(s) that you mentioned. I'm not clear on the use case you are trying to implement that requires the mesh client model to send messages that frequently, but I would suggest that, in order to test throughput across your mesh environment, you do the following:

    Bluetooth Mesh implements a Health server as a root model, so by utilizing that you can quickly monitor message delivery across the mesh network. Using the default light-switch example (with 1 x client and 3 x servers), amend provisioner.c in the client to request heartbeat messages every 100 milliseconds as follows (note: the default 10s heartbeat has been commented out):

            /* Configure the publication parameters for the On/Off server: */
            case PROV_STATE_CONFIG_PUBLICATION_HEALTH:
            {
                config_publication_state_t pubstate = {0};
                pubstate.element_address = m_target_address;
                pubstate.publish_address.type = NRF_MESH_ADDRESS_TYPE_UNICAST;
                pubstate.publish_address.value = PROVISIONER_ADDRESS;
                pubstate.appkey_index = 0;
                pubstate.frendship_credential_flag = false;
                pubstate.publish_ttl = SERVER_COUNT;
                pubstate.publish_period.step_num = 1;
                //pubstate.publish_period.step_res = ACCESS_PUBLISH_RESOLUTION_10S;       // WJL
                pubstate.publish_period.step_res = ACCESS_PUBLISH_RESOLUTION_100MS;
                pubstate.retransmit_count = 1;
                pubstate.retransmit_interval = 0;
                pubstate.model_id.company_id = ACCESS_COMPANY_ID_NONE;
                pubstate.model_id.model_id = HEALTH_SERVER_MODEL_ID;
                __LOG(LOG_SRC_APP, LOG_LEVEL_INFO, "Setting publication address for the health server to 0x%04x\n", pubstate.publish_address.value);
    
                ERROR_CHECK(config_client_model_publication_set(&pubstate));
                break;
            }

    This will obviously request each provisioned server to send 10 messages a second across the mesh, with the expectation that the 3 servers would push 30 such heartbeats each second. Based upon the results, both reliability and throughput can be measured. The results we see in our environment are as follows:

    So across a 4 minute and 2 second time-span (242 seconds), we're seeing 7,200 messages across the mesh which equates to approx. 30 per sec. which in turn is reflective of the 30 heartbeats that we expected (The mesh spec limits each node to 24 Hz so this is within spec)
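The figures above can be cross-checked with a little arithmetic. The sketch below assumes the publish period is the step count multiplied by the step resolution, with the four resolutions defined by the Mesh Profile specification (100 ms, 1 s, 10 s, 10 min). The enum and function names are illustrative only and are not the SDK's identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Step resolutions of the Publish Period field, in the order the Mesh
 * Profile specification encodes them (2-bit value). Names are illustrative. */
typedef enum {
    STEP_RES_100MS = 0,
    STEP_RES_1S    = 1,
    STEP_RES_10S   = 2,
    STEP_RES_10MIN = 3
} step_res_t;

/* Publish period in milliseconds = number of steps x step resolution. */
static uint32_t publish_period_ms(uint8_t step_num, step_res_t step_res)
{
    static const uint32_t res_ms[] = { 100u, 1000u, 10000u, 600000u };

    assert(step_num <= 0x3F);         /* step count is a 6-bit field */
    assert(step_res <= STEP_RES_10MIN);
    return (uint32_t)step_num * res_ms[step_res];
}

/* Messages expected from `nodes` publishers over `window_s` seconds. */
static uint32_t expected_messages(uint32_t nodes, uint32_t period_ms,
                                  uint32_t window_s)
{
    assert(period_ms > 0);
    return nodes * (window_s * 1000u / period_ms);
}

/* Fraction of the expected messages that were actually received. */
static double delivery_ratio(uint32_t received, uint32_t expected)
{
    assert(expected > 0);
    return (double)received / (double)expected;
}
```

With a step count of 1 at 100 ms resolution, three servers would be expected to publish 7,260 messages in 242 seconds, so the 7,200 reported above corresponds to roughly 99% delivery, provided the 7,200 figure is exact rather than rounded.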

    You could additionally amend your environment by increasing the distance between the server nodes and the base client (and adding simple relay nodes where required). Again, the above doesn't mimic your command/response model but it does allow you to at least verify mesh throughput with a single line change to the basic example code.

    I'll try to implement your model in our environment and post back if I see any extraneous results.

    Regards,

  • Hello Leonwj,

    Thank you very much for the suggestion.

    The BLE Mesh version 1.0.0 that I downloaded already contains this line in the following case:

    case PROV_STATE_CONFIG_PUBLICATION_HEALTH:

        pubstate.publish_period.step_res = ACCESS_PUBLISH_RESOLUTION_100MS;

    Does that mean the stack I have been working on is already flooding the comm channel with a heartbeat check every 0.1s? Could that be why the intended command/response packets are being affected?

    I tried changing it back to 10s (I assume that is what it should have been) but I am still experiencing the same latency issue. Do I need to re-provision the client and the server for the heartbeat period to change?

    By the way, is there a quick way to force the client to re-provision the server? So far I have used the "Erase All Memory" option to clear the provisioning data on both clients and servers so that a new provisioning can take place.

  • hello aleccontrol,

    I just re-downloaded both the v1.0 and v1.0.1 Mesh SDKs from the Nordic website and they both have case PROV_STATE_CONFIG_PUBLICATION_HEALTH: set to ACCESS_PUBLISH_RESOLUTION_10S. (Possibly you are confusing this with case PROV_STATE_CONFIG_PUBLICATION_ONOFF:, which is directly below it in the code.) So I'm not sure that setting pertains to your issue.

    In any case, I went ahead and implemented the simple_on_off model that you attached within our environment. As you suggested, we commented out the UART com1_strout functionality and simply echoed back the received command string along with a counter value to iterate through the responses from each server node. We set the send_command_client(&m_clients[GROUP_CLIENT_INDEX], command, 4); function to trigger every time we received a heartbeat message from any of the 3 provisioned nodes.

    So in the above scenario, the 3 server nodes send out a heartbeat message (at the specified interval, we tested 1 sec and 0.1 sec) which then triggers the client node to send the "Test" command to the 3 servers which each then send back an echoed response.

    Needless to say, we are not seeing the same throughput issue that you have reported. I have attached the RTT log file from the 100MS (0.1 sec) test run that we analyzed over a 5 minute (300 sec) period. Within that log we see significant mesh network throughput (albeit a simple echo of the "Test" string with a counter value returned). We do however see approximately 25% packet loss at that message frequency.

    Regards,


    rtt-log-100ms-cmd-resp[3-svrs].zip

  • Hi Leonwj,

    You are right that I was looking at the wrong "case" statement, so yes, the following statement was indeed the default for "case PROV_STATE_CONFIG_PUBLICATION_HEALTH:"

       pubstate.publish_period.step_res = ACCESS_PUBLISH_RESOLUTION_10S;

    Thank you very much for testing out the command/response program. I think the large percentage of packet loss at the server is probably the reason why the communication appears to choke when we attempt to send faster than 2 packets a second.

    In M2M communication, when the client sends out a command packet it typically waits for a response packet from the server before it proceeds to send the next command. If the server fails to send a response, the client times out after some time (in our case we set the timeout to 1 second). The client only sends the next command packet after the timeout.

    If 25% of consecutive commands are lost as per your test, the client will fail to receive 1 in 4 responses. The client is thus unable to sustain a smooth command/response exchange with the server every 100ms, as it constantly has to wait for the timeout before it can send the next packet. So the communication appears to stop frequently due to the timeout.
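The effect of the timeout on a wait-for-response scheme like this can be estimated with a simple expected-value calculation. The sketch below assumes each exchange fails independently with the same probability and that a failed exchange costs exactly one timeout; both are simplifications, and the function names are made up for illustration.

```c
#include <assert.h>

/* Expected duration of one command/response exchange when the client waits
 * for a reply and gives up after `timeout_s`. `loss` is the probability that
 * an exchange gets no reply (assumed independent between exchanges). */
static double expected_exchange_time_s(double loss, double rtt_s,
                                       double timeout_s)
{
    assert(loss >= 0.0 && loss <= 1.0);
    assert(rtt_s > 0.0 && timeout_s > 0.0);
    return (1.0 - loss) * rtt_s + loss * timeout_s;
}

/* Average number of exchanges the client can start per second. */
static double exchanges_per_second(double loss, double rtt_s,
                                   double timeout_s)
{
    return 1.0 / expected_exchange_time_s(loss, rtt_s, timeout_s);
}
```

With 25% loss, a 100 ms round trip and a 1 second timeout, the average exchange takes about 0.325 s, which works out to roughly 3 exchanges per second instead of 10. A shorter timeout or a lower loss rate both raise that figure.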

    What is the reason for such high packet loss, and is there any way to minimize it? The client and the server are sitting less than 1 meter apart and I only have 1 client and 1 server communicating. There weren't any other Bluetooth devices communicating nearby. What would be the maximum command/response exchange rate that still maintains <= 1% packet loss?

    Thanks again for your expert advice!

  • hello aleccontrol,

    Apologies for the belated response to your latest comment...

    A few things that I wanted to point out regarding the packet loss statistics that I presented:

    1. The stats were from using the model that you uploaded without any changes, so we would possibly need to review the model before attributing any loss to the underlying mesh

    2. We implemented the cmd/resp mechanism under the heartbeat callback mechanism, which may not be the most appropriate place to do so. e.g. if we set the heartbeat period to 100 msecs for each server, that means that when each server sends a message, the client immediately publishes a "Test" command to the group (i.e. 3 servers) which then each send back a response. So every 1/10th sec we have:

    • a. 3 heartbeat messages (server(s) -> client)
    • b. 1 command message published to the group (client -> server(s))
    • c. 3 response messages sent to the client (server(s) -> client)

    Given that there shouldn't be any relaying taking place (the nodes are within close proximity and message caching would ensure that repeats are avoided), that means 70 messages every second across the mesh. Also, we haven't done any timeline analysis to determine if there's a specific point(s) within our basic test where we start to see packet loss or whether the loss was uniform.
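The message count in this breakdown can be written out as a small calculation. The sketch follows the list above literally (one heartbeat per server, one group command, one response per server in each period) and ignores retransmissions and relaying. `messages_per_second()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Messages per second for the test setup described above.
 * Per heartbeat period: one heartbeat from each server, one command
 * published to the group, and one response from each server. */
static uint32_t messages_per_second(uint32_t servers, uint32_t period_ms)
{
    uint32_t per_period = servers + 1u + servers;

    assert(period_ms > 0 && period_ms <= 1000u);
    return per_period * (1000u / period_ms);
}
```

For 3 servers this gives 70 messages per second at a 100 ms period and 7 per second at a 1 second period. If the client actually publishes a command for every heartbeat it receives, rather than one per period, the real total would be higher than this estimate.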


    I do recall reading a vendor-produced whitepaper in which circa 900 mesh devices were deployed over a 2,000 sq. m area, with a target latency of 300 msec set for message delivery under low (~150 bps), medium (~1 kbps) and high (~3 kbps) traffic configurations. I grabbed a screenshot of the results at the time (see below), which highlighted that at least 99.1% of messages were successfully delivered within 300 msec under high traffic (Enhanced configuration) in both sparse and dense deployments. I recall that Enhanced meant that source message re-transmissions were performed as well as advertising randomization, whereas Baseline was a standard mesh implementation. Also note that this wasn't Nordic's mesh stack! (If you are interested, let me know and I'll see if I can dig out the link to the whitepaper.)

    I guess the general takeaway(s) from the above would be:

    1. that we shouldn't base any firm design decision(s) on the packet loss figures that I provided (since the config is/was not optimized or validated)
    2. the High traffic w/Dense deployment Baseline stat (69.2% - from the table), does reflect a similar level of degradation to what we saw in our basic test
    3. there are Enhanced optimizations that can be implemented to increase mesh reliability and throughput
    4. the results reflect fairly well on Bluetooth mesh when compared to results I've seen across Zigbee, Z-Wave etc. based on lower device numbers (the caveat being that this is only one specific configuration under test)
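As a rough illustration of why source re-transmissions help, the sketch below estimates how many times a message would have to be sent for the residual loss to fall under a target. It assumes every transmission is lost independently with the same probability, which is optimistic on a busy channel, since repeats add traffic of their own. `min_transmissions()` is a hypothetical helper, not part of any mesh stack.

```c
#include <assert.h>

/* Smallest number of transmissions n such that loss^n <= target, assuming
 * each transmission is lost independently with probability `loss`. */
static unsigned min_transmissions(double loss, double target)
{
    unsigned n = 1;
    double residual = loss; /* probability that all n copies are lost */

    assert(loss >= 0.0 && loss < 1.0);
    assert(target > 0.0 && target < 1.0);

    while (residual > target) {
        residual *= loss;
        n++;
    }
    return n;
}
```

With the 25% loss seen in the basic test and a 1% target, this gives 4 transmissions per message. That is only a back-of-the-envelope figure, and the right repeat count would need to be confirmed by measurement.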

    I hope the above helps.

    Regards,

  • Additionally (for both yourself and anyone else who's interested in the throughput level expectations of Bluetooth Mesh), you might want to listen to the following podcast/webinar, in which Szymon Slupik, Chair of the Bluetooth SIG Mesh Working Group, highlights why he believes that Bluetooth Mesh meets the required performance levels (among other benefits) for both industrial and home use.


    (please also note that I'm not affiliated with Szymon and/or Mr. Beacon in any way but have simply followed the mesh specification process since FRD 0.7)

    Regards,
