Very high response times of reliable unicast messages.

I adapted the light switch demo of the nRF5 SDK for Mesh to send a series of reliable on/off messages sequentially from an nRF52840 Preview DK to 20 Thingy:52 servers via send_reliable_message(). In most cases I measure response times between 20 and 50 milliseconds, which is quite acceptable. However, in about one in 20 cases the response time is significantly higher, at many seconds. The delay in these cases always seems to be a multiple of 5 seconds, so I get response times of 5, 10, 15 seconds and so on. Is this normal behavior for the stack, meaning I have to accept these sporadic long response times? Or is there a parameter that can be tweaked to get better performance with reliable messages?

  • Hi Armin, 

    No, I don't think this is normal behavior. Do you see the same problem when you test with the unmodified light switch examples?

    Do you see the same problem with fewer nodes?

    Did you run anything else besides mesh on the node? Running BLE or the proxy on the node would affect mesh performance.

    The resend mechanism is similar to the trickle algorithm: the interval is multiplied by ACCESS_RELIABLE_BACK_OFF_FACTOR on each retry. But your 5-10-15 second observation is quite strange and doesn't match what we have.

    Which TTL value did you set? The timeout is calculated from the TTL value.
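
    A minimal sketch of that back-off, assuming a factor of 2 (which matches the doubling delays eventually reported in this thread; the real value is whatever ACCESS_RELIABLE_BACK_OFF_FACTOR is set to in the SDK). The initial interval is taken as a plain parameter rather than derived from the TTL, since the exact formula is not given here. The function names are hypothetical and not part of the nRF5 SDK for Mesh.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed back-off multiplier; stands in for ACCESS_RELIABLE_BACK_OFF_FACTOR. */
    #define BACK_OFF_FACTOR 2u

    /* Hypothetical helper: wait before retry number `retry` (0 = first retry).
     * In the SDK the initial interval depends on the TTL; here it is an input.
     * No overflow check, so keep `retry` small. */
    static uint32_t retry_interval_ms(uint32_t initial_ms, unsigned retry)
    {
        assert(initial_ms > 0);
        uint32_t interval = initial_ms;
        for (unsigned i = 0; i < retry; i++) {
            interval *= BACK_OFF_FACTOR;
        }
        return interval;
    }

    /* Hypothetical helper: time after the original send at which retry number
     * `retry` goes out, i.e. the sum of all waits up to and including it. */
    static uint32_t elapsed_at_retry_ms(uint32_t initial_ms, unsigned retry)
    {
        uint32_t total = 0;
        for (unsigned i = 0; i <= retry; i++) {
            total += retry_interval_ms(initial_ms, i);
        }
        return total;
    }
    ```

    With a 5000 ms initial interval, the per-retry waits are 5000, 10000 and 20000 ms, while the retries themselves go out 5000, 15000 and 35000 ms after the original send. Which of the two series a measurement shows depends on where the timer starts.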

    Did you send the reliable message to a unicast address or to a group address?

    Also, do you see the packet arrive on the peer device before those 5, 10, ... seconds have passed? We need to find out whether the original packet or the response packet went missing.
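
    The check described above can be sketched as a small classifier, assuming timestamps for the send, the server-side reception and the reply reception are available on one shared timebase (for example GPIO toggles captured by a single logic analyzer, since the boards' clocks are not synchronized with each other). The enum, function name and threshold are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical outcome of one reliable-message exchange. */
    typedef enum {
        LOSS_NONE,     /* request and reply both arrived promptly */
        LOSS_REQUEST,  /* server saw the request late: original packet likely lost, a retry got through */
        LOSS_RESPONSE  /* server saw the request promptly, but the reply arrived late */
    } loss_kind_t;

    /* All timestamps in ms on one shared timebase (assumption). */
    static loss_kind_t classify_exchange(uint32_t t_sent_ms,
                                         uint32_t t_server_rx_ms,
                                         uint32_t t_reply_rx_ms,
                                         uint32_t threshold_ms)
    {
        assert(t_sent_ms <= t_server_rx_ms && t_server_rx_ms <= t_reply_rx_ms);
        if (t_server_rx_ms - t_sent_ms > threshold_ms) {
            return LOSS_REQUEST;
        }
        if (t_reply_rx_ms - t_server_rx_ms > threshold_ms) {
            return LOSS_RESPONSE;
        }
        return LOSS_NONE;
    }
    ```

    A server that reacts within tens of milliseconds while the reply arrives seconds later falls into LOSS_RESPONSE.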

  • The behavior is independent of the number of nodes. I tested with 3, 5, 20 and 30 nodes.

    The messages are sent from the light switch client node, which runs on the nRF52840 Preview DK with project light_switch_proxy_client_nrf5284_xxAA_s140_6.0.0. It therefore also acts as the proxy for commissioning via a smartphone. All server nodes run on Thingy:52 with project light_switch_proxy_server_nrf52832_xxAA_s132_6.0.0, so all servers are also proxies.

    The TTL value is the default set in nrf_mesh_config_app.h, #define ACCESS_DEFAULT_TTL (SERVER_NODE_COUNT), where SERVER_NODE_COUNT is set in light_switch_example_common.h as #define SERVER_NODE_COUNT (30).

    The reliable messages are sent to a unicast address. Group addresses can only be used with unreliable messages!

    Yes, the packet always arrives at the servers very quickly. An LED is switched on there when the message arrives, and I see almost no visible delay between the moment the message is sent, which is indicated by a log output in JLink Viewer, and the moment the LED lights up. So the long delay actually comes from the response!

  • Hi Armin,

    Very sorry for the delayed response. Hung has been on vacation for the past two weeks, and it seems we forgot to follow up on this case. I am very sorry about that. Hung will be back on Monday, but I will do my best to help you, and hopefully we can figure out the issue before he is back. Are you using Mesh SDK v2.0.1?

    Kind Regards,

    Bjørn

  • Yes, according to RELEASE_NOTES.md we use BLE Mesh v2.0.1.

  • Hi,

    We are very sorry for the delays over the past days. I have been looking at this issue today.

    I think your reported 10 ms advertising interval is far too short for the Mesh stack to run properly. Did you test increasing it?

    It would be great if we could build and reproduce this issue locally, but many of the files needed to build the project still seem to be missing... Usually we ask for a minimal example that shows the erroneous behavior. One option may be to strip the unneeded functionality from the project instead. Otherwise, we would need all the included .c and .h files to build the projects.

    If I have understood things correctly, the projects are:

    For nRF52840 DK: examples\light_switch\proxy_client\light_switch_proxy_client_nrf52840_xxAA_s140_6_0_0.emProject

    For the Thingy:52s: examples\light_switch\thingy_provisioning_demo_generic_OnOff_BLINK\light_switch_proxy_server_nrf52832_xxAA_s132_6_0_0.emProject

    Can you confirm or correct that?

    Regards,
    Terje

  • Hi,

    Regarding the delays you see, you mentioned they are all multiples of 5 seconds. Can you confirm that you see every multiple of 5, and that it is not a series where each number is double the previous one, as in 5, 10, 20, etc.?

    One hypothesis is that the message is sent and received but the acknowledgement is lost, so the message is resent until an acknowledgement is correctly received (or a timeout occurs). However, the time between retries doubles every time, which does not correspond to delays that increase linearly.
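
    The distinction between doubling and linearly increasing delays can be checked mechanically on a list of measured delays. A minimal sketch with hypothetical function names and an assumed tolerance to absorb measurement jitter: one predicate tests for exponential back-off (each delay is factor times the previous one), the other for a linear series (each delay adds a fixed step).

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    static uint32_t abs_diff_ms(uint32_t a, uint32_t b)
    {
        return a > b ? a - b : b - a;
    }

    /* True if each delay is `factor` times the previous one, within tol_ms. */
    static bool matches_back_off(const uint32_t *delays_ms, size_t n,
                                 uint32_t factor, uint32_t tol_ms)
    {
        assert(delays_ms != NULL || n == 0);
        for (size_t i = 1; i < n; i++) {
            if (abs_diff_ms(delays_ms[i], delays_ms[i - 1] * factor) > tol_ms) {
                return false;
            }
        }
        return true;
    }

    /* True if each delay is the previous one plus `step_ms`, within tol_ms. */
    static bool matches_linear(const uint32_t *delays_ms, size_t n,
                               uint32_t step_ms, uint32_t tol_ms)
    {
        assert(delays_ms != NULL || n == 0);
        for (size_t i = 1; i < n; i++) {
            if (abs_diff_ms(delays_ms[i], delays_ms[i - 1] + step_ms) > tol_ms) {
                return false;
            }
        }
        return true;
    }
    ```

    Note that 5, 10 fits both patterns with a 5 s start; a third sample (15 versus 20) is needed to tell them apart.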

    Regards,
    Terje

  • Regarding the advertising interval, I did not change that parameter from the preset in your examples. I don't even know where it is defined, so if I should increase it as you recommend, please tell me exactly which file defines it so that I can find it.

    Yes, those are the projects you mentioned.

    To reproduce the issue, take the original Nordic examples and overwrite the sources in:

    • \nRF5_SDK_for_Mesh\examples\light_switch\proxy_client\include
    • \nRF5_SDK_for_Mesh\examples\light_switch\proxy_client\src
    • \nRF5_SDK_for_Mesh\models\generic_on_off\include
    • \nRF5_SDK_for_Mesh\models\generic_on_off\src
    • \nRF5_SDK_for_Mesh\examples\light_switch\include
    • \nRF5_SDK_for_Mesh\examples\light_switch\thingy_provisioning_demo_generic_OnOff_BLINK\include
    • \nRF5_SDK_for_Mesh\examples\light_switch\thingy_provisioning_demo_generic_OnOff_BLINK\src

    Regarding the delays, I made a mistake in my description: each delay is actually double the previous one, starting with 5 seconds as you suggested above, i.e. 5, 10, 20, etc., and not just a multiple of 5. Sorry about that. If I understand correctly, this confirms your hypothesis of lost acknowledgements.

    However, the question remains why messages are being lost. Maybe the behavior improves when the advertising interval is increased. So either you test it yourself, or you help me find the code where the parameter is defined so that I can test it. But this may take some days, as I'm currently quite busy with other work.
