
Multiple nodes acknowledgement problem

Getting acknowledgement only from the single node that is connected first

  • Hi Anaya, 

    Please be more specific when asking a question; you will get support faster. 

    Please tell us how you set up the nodes, whether they subscribe to anything, how you send the message, and what kind of message it is. 

  • Good evening Sir,

    OK, Sir. I will explain the case:

    1. I have created a mesh network in which I have configured two nodes, say A and B.

    2. On node A, I have two OnOff servers with publication addresses 0x0001 and 0x0002. The lamps operate correctly on the hardware.

    3. On node B, I have two OnOff servers with publication addresses 0x0003 and 0x0004. The lamps operate correctly on the hardware.

    4. I disconnect from the network and reconnect. Node A appears first, so I get connected to the network through it.

    5. The app then opens a new activity showing Lamp1, Lamp2, Lamp3 and Lamp4, each with on/off buttons.

    6. When I click on/off for Lamp1 or Lamp2, the lamp operates on the hardware and I also get the onMeshMessageReceived acknowledgement, which I use to show the on/off status.

       But when I operate Lamp3 or Lamp4, I don't get the onMeshMessageReceived acknowledgement.

    7. But if I connect to node B, it works the exact opposite of point 6.

       That is, I get the acknowledgement for Lamp3 and Lamp4 but not for Lamp1 and Lamp2, which are configured on node A.

    8. My requirement is to receive the acknowledgement for every lamp, whichever node in the network I am connected to.

    Kindly suggest a solution.

  • Hi Hung,

    Greetings!

    Thanks for your reply and suggestions.

    As you suggested, we ran trials with both the modified and the stock light switch server applications and captured a sniffer log for the device the Android app is connected to (two trials each for the modified and stock applications). We hope we have now sent you the correct log files.

    Please note that we saw timeouts in the log in both situations.

    Please suggest further steps to resolve this issue.

    Regards,

    Anaya

    Wireshark_capture_files_20200518_165500.zip

  • Hi Anaya, 

    I just tried the stock example here and found the same issue. Most likely an update to the app caused it. I will continue testing and let you know if we find the root cause.

  • Hi Anaya, 

    I have continued testing, but the results were not consistent. At one point I had 60% packet loss, but when I tested again it worked as before, with about 5-10% loss. I suspect interference from the many test devices in our building caused the issue. 
    I suggest you check that as well. Do you have heavy traffic in the 2.4 GHz band, such as Wi-Fi or other networks? 

    How many devices do you have in your test? Have you tried testing with the nRF52 DK? 

    We need a better way to debug this issue. I think printing logs is the best option. 

    Please initialize logging like this in the light switch application:  

    __LOG_INIT(LOG_SRC_APP | LOG_SRC_FRIEND | LOG_SRC_BEARER | LOG_SRC_NETWORK, LOG_LEVEL_DBG1, LOG_CALLBACK_DEFAULT);

    We will stick with the light switch application for testing, so that it is easier to align the tests on your side and ours. It would be good if you could test with a fresh copy of the SDK. I assume you are testing on SDK v4.0? Please also consider testing on v4.1. 

    In the RTT log you will find logging from proxy.c and from network.c. 
    proxy.c handles the connection to the phone, and network.c handles the actual mesh advertising packets between nodes. 

    In my testing, when a command from the phone timed out, I could see that the proxy node had forwarded it to the network layer and the network layer had sent the packet. 

    But the remote node (not the proxy node) did not receive this TX packet, which shows that the packet was dropped or corrupted, possibly due to interference. 

    Please test with debug logging enabled and let me know what you find. Please also try testing in a different environment, with a different phone, etc. 

  • Hi Hung,

    Greetings!

    I am a colleague of Mrs. Anaya, working on the nRF52 part.

    As you suggested, we need a better way to debug this issue, so let me introduce myself in this conversation.

    Please see my replies to your queries below.

    1. We have a Wi-Fi network in our company on 2.4 GHz (channel set to auto).

    2. Also, yesterday we ran trials and experienced the same thing you did. In the first trial communication was good, but in the next trial it was very poor.

    If the Wi-Fi network is causing the data loss, what can we do to improve this?

    3. In our test setup we are using two devices under test. However, there are more active Bluetooth devices in our lab, about 8-10, which are being used by my colleagues (provisioned with different Android devices).

    Since you asked about the number of devices, I would like to understand the following:

    Does an increase in the number of devices lead to lower communication performance?

    4. We have also tested on the nRF52 DK and experienced the same timeout error.

    5. We will run trials with logging enabled as you suggested and let you know the results.

    6. At present we are using SDK 4.0.0 with a modified light switch server application.

    I think it is better for us to keep testing with this application, since we have done most of our trials with this setup so far.

    7. Could you please explain a bit more what you mean by a different environment?

    That will help us run the trials precisely.

    8. One more point I would like to share: yesterday we ran trials with devices that do not perform flash write operations (to save the device status).

    We saw an improvement in communication, but not enough to say the problem is resolved; the improvement is only about 10%.

    Thank you so much.

    Regards,

    Dinesh

  • Hi Hung,

    Greetings!

    Please note that we have run the trials you suggested (with SDK 4.0.0).

    1. We ran trials with the Wi-Fi network both on and off.
    2. We enabled logging as you described.
    3. We also see that the proxy node forwards the command to the end node, but the packet does not appear to be received at the end node.
    4. We also observed that the decrypt status for network RX packets is 5 (NRF_ERROR_NOT_FOUND). You can see this in the screenshot as "Net RX decrypt status: 5".

    Please see the attached screenshot(s) for reference.
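    One possible reading of the "Net RX decrypt status: 5" lines, offered as an assumption and not confirmed in this thread: the node found no network key matching the packet, which would be expected for traffic from the other mesh networks in the same lab. In a Bluetooth Mesh network PDU, the first octet holds the IVI bit (most significant bit) and a 7-bit NID derived from the network key, and a receiver only attempts decryption with keys whose NID matches. The sketch below illustrates that first-stage check; check_nid() and its status values are hypothetical and are not the SDK's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Octet 0 of a network PDU: bit 7 = IVI, bits 6..0 = NID. */
#define NET_PDU_NID_MASK 0x7Fu

/* Hypothetical result codes for this sketch (not SDK definitions). */
typedef enum {
    NID_MATCH,      /* some local network key has this NID: try to decrypt */
    NID_NOT_FOUND,  /* no local key matches: likely another network's packet */
    PDU_TOO_SHORT   /* nothing to inspect */
} nid_check_t;

/* Compares the packet's NID against the NIDs of the keys this node holds. */
static nid_check_t check_nid(const uint8_t *pdu, size_t len,
                             const uint8_t *local_nids, size_t n_nids)
{
    if (pdu == NULL || len < 1) {
        return PDU_TOO_SHORT;
    }
    uint8_t nid = pdu[0] & NET_PDU_NID_MASK;  /* drop the IVI bit */
    for (size_t i = 0; i < n_nids; i++) {
        if ((local_nids[i] & NET_PDU_NID_MASK) == nid) {
            return NID_MATCH;
        }
    }
    return NID_NOT_FOUND;
}
```

    If that reading is right, these log lines come from foreign traffic the node correctly ignores, rather than from the lost commands themselves.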

    Regards,

    Dinesh

  • Hi Dinesh, 

    It's not possible for us to debug from a log screenshot. I suggest you check and confirm that the proxy node sent the packet and that the destination node did not receive it. 

    The next step is then to test normal mesh communication: instead of sending the command from the phone, send the mesh command directly from one node to another, and check the failure rate with this method. 
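    The node-to-node test comes down to counting sent requests against received Status replies. A minimal sketch of that bookkeeping, assuming the client sends acknowledged Set messages tagged with a TID and calls these hooks from its own send, status and timer handlers (all names here are hypothetical, not SDK functions):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TID 256  /* TID is one octet, so it wraps after 256 commands */

typedef struct {
    bool     pending[MAX_TID];    /* request with this TID awaiting Status */
    uint32_t sent_at_ms[MAX_TID]; /* when that request was sent */
    uint32_t sent;
    uint32_t acked;
    uint32_t timed_out;
} reliability_stats_t;

static void stats_init(reliability_stats_t *s) { memset(s, 0, sizeof *s); }

/* Call when an acknowledged Set with this TID is sent. */
static void stats_on_send(reliability_stats_t *s, uint8_t tid, uint32_t now_ms)
{
    s->pending[tid] = true;
    s->sent_at_ms[tid] = now_ms;
    s->sent++;
}

/* Call when the matching Status arrives; duplicates are ignored. */
static void stats_on_status(reliability_stats_t *s, uint8_t tid)
{
    if (s->pending[tid]) {
        s->pending[tid] = false;
        s->acked++;
    }
}

/* Call periodically: anything pending longer than timeout_ms is a failure. */
static void stats_poll(reliability_stats_t *s, uint32_t now_ms, uint32_t timeout_ms)
{
    for (int tid = 0; tid < MAX_TID; tid++) {
        if (s->pending[tid] && now_ms - s->sent_at_ms[tid] >= timeout_ms) {
            s->pending[tid] = false;
            s->timed_out++;
        }
    }
}

/* Failure rate in whole percent over all resolved requests. */
static uint32_t stats_failure_percent(const reliability_stats_t *s)
{
    uint32_t resolved = s->acked + s->timed_out;
    return resolved ? (100u * s->timed_out) / resolved : 0u;
}
```

    Comparing this percentage with the failure rate of the phone-driven test helps separate plain mesh RF loss from problems on the proxy path.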

    By testing in a different environment, I mean doing the same test at a location with less Wi-Fi (or other RF) traffic. Check for other Bluetooth devices in the building: a device that transmits continuously at a short interval can cause the problem. So it's important to test in a clean environment (for example, outdoors in an open area). 
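    As a rough illustration of why one device that transmits constantly can hurt, here is a collision estimate under simplifying assumptions that are not from this thread: every interferer sends one packet of the same airtime at a fixed interval, independently, on the same channel, and any overlap destroys the packet (an unslotted ALOHA-style model, not a model of the actual mesh bearer). A maximum-length legacy advertising packet at 1 Mbps lasts about 376 µs.

```c
/* Probability that one packet of airtime t_us survives n independent
 * interferers, each sending a packet of the same airtime every interval_us.
 * A collision happens if an interferer starts within a 2 * t_us window. */
static double p_no_collision(unsigned n_interferers, double t_us, double interval_us)
{
    double p_hit = 2.0 * t_us / interval_us;  /* chance one interferer overlaps */
    if (p_hit > 1.0) {
        p_hit = 1.0;
    }
    double p_ok = 1.0;
    for (unsigned i = 0; i < n_interferers; i++) {
        p_ok *= 1.0 - p_hit;                  /* every interferer must miss */
    }
    return p_ok;
}
```

    For example, p_no_collision(10, 376.0, 20000.0) is roughly 0.68, so under this model ten devices each advertising every 20 ms would destroy about one packet in three. Real mesh traffic uses three advertising channels and retransmissions, so treat this only as an indication of the trend.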


    Please also try testing with more devices in the network. This improves redundancy and should lower the packet drop rate. 
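    The redundancy argument can be made concrete with a simple model, assuming independent losses (real interference is usually correlated, so this is optimistic): each of k relays hears the packet with probability p, and its forwarded copy reaches the destination with the same probability p.

```c
/* Probability that a packet reaches the destination through at least one of
 * k relays, when every over-the-air hop succeeds independently with
 * probability p. Any direct path is ignored. */
static double p_delivered_via_relays(unsigned k, double p)
{
    double p_path = p * p;      /* source -> relay, then relay -> destination */
    double p_all_fail = 1.0;
    for (unsigned i = 0; i < k; i++) {
        p_all_fail *= 1.0 - p_path;  /* this relay's path also fails */
    }
    return 1.0 - p_all_fail;
}
```

    With p = 0.8, one relay gives 0.64 and three relays give about 0.95.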


  • Hi Hung,

    Greetings!

    As you suggested, we have run trials with the setup below:
    SDK: v4.0.0
    Client: stock SDK light switch client application on nRF52 DK hardware
    Server: modified SDK light switch server application on custom hardware/PCB
    Mobile App: nRF Mesh App on Android

    The trials were as follows:

    1. Indoor trials
    a. From Embedded switch client (Client device)
    i. 1 Server and 1 client in setup (Client Device -------> 1st Server)
    No data loss was observed with this setup.

    ii. 2 Server devices and 1 client device (Client Device -------> 1st Server -------> 2nd Server)
    Data loss or timeouts occurred here (data loss rate around 20%).
    We placed the devices so that the 2nd server cannot be reached directly from the client device.
    In this case we observed data loss when operating the 2nd server, and the log shows that
    the ACK from the end device is received at the 1st server but is not forwarded to the client device.

    b. From Mobile App
    i. 1 Server and 1 client in setup (Client Device -------> 1st Server)
    No data loss occurs, as the client device (mobile app) is directly connected to the end node.

    ii. 2 Server devices and 1 client device (Client Device -------> 1st Server -------> 2nd Server)
    Data loss occurred (data loss rate above 50%).
    The log shows that sometimes the proxy node forwarded the command but the end node did not receive it,
    and sometimes the end node sent back the ACK but the proxy node did not receive it.

    2. Outdoor trials
    a. From Embedded switch client (Client device)
    i. 1 Server and 1 client in setup (Client Device -------> 1st Server)
    ii. 2 Server devices and 1 client device (Client Device -------> 1st Server -------> 2nd Server)
    No data loss was observed as long as the devices were kept in line of sight.
    However, moving them 2-3 feet out of line of sight resulted in 15-20% data loss.

    b. From Mobile App
    i. 1 Server and 1 client in setup (Client Device -------> 1st Server)
    No data loss occurs, as the client device (mobile app) is directly connected to the end node.

    ii. 2 Server devices and 1 client device (Client Device -------> 1st Server -------> 2nd Server)
    Data loss or timeouts occurred here (data loss rate above 50%).

    Please suggest a possible resolution to this issue.
    We look forward to your reply.
    Thank you once again for your continued help.

    Regards,

    Dinesh

  • Hi Dinesh, 

    Thanks for the detailed test information.

    Could you give more information about the range you achieved in the test? 

    - What range did you achieve in the outdoor test, especially the test with 3 devices? 

    - Have you tried the same test using only nRF52 DKs (for both servers)? This would rule out any RF issue on your hardware board. 

    The success rate you achieve is much lower than what we observed here. 

    Another test you can do is to use the sniffer to track the mesh packets. With the Wireshark display filter btcommon.eir_ad.entry.type == 0x2a you should be able to show only mesh advertising packets. 
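    For reference, 0x2A is the Bluetooth "Mesh Message" AD type, so this filter keeps mesh network PDUs and leaves out PB-ADV (0x29) and Mesh Beacon (0x2B) packets as well as ordinary advertising. If you want the same check in a test tool of your own, a sketch that walks the AD structures of a legacy advertising payload could look like this (adv_has_mesh_message() is a hypothetical helper):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AD_TYPE_MESH_MESSAGE 0x2A  /* same value the Wireshark filter matches */

/* Each AD structure is: length byte, AD type byte, (length - 1) data bytes.
 * Returns true if any structure in the payload has the Mesh Message type. */
static bool adv_has_mesh_message(const uint8_t *data, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t field_len = data[i];
        if (field_len == 0) {            /* zero length ends the significant part */
            break;
        }
        if (i + 1 + field_len > len) {   /* malformed: field overruns the buffer */
            return false;
        }
        if (data[i + 1] == AD_TYPE_MESH_MESSAGE) {
            return true;
        }
        i += 1u + field_len;             /* skip length byte plus the field */
    }
    return false;
}
```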

    In a successful transaction (turning the light on or off), you should see the command packet from the proxy node followed by the reply from the destination node. 

    In this capture, 86:06 is the node connected to the phone, and 29:17 is the destination node. 

    If it fails, either the initial packet from 86:06 is missing, or the packet from 86:06 is captured by the sniffer but not received by 29:17. 

    By tracing this, we can see whether the problem happens because the proxy node doesn't send the packet, because the packet is not received by the destination node, or both. Try to place the sniffer very close to the proxy node. 
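    The reasoning above can be written as a small decision procedure. This is a sketch: the observation flags are assumptions about what you record per command from the sniffer capture and the RTT logs, and the type and enum names are hypothetical.

```c
#include <stdbool.h>

/* Per-command observations, filled in from the sniffer capture and RTT logs. */
typedef struct {
    bool proxy_tx_seen;    /* sniffer captured the command sent by the proxy (e.g. 86:06) */
    bool dest_rx_logged;   /* destination (e.g. 29:17) RTT log shows the command arriving */
    bool reply_seen;       /* sniffer captured the Status reply from the destination */
    bool phone_got_status; /* the app received the Status (no timeout) */
} trace_obs_t;

typedef enum {
    TRACE_OK,
    TRACE_PROXY_DID_NOT_SEND,    /* command never went out on the mesh */
    TRACE_DEST_DID_NOT_RECEIVE,  /* sent, but lost on the way to the destination */
    TRACE_REPLY_NOT_SEEN,        /* destination got it, but no reply was captured */
    TRACE_PROXY_MISSED_REPLY     /* reply was on air but did not reach the phone */
} trace_verdict_t;

/* Checks each step in order and reports the first one that failed. */
static trace_verdict_t classify_trace(trace_obs_t o)
{
    if (!o.proxy_tx_seen) {
        return TRACE_PROXY_DID_NOT_SEND;
    }
    if (!o.dest_rx_logged) {
        return TRACE_DEST_DID_NOT_RECEIVE;
    }
    if (!o.reply_seen) {
        return TRACE_REPLY_NOT_SEEN;
    }
    if (!o.phone_got_status) {
        return TRACE_PROXY_MISSED_REPLY;
    }
    return TRACE_OK;
}
```

    Note that a reply captured by a sniffer sitting next to the proxy does not prove that the proxy itself received it, which is why the last check relies on the phone.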


  • Hi Hung,

    Greetings!

    Please see my replies to your queries below.

    - Could you give more information about the range you achieved in the test ?
    For the indoor trials, the distance between the client device and the server devices is about 15 feet, and the distance between the two servers is around 15 feet.

    - What was the range achieved in the test outdoor ? Especially the test with 3 devices.
    For the outdoor trials, the distance between the client device and the server devices is about 50 feet, and the distance between the two servers is around 15 feet.

    - Have you tried to test the same with only the nRF52 DK (on both 2 servers) ? The reason for this is to avoid any RF issue on your hardware board.
    No, we had not conducted trials using nRF52 DKs.
    So, taking this and the Wireshark filters you suggested into account, we have now run trials on nRF52 DKs.
    Details of this setup are below:

    Devices:
    a. Client Device: Nordic nRF Mesh Android App
    b. Server devices: 2 nRF52 DKs running the modified light switch server, 30 feet apart
    c. Sniffer device: nRF52 DK

    Setup:
    a. As you suggested, we placed the sniffer near the proxy device that the Android app is connected to.
    b. The 2 servers are placed at different locations, 30 feet apart.

    Observations:
    a. We captured the Wireshark sniffer log and the RTT log of the proxy node.
    b. We analysed the Wireshark capture and the RTT log after a timeout occurred while operating from the Android app.
    c. If we have read these logs correctly, the sequence is:
    app sends command to proxy node --> proxy node transmits it on the mesh network --> proxy node does not receive the ACK
    d. From this we could not tell whether the end node received the command and responded to it,
    so in another trial we captured the end node's RTT log.
    (Sorry to say, it was not possible for us to capture RTT logs from all devices at the same time, so we took the end node RTT log in a separate trial.)
    e. There we found that the end node transmitted the ACK to the app.

    Conclusion:
    a. Timeout issue: according to our analysis, the end node transmits the ACK but it is not received by the proxy node.
    b. Data loss rate: on the nRF52 DK-based setup, the data loss rate is 1 in 50 operations, and that is without any retransmission logic on the Android app side.
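    On the missing retransmit logic: the Generic OnOff Set message carries a TID, and the Mesh Model specification treats messages with the same source, destination and TID received within 6 seconds as one transaction, so a client can resend a command without the server applying it twice. A minimal sketch of the client-side bookkeeping, written in C for consistency even though the real logic would live in the Android app (all names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Window in which SRC, DST and TID identify the same transaction. */
#define TID_WINDOW_MS 6000u

typedef enum { ACTION_RESEND, ACTION_GIVE_UP } retry_action_t;

typedef struct {
    uint8_t next_tid;          /* TID for the next new command; wraps at 256 */
} onoff_client_t;

typedef struct {
    uint8_t  tid;              /* TID shared by every copy of this command */
    unsigned attempts;         /* transmissions made so far */
    uint32_t first_sent_ms;    /* time of the first transmission */
} pending_set_t;

/* A new logical command (a new button press) gets a fresh TID. */
static void start_command(onoff_client_t *c, pending_set_t *p, uint32_t now_ms)
{
    p->tid = c->next_tid++;
    p->attempts = 1;
    p->first_sent_ms = now_ms;
}

/* Called when no Status arrived in time. On ACTION_RESEND the caller sends
 * the same Set again with p->tid unchanged. Retrying stops once the window
 * has passed, so a late copy is not taken as a new command. */
static retry_action_t on_status_timeout(pending_set_t *p, uint32_t now_ms,
                                        unsigned max_attempts)
{
    bool out_of_window = (now_ms - p->first_sent_ms) >= TID_WINDOW_MS;
    if (p->attempts >= max_attempts || out_of_window) {
        return ACTION_GIVE_UP;
    }
    p->attempts++;
    return ACTION_RESEND;
}
```

    If each attempt failed independently at the observed 1-in-50 rate, three attempts would all fail about 8 times in a million (0.02³); losses under interference are correlated, so the real improvement will be smaller.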

    Please go through the attached log files and advise us.
    I hope these trials are sufficient to identify the likely issue and a fix.

    Attachments:
    TRIAL_1 to TRIAL_3: Wireshark captures and proxy node rtt log
    TRIAL_4: End node device rtt log

    Regards,

    Dinesh

    20200523_151300.zip

  • Hi Dinesh, 


    It's difficult for us to look at the log and figure out exactly what happened, because we don't know at what point the timeout occurred. 

    But from what you described, I have some thoughts: 

    - It's possible that the proxy is not listening all the time as it is supposed to. 

    - But the issue could also be the radio hardware on your board. You described test ranges of only 15-30 feet; if that is the maximum range you can achieve, it's too short. With a tuned antenna it should achieve at least 50 feet. 

    - To confirm that it's the ACK that is not received by the proxy, please check that when the timeout occurs on the phone, the destination node does actually turn on/off as commanded from the phone. In other words, the command is executed (the light turns on/off as expected) and only the confirmation fails to reach the phone.

    This case has been dragging on for quite some time. I strongly suggest you do the following tests: 

    - Test using only nRF52 DKs. I can see that you have at least 2 nRF52 DKs, one as the server and one as the sniffer. Please use both of them as servers for testing, running the unmodified light switch server. This test is very important. 

    - After test #1 is done (both indoor and outdoor, at close and long range), you can start testing with your hardware, using only the unmodified light switch server firmware. 

    - After test #2, you can try increasing the TX power of the node, if power consumption is not a big concern. To change the TX power, modify set_default_broadcast_configuration() to use RADIO_POWER_NRF_POS4DBM instead of RADIO_POWER_NRF_0DBM. This increases the signal power and may help overcome the interference. But if the issue is that the proxy doesn't listen, we should see no change in performance. 
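    To put the TX power change in numbers, assume a log-distance path-loss model PL(d) = PL(d0) + 10 n log10(d/d0), with n = 2 in free space and commonly higher indoors. Setting the extra path loss equal to the 4 dB gained from 0 dBm to +4 dBm gives the range ratio:

```latex
\begin{align*}
\Delta P &= (+4\,\mathrm{dBm}) - (0\,\mathrm{dBm}) = 4\,\mathrm{dB} \\
10\,n\,\log_{10}\!\left(\frac{d_2}{d_1}\right) &= \Delta P
  \quad\Longrightarrow\quad \frac{d_2}{d_1} = 10^{\Delta P/(10n)} \\
n = 2:\quad \frac{d_2}{d_1} &= 10^{0.2} \approx 1.58 \\
n = 3:\quad \frac{d_2}{d_1} &= 10^{4/30} \approx 1.36
\end{align*}
```

    By the same model, going from 15 feet to 50 feet in free space needs 20 log10(50/15) ≈ 10.5 dB more link budget, so +4 dB alone is unlikely to close that gap if the board's RF path is the limitation.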
