Multiple nodes acknowledgement problem

I get acknowledgements only from the single node that was connected first.

  • Hi Anaya, 

    Please be more specific when asking a question; you will get support faster.

    Please tell us how you set up the nodes, whether they subscribe to anything, how you send the message, and what kind of message it is.

  • Good evening Sir,

    OK Sir. I will explain the case:

    1. I have created a mesh network in which I have configured 2 nodes, say A and B.

    2. On node A I have 2 on-off servers with publication addresses 0x0001 and 0x0002. The lamps operate correctly on the hardware.

    3. On node B I have 2 on-off servers with publication addresses 0x0003 and 0x0004. The lamps operate correctly on the hardware.

    4. I disconnect from the network and reconnect. Node A appears first, so I connect to the network through it.

    5. I am redirected to a new activity where I have Lamp1, Lamp2, Lamp3, and Lamp4 with on-off buttons.

    6. When I click on/off for Lamp1 or Lamp2, the lamp operates on the hardware and I also get the onMeshMessageReceived acknowledgment, which I use to show the on/off status.

       But when I operate Lamp3 or Lamp4, I do not get the onMeshMessageReceived acknowledgment.

    7. If I connect to node B instead, it works exactly opposite to point 6.

       That is, I get acknowledgments for Lamp3 and Lamp4 but not for Lamp1 and Lamp2, which are configured on node A.

    8. My requirement is to receive every lamp's acknowledgment when connected to any node in the network.

    Kindly suggest a solution.

  • Hello,

    Regarding point 2:

    When you test with the iOS phone, do you see the same issue, that it takes much longer to send a command to a remote node compared to the proxy node?

    • We checked again in the nRF Mesh app for iOS and found the same problem. It doesn't show the timeout error, but the lamp on/off does not work; packets are lost for the other nodes in the network, except for the 1st node, to which the app is connected.
    • For the 1st connected node it works perfectly, without packet loss.

    We also see the same issue in both the Android and iOS applications for configuration messages sent to the other nodes in the network, such as binding the app key, setting the publication address, etc.

    Please give an appropriate solution as soon as possible.

    Regards!

  • Hi Anaya,

    Here is the reply from your colleague Madhav that I quote:

    We are seeing this issue about once in 10 times in the Android application.

    We have also run trials with the iOS application, where we operated the lamps approximately 200 times and there was not a single occurrence of the timeout error.

    So what has changed? It seems to be much worse now.

    What you can do is test with our light switch example and check whether the proxy node has the same issue. Please also try to test using our nRF52 DK, just to align the hardware with what we can test here.

    What you describe is much worse than what we are observing here. We also see the issue only once in about 10-20 trials, not 6-8 times out of 10. Something else must be affecting this.

    One possibility is that the connection interval of the BLE connection to the phone is too small. If it is too small, the connection will occupy the radio time reserved for mesh. Could you try to capture a sniffer trace of the connection?
    Or do you have any long interrupt handler that could affect the mesh performance?
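
    When reading the interval out of a sniffer trace, it may help to convert it to time. A minimal sketch in C, relying only on the fact that BLE expresses the connection interval in units of 1.25 ms; the function names are hypothetical and not part of any SDK:

```c
#include <stdint.h>

/* BLE expresses the connection interval in units of 1.25 ms (1250 us). */
#define CONN_INTERVAL_UNIT_US 1250u

/* Hypothetical helper: convert a raw connection interval value
 * to microseconds. */
static uint32_t conn_interval_us(uint16_t interval_units)
{
    return (uint32_t)interval_units * CONN_INTERVAL_UNIT_US;
}

/* Hypothetical helper: whole number of connection events per second
 * for a raw interval value. Returns 0 for an interval of 0, which is
 * not a valid connection interval. */
static uint32_t conn_events_per_second(uint16_t interval_units)
{
    if (interval_units == 0) {
        return 0;
    }
    return 1000000u / conn_interval_us(interval_units);
}
```

    For example, a raw interval of 6 corresponds to 7.5 ms, which is about 133 connection events per second, while a raw interval of 24 corresponds to 30 ms, or about 33 events per second.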


  • Hi Hung,

    Greetings!

    1. As you suggested, we captured sniffer traces and ran a few trials with 3 different devices.
    We connected only 2 devices at a time and operated the 2nd device with the app connected to the first device.
    Of these 3 devices, 1 is an nRF52 DK and the other 2 are our own devices.

    We also used the nRF Mesh Android app for these trials.

    Please find the attached Wireshark capture log files for your reference, and guide us further.

    2. Please also note that we are not using any long interrupt handlers in our device firmware.

    Regards,

    Anaya

    Wireshark_capture_files.zip

  • Hi Anaya, 

    The sniffer trace you provided didn't track any connection.

    You need to select the advertiser before the connection is made. In the Interface menu there is a dropdown list of all advertisers; choose the one you want to sniff. After that, make the connection from the phone. Please follow the instructions in the sniffer manual. The most important thing is that you can see the connection being established in the sniffer trace. Your current traces contain only advertising packets.

    Do you see the same issue when you test with our stock light switch example (no modification)?

  • Hi Hung,

    Greetings!

    Thanks for your reply and suggestions.

    As you suggested, we ran trials with both the modified and the stock application (light switch server) and captured sniffer logs for the device to which the Android app is connected (2 trials each for the modified and the stock application). We hope we have now given you the correct log files.

    Please note that we experienced the timeout in both situations.

    Please suggest further steps to resolve this issue.

    Regards,

    Anaya

    Wireshark_capture_files_20200518_165500.zip

  • Hi Anaya, 

    I just tried here with the stock example and found the same issue. Most likely an update to the app caused the issue. I will continue testing and let you know if we find the root cause.

  • Hi Anaya, 

    I have continued testing, but the results were not consistent. At one point I had 60% of packets dropped, but when I tested again just now it worked like before, with about a 5-10% drop. I suspect that interference from the many test devices in our building caused the issue.
    I would suggest that you check that as well. Do you have any high traffic on the 2.4 GHz band, such as Wi-Fi or other networks?

    How many devices do you have in your test? Have you tried testing with the nRF52 DK?

    We need a better way to debug this issue. I think the best approach is to print out logging.

    Please initialize logging like this in the light switch application:

    __LOG_INIT(LOG_SRC_APP | LOG_SRC_FRIEND | LOG_SRC_BEARER | LOG_SRC_NETWORK, LOG_LEVEL_DBG1, LOG_CALLBACK_DEFAULT);

    We will stick with the light switch application for testing, to make it easier to align the tests on your side and ours. It would be good if you could test using a fresh copy of the SDK. I assume you are testing on SDK v4.0? Please consider testing on v4.1 as well.

    In the RTT log you will find logging from proxy.c and from network.c.
    proxy.c handles the connection to the phone, and network.c handles the actual mesh ADV packets between nodes.

    In my testing here, I can see that when a command from the phone times out, the proxy node has forwarded it to the network layer and the network layer has sent the packet.

    But on the remote node (not the proxy node) I don't see this packet being received, which shows that the packet is dropped or corrupted, possibly due to interference.

    Please test with debug logging enabled and let me know what you find. Also please try testing in a different environment, with a different phone, etc.

  • Hi Hung,

    Greetings!

    I am a colleague of Mrs. Anaya, working on the nRF52 part.

    As you suggested, we need to debug this issue in a better way, so let me introduce myself in this conversation.

    Please see my replies to your queries below.

    1. We have a Wi-Fi network in our company on the 2.4 GHz band (auto channel mode).

    2. Please also note that we ran trials yesterday and experienced the same thing you did. In the first trial we had a good communication response, but in the next trial it was much worse.

    If the Wi-Fi network is causing the data loss, what can we do to improve this?

    3. In our test setup we are using 2 devices under test. However, there are about 8-10 more active Bluetooth devices in our lab, which are being used by my colleagues (provisioned with different Android devices).

    Regarding the number of devices that you asked about:

    Does an increase in the number of devices lead to a decrease in communication response/performance?

    4. We have also tested on the nRF52 DK and experienced this timeout error there as well.

    5. We will run trials with logging enabled as you suggested and will let you know the results.

    6. At present we are using SDK 4.0.0 with a modified light switch server application.

    I think it would be better for us to test using this application, as we have run most of our trials with this setup so far.

    7. Can you please explain a bit more what you mean by a different environment?

    That will help us run the trials precisely.

    8. One more point I would like to share: yesterday we ran trials with devices that do not perform flash write operations (to save the device status).

    We saw an improvement in communication, but not enough to say that the problem is resolved. The improvement is only about 10%.

    Thank you so much.

    Regards,

    Dinesh

  • Hi Hung,

    Greetings!

    Please note that we have run the trials you suggested (with SDK 4.0.0).

    1. We ran trials with the Wi-Fi network both on and off.
    2. We enabled logging as you described.
    3. We also observe that the proxy node forwards the command to the end node, but the end node does not appear to receive the packet.
    4. We have also observed that the decrypt status for network RX packets is '5' (NRF_ERROR_NOT_FOUND). You can see this in the screenshot as "Net RX decrypt status: 5".
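
    The numeric status in the log can be translated back to a name with a small lookup. A minimal sketch, assuming the global error values from the SDK's nrf_error.h; only 5 = NRF_ERROR_NOT_FOUND is confirmed in this thread, so the other entries should be checked against your SDK copy, and `nrf_error_name` is a hypothetical helper:

```c
#include <stdint.h>

/* Hypothetical helper: map a numeric nRF status code to its name.
 * Only a subset is listed. The value 5 is confirmed by the log in
 * this thread; the other values are assumed from nrf_error.h and
 * should be verified against the SDK in use. */
static const char *nrf_error_name(uint32_t code)
{
    switch (code) {
        case 0:  return "NRF_SUCCESS";
        case 4:  return "NRF_ERROR_NO_MEM";
        case 5:  return "NRF_ERROR_NOT_FOUND";
        case 7:  return "NRF_ERROR_INVALID_PARAM";
        case 8:  return "NRF_ERROR_INVALID_STATE";
        default: return "UNKNOWN";
    }
}
```

    Calling `nrf_error_name(5)` returns "NRF_ERROR_NOT_FOUND", matching the status seen in the log.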

    Please see attached screenshot(s) for your reference.

    Regards,

    Dinesh

  • Hi Dinesh, 

    It's not possible for us to debug based on a log screenshot. I would suggest that you check and confirm that the proxy node sent the packet but the destination node did not receive it.

    The next step is then to test normal mesh communication: instead of sending the command from the phone, send the mesh command directly from one node to another, and check the failure rate of this method.
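
    To compare the two methods, the failure rate can be computed the same way for both. A minimal sketch; `failure_rate_percent` and its counters are hypothetical names, and it assumes you count the commands sent and the acknowledgments received yourself:

```c
#include <stdint.h>

/* Hypothetical helper: failure rate in percent, rounded to the
 * nearest integer. `sent` is the number of commands sent and `acked`
 * the number of acknowledgments received. Returns 0 when nothing was
 * sent or when every command was acknowledged. */
static uint32_t failure_rate_percent(uint32_t sent, uint32_t acked)
{
    if (sent == 0 || acked >= sent) {
        return 0;
    }
    uint32_t lost = sent - acked;
    /* 64-bit intermediate avoids overflow; adding sent / 2 rounds. */
    return (uint32_t)(((uint64_t)lost * 100u + sent / 2u) / sent);
}
```

    For example, 10 commands with 4 acknowledgments give 60%, and 20 commands with 19 acknowledgments give 5%.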

    By testing in a different environment, I mean doing the same test at a location with less Wi-Fi (or other RF) traffic. Check for other Bluetooth devices in the building; a device that keeps transmitting all the time at a short interval can cause the problem. So it's important to test in a clean environment (try testing outside, in an open area, for example).

    Please try testing with more devices in the network. This will improve redundancy, and the packet drop rate will be lower.
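
    As a rough illustration of why redundancy helps: if a message can reach the destination over several relay paths, and each path independently loses it with the same probability, the message is lost only when every path fails. The independence assumption is a simplification (interference often affects several paths at once), and `all_paths_lost` is a hypothetical helper:

```c
/* Hypothetical helper: probability that a message is lost on every
 * one of `paths` paths, assuming each path loses it independently
 * with probability `p_loss` (0.0 to 1.0). With zero paths the result
 * is 1.0, since the message cannot be delivered at all. */
static double all_paths_lost(double p_loss, unsigned paths)
{
    double result = 1.0;
    for (unsigned i = 0; i < paths; i++) {
        result *= p_loss;
    }
    return result;
}
```

    Under this assumption, a 50% loss per path becomes 25% with two paths and 12.5% with three.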
