Hello, we need some help debugging a proxy node in a BLE mesh network.
First, some information about our experiment. We currently have 5 nodes running custom firmware on the nRF52832 with the S132 SoftDevice. All of these nodes are provisioned into the same mesh network. We have a custom Android app that connects to a proxy node. The app sends a message to all nodes using the all-nodes address 0xFFFF. This message contains a parameter that tells one specific node to reply. When that node receives the broadcast message, it waits a few milliseconds and then sends a message back to the phone. When the phone receives the reply from the node it asked, it repeats the process. This exchange happens 10 to 20 times in quick succession.
The issue we're facing is that occasionally (roughly 1% of the time) the phone never receives the reply.
To debug this, we set up an experiment without the phone, so that we no longer rely on the proxy to relay the phone's messages. We wrote firmware for one of the nodes that acts as if it were the phone: it sends a message, waits to receive the reply, and then sends the next one. In this test there were no problems. Every broadcast message singling out a node was answered by the corresponding node. Based on this test, we're confident the issue is not packet loss within the mesh network itself.
So we went back to the original setup, with the phone joining the network through a proxy node. To debug it, we have two nodes hooked up to the debugger, one of which is the proxy node. Both nodes print information about every message they relay (source, destination, TID). What we see is that when a response fails, the proxy reports that it relayed the broadcast message, but neither the node we're debugging nor the node the message was intended for receives it. We believe the proxy node is not relaying the phone's message into the mesh network.
We've dug into the code and added prints in core_tx.c, core_tx_adv.c, transport.c, advertiser.c, proxy.c and packet_buffer.c to find out why the message is lost when a response fails, but we have found no conclusive evidence of what's happening.
Our next experiment is to use an nRF52840 Dongle as a sniffer to check whether the packet is actually sent into the mesh network or not. We'll share the results once we have them, but in the meantime, can you point us in the right direction to fix this? Thanks!
The setup is as follows:
- nRF5_SDK_15.3.0_59ac345
- nrf5_SDK_for_Mesh_v3.2.0
- s132_nrf52_6.1.1_softdevice