Central: iOS 13.1
The central is connected to two peripherals. Each peripheral sends 50 pkt/s, and each packet carries 13 bytes of payload. The connection interval is 30 ms. Over hundreds of hours of testing this setup has worked well with no packet loss; however, we now have one device which fails *sometimes*. We have three recordings where one peripheral is normal but the other peripheral shows this pattern:
pkt-1, pkt-2, DROP, pkt-4, pkt-5, DROP, pkt-7, pkt-8, ...
We detect the drops by checking a continuity counter (cc) in the packet on the central device. The cc is an 8-bit value which increments for each packet sent by the peripheral. On the central we check each packet to make sure its cc value equals the previous packet's cc + 1 (wrapping from 255 to 0).
So, any thoughts on how to debug this?
My first thought was to check error counters in the Nordic SoftDevice, but I could not find any NACK counters or retransmit counters. iOS gives you nothing.
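For anyone following along, the check described above could look roughly like this. It is only a sketch: the central here is an iOS app, so the real code would not be C, and `cc_tracker_t`, `cc_tracker_init` and `cc_tracker_update` are made-up names for illustration. The same modulo-256 arithmetic applies in any language.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-side tracker; all names are illustrative only. */
typedef struct {
    bool     have_prev;  /* false until the first packet has been seen */
    uint8_t  prev_cc;    /* cc of the most recent packet */
    uint32_t received;   /* packets seen so far */
    uint32_t missing;    /* packets inferred lost from gaps in the cc */
} cc_tracker_t;

static void cc_tracker_init(cc_tracker_t *t)
{
    t->have_prev = false;
    t->prev_cc   = 0;
    t->received  = 0;
    t->missing   = 0;
}

/* Returns how many packets are missing just before this one (0 if contiguous).
 * Limitation: a duplicate packet reads as a gap of 255, and a burst of 256
 * or more lost packets aliases to a smaller gap. */
static uint8_t cc_tracker_update(cc_tracker_t *t, uint8_t cc)
{
    uint8_t gap = 0;
    if (t->have_prev) {
        /* The uint8_t cast wraps modulo 256, so 255 -> 0 counts as contiguous. */
        gap = (uint8_t)(cc - t->prev_cc - 1u);
    }
    t->have_prev = true;
    t->prev_cc   = cc;
    t->received++;
    t->missing  += gap;
    return gap;
}
```

Logging the returned gap together with a receive timestamp makes it easier to see whether the drops line up with anything periodic.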
I should have clarified I was sending notifications with sd_ble_gatts_hvx() so your guess was spot on.
The problem was never seen in ~9 months of development. We only see it on one customer device, so we will try to get that device and see if we can reproduce the problem in the lab. My hunch is that iOS is NACKing enough to cause an overflow in the Tx buffer, although I don't understand why iOS would consistently NACK one peripheral and not the other when both are sending notifications at the same rate. The nRF application currently ignores the error and proceeds with the next notification. I can add some error counters, capture the return code from sd_ble_gatts_hvx(), and find a way to send that information over to the central so it can be logged.
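A minimal sketch of such counters, assuming only that sd_ble_gatts_hvx() returns 0 (NRF_SUCCESS) on success and a non-zero code otherwise. `hvx_stats_t`, `hvx_stats_record` and `hvx_stats_pack` are hypothetical names, not part of the SDK, and the 8-byte report layout is just one possible choice.

```c
#include <stdint.h>

/* Hypothetical counters; not part of the nRF5 SDK. */
typedef struct {
    uint32_t attempts;    /* calls made to the notification API */
    uint32_t failures;    /* calls that returned a non-zero code */
    uint32_t last_error;  /* most recent non-zero return code */
} hvx_stats_t;

/* Record one return code. Success is assumed to be 0 (NRF_SUCCESS). */
static void hvx_stats_record(hvx_stats_t *s, uint32_t err_code)
{
    s->attempts++;
    if (err_code != 0u) {
        s->failures++;
        s->last_error = err_code;
    }
}

/* Serialize into 8 bytes, little-endian: failures first, then last_error.
 * This layout is an assumption; it fits in the 13-byte payload. */
static void hvx_stats_pack(const hvx_stats_t *s, uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(s->failures   >> (8 * i));
        out[4 + i] = (uint8_t)(s->last_error >> (8 * i));
    }
}

/* Intended use on the peripheral (not compiled here):
 *   hvx_stats_record(&stats, sd_ble_gatts_hvx(conn_handle, &hvx_params));
 */
```

If the Tx queue is overflowing, I would expect the failures to show up as NRF_ERROR_RESOURCES on recent SoftDevice versions, but that should be confirmed against the API documentation for the SoftDevice in use.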
Should I be looking closer at the BLE connection parameters? I know the peripheral must negotiate with central to get the preferred connection parameters, but how would I know if the Central is not cooperating?
If this is with an iOS device, then check out: https://developer.apple.com/accessories/Accessory-Design-Guidelines.pdf
Also, in general, it may be worth asking the customer to install the latest iOS version (even a beta). They are fixing issues constantly, and I have seen betas solve issues in the past. That said, a sniffer log would be very helpful to prove it one way or the other.
What lfclk source and tolerance have you configured here?
#define NRFX_CLOCK_CONFIG_LF_SRC 1
LF clock tolerance is ±20 ppm
Yes iOS 13 is a bugger of a release which gives us an easy out with the customer in the short term.
I have seen in the past that the peer device sometimes has an lfclk which exceeds its reported tolerance. To compensate for this, it is possible to relax the tolerance configured for the local clock. So it may be worth trying ±100 ppm just to see if that helps here (though I doubt it).
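To put rough numbers on that, the worst-case drift of a clock over one connection interval is simply accuracy × interval. The sketch below covers only the locally configured tolerance; the peer's own clock error adds on top of it. `drift_ns` is an illustrative helper, not an SDK function.

```c
#include <stdint.h>

/* Worst-case drift, in nanoseconds, of a clock with the given accuracy (ppm)
 * over one interval given in microseconds.
 * ppm * interval_us is in picoseconds, so divide by 1000 to get nanoseconds. */
static uint32_t drift_ns(uint32_t ppm, uint32_t interval_us)
{
    return (uint32_t)(((uint64_t)ppm * interval_us) / 1000u);
}
```

For the 30 ms interval in this thread, `drift_ns(20, 30000)` gives 600 ns and `drift_ns(100, 30000)` gives 3000 ns, so relaxing the setting accounts for roughly 2.4 µs more drift per interval.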