
Dropped packets

Test Setup:

SD 6.1.0

SDK 15.2.0

Peripheral:  nRF52840

Central: iOS 13.1

Central is connected to two peripherals; each peripheral sends 50 pkt/s and each packet carries 13 bytes of payload. The connection interval is 30 ms. Over hundreds of hours of testing this setup has worked great with no packet loss; however, we now have one device that fails *sometimes*. We have three recordings where one peripheral is normal but the other shows this pattern:

pkt-1, pkt-2, DROP, pkt-4, pkt-5, DROP, pkt-7, pkt-8, ...

We detect the drops by checking a continuity counter (cc) in the packet on the central device. The cc is an 8-bit value that increments for each packet sent by the peripheral. On the central we check each packet to make sure its cc value equals the previous packet's cc + 1.
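
For reference, the check itself is trivial; the only subtlety is that the 8-bit counter wraps at 255. Our central-side code is not C, but in C the logic looks roughly like this (sketch, names made up):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the continuity-counter check. cc is 8-bit, so the
 * "previous + 1" comparison must wrap 255 -> 0, which unsigned
 * modulo-256 arithmetic gives us for free. */
static uint8_t s_last_cc;
static bool    s_have_last;

/* Returns the number of packets missing between the previous packet and this one. */
static uint8_t cc_gap(uint8_t cc)
{
    if (!s_have_last)
    {
        s_have_last = true;
        s_last_cc   = cc;
        return 0;
    }
    uint8_t gap = (uint8_t)(cc - s_last_cc - 1u);
    s_last_cc   = cc;
    return gap;   /* 0 = no drop, 1 = one packet missing, etc. */
}
```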

So.. any thoughts how to debug this? 

My first thought was to check error counters in the Nordic SD, but I could not find any NACK or retransmit counters. iOS gives you nothing.

  • Not sure how I would change the LF clock tolerance. Is that a setting in the SD or would I need to replace the physical crystal?

  • In your project you should find an NRF_SDH_CLOCK_LF_ACCURACY value. This is typically consumed by nrf_sdh_enable_request() when it calls sd_softdevice_enable(). You can set this value to something larger than the tolerance of the physical crystal in your design; by reporting a larger value you compensate for a peer that may be slightly outside its own tolerance spec. For instance, if you have a 20 ppm crystal you have likely used NRF_CLOCK_LF_ACCURACY_20_PPM, but you may set it to NRF_CLOCK_LF_ACCURACY_100_PPM.
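
    For example, the relevant sdk_config.h entries look something like this (note that a stock sdk_config.h may encode the accuracy as the numeric value from nrf_sdm.h rather than the symbolic name):

    ```c
    /* sdk_config.h -- LFCLK settings consumed by nrf_sdh.c when it builds the
     * nrf_clock_lf_cfg_t that is passed to sd_softdevice_enable(). */
    #define NRF_SDH_CLOCK_LF_SRC          NRF_CLOCK_LF_SRC_XTAL
    #define NRF_SDH_CLOCK_LF_RC_CTIV      0
    #define NRF_SDH_CLOCK_LF_RC_TEMP_CTIV 0

    /* Report a looser accuracy than the 20 ppm crystal actually has, to widen
     * the receive window and tolerate a peer that drifts more than expected. */
    #define NRF_SDH_CLOCK_LF_ACCURACY     NRF_CLOCK_LF_ACCURACY_100_PPM
    ```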

    Kenneth

  • Ahh, now I remember that setting. LF accuracy is indeed set to 20 ppm. I assume there is a power penalty if I increase the setting to 100 ppm. Should I assume the clock accuracy of the iPhone is not published and not well known? I am no expert on these oscillators, but I know a good one only costs about $0.20, so it seems unlikely Apple would use a cheaper part with worse accuracy.

  • Not saying Apple has an issue, just that an LFCLK out of tolerance (due either to hardware or to a timing issue in the scheduler) would give the same symptom: "dropped packets".

    Best regards,

    Kenneth

  • This ticket seems relevant to my issue. I read at least 20 related tickets and some are contradictory, so please let me know if the referenced ticket is misleading in any way. I based my code changes on this ticket.

    I looked through all the possible error codes for sd_ble_gatts_hvx(), and the most likely one is NRF_ERROR_RESOURCES. I only send 13 B packets at 50 Hz, so fundamentally this is not a throughput issue. Instead I believe the central may have trouble keeping up and/or there is a radio issue. Those are events I cannot control, but I do need to design the system so it can handle a certain number of packet retransmissions, which cause delays, which cause packets to accumulate in the Tx queue. So I need a "good"-sized queue, and I need to make sure the SD is configured so it can drain the queue when the radio link is good, i.e. send multiple packets per connection interval until the queue is drained. How can I make sure my configuration is correct? I have no visibility into the number of retransmissions, the queue size, or the number of packets sent per connection interval. All I can do is configure the SD in a way that I think is optimal for my application. To that end, here is the configuration I plan to try (a rough sketch of how I would apply it in code follows the list). Please let me know if you see any way to improve on it:

    • nRF is acting as peripheral with just 1 BLE link to the central
    • Maximum notification packet size 49 B, typical packet size is 13 B
    • Notifications are sent at 50 Hz
    • LFCLK accuracy = 100ppm
    • GAP data length = 49+4
    • GAP event length = 100
    • GATT max MTU size = 49
    • Tx queue size is determined by the SD. Should I override this somehow?
    • Connection interval 30ms
    • Slave latency 0
    • Connection supervision timeout 3s
    • Connection event extension = enabled
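
    Here is a rough sketch of how I believe these values map onto S140 6.1.0 API calls (APP_CONN_CFG_TAG and ram_start stand in for whatever the existing sdh init code already uses). Please correct me if I have any of this wrong:

    ```c
    #include <string.h>
    #include "app_error.h"
    #include "ble.h"

    #define APP_CONN_CFG_TAG 1   /* same tag used for advertising / the connection */

    /* Called before the BLE stack is enabled (sd_ble_enable / nrf_sdh_ble_enable). */
    static void conn_cfg_apply(uint32_t ram_start)
    {
        ble_cfg_t cfg;

        /* Reserve a long connection event (units of 1.25 ms -> 100 = 125 ms). */
        memset(&cfg, 0, sizeof(cfg));
        cfg.conn_cfg.conn_cfg_tag                     = APP_CONN_CFG_TAG;
        cfg.conn_cfg.params.gap_conn_cfg.conn_count   = 1;    /* single peripheral link */
        cfg.conn_cfg.params.gap_conn_cfg.event_length = 100;
        APP_ERROR_CHECK(sd_ble_cfg_set(BLE_CONN_CFG_GAP, &cfg, ram_start));

        /* ATT MTU 49 so every notification (13..49 B payload) fits in one ATT PDU. */
        memset(&cfg, 0, sizeof(cfg));
        cfg.conn_cfg.conn_cfg_tag                 = APP_CONN_CFG_TAG;
        cfg.conn_cfg.params.gatt_conn_cfg.att_mtu = 49;
        APP_ERROR_CHECK(sd_ble_cfg_set(BLE_CONN_CFG_GATT, &cfg, ram_start));

        /* GAP data length (49 + 4) would be negotiated per connection later,
         * e.g. with sd_ble_gap_data_length_update() once the link is up. */
    }

    /* Called after the stack is enabled: connection event length extension,
     * so the SD keeps draining the notification queue within one connection
     * event while the link is good. */
    static void conn_evt_ext_enable(void)
    {
        ble_opt_t opt;
        memset(&opt, 0, sizeof(opt));
        opt.common_opt.conn_evt_ext.enable = 1;
        APP_ERROR_CHECK(sd_ble_opt_set(BLE_COMMON_OPT_CONN_EVT_EXT, &opt));
    }
    ```

    If the project instead relies on nrf_sdh_ble_default_cfg_set() from SDK 15.2, I believe the event length and MTU come from NRF_SDH_BLE_GAP_EVENT_LENGTH and NRF_SDH_BLE_GATT_MAX_MTU_SIZE in sdk_config.h rather than explicit calls like the above.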
    
    

    The last setting is unclear to me. If I do not explicitly enable connection event extension, does that mean the SD will only send 1 packet per connection interval?  And what is the downside (or tradeoff) when this setting is enabled? 

    The queue size chosen by the SD seems to be limited by the length of the connection interval. I would double that queue size if I could, so the system could tolerate longer periods when packet retransmits are required.
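
    As far as I can tell from ble_gatts.h, the notification (HVN) TX queue depth is a per-connection configuration with a default of 1, so if overriding it is allowed this would be one more sd_ble_cfg_set() call before the stack is enabled, continuing the sketch above (10 is just an illustrative value):

    ```c
    /* Sketch: deepen the HVN (notification) TX queue. A larger queue costs
     * extra SoftDevice RAM, so the application RAM start may need adjusting. */
    memset(&cfg, 0, sizeof(cfg));
    cfg.conn_cfg.conn_cfg_tag                            = APP_CONN_CFG_TAG;
    cfg.conn_cfg.params.gatts_conn_cfg.hvn_tx_queue_size = 10;
    APP_ERROR_CHECK(sd_ble_cfg_set(BLE_CONN_CFG_GATTS, &cfg, ram_start));
    ```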

    But whatever the queue size, I need to define the application behavior when the queue becomes full. Several tickets suggest waiting for the system to transmit one packet from the queue before trying to send any new packets. However, I don't see how this changes anything: my application will just keep calling sd_ble_gatts_hvx() and simply count the errors. Implementing a busy flag just complicates the logic without any obvious benefit, or am I missing something?
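
    For concreteness, here is how I read the suggestion from those tickets (sketch, not tested): on NRF_ERROR_RESOURCES the packet is parked and retried from the BLE_GATTS_EVT_HVN_TX_COMPLETE event instead of being dropped and counted as an error.

    ```c
    #include <stdbool.h>
    #include <string.h>
    #include "app_error.h"
    #include "ble.h"

    static uint16_t m_value_handle;        /* set during service init */
    static bool     m_pkt_pending;
    static uint8_t  m_pending_pkt[49];
    static uint16_t m_pending_len;

    static void packet_send(uint16_t conn_handle, uint8_t const * p_data, uint16_t len)
    {
        ble_gatts_hvx_params_t hvx = {0};
        hvx.handle = m_value_handle;
        hvx.type   = BLE_GATT_HVX_NOTIFICATION;
        hvx.p_data = p_data;
        hvx.p_len  = &len;

        uint32_t err = sd_ble_gatts_hvx(conn_handle, &hvx);
        if (err == NRF_ERROR_RESOURCES)
        {
            /* Queue is full: remember this packet and retry when the SD
             * reports that queued notifications have gone out. */
            memcpy(m_pending_pkt, p_data, len);
            m_pending_len = len;
            m_pkt_pending = true;
        }
        else
        {
            APP_ERROR_CHECK(err);
        }
    }

    static void on_ble_evt(ble_evt_t const * p_ble_evt, void * p_context)
    {
        (void)p_context;
        if (p_ble_evt->header.evt_id == BLE_GATTS_EVT_HVN_TX_COMPLETE && m_pkt_pending)
        {
            m_pkt_pending = false;
            packet_send(p_ble_evt->evt.gatts_evt.conn_handle, m_pending_pkt, m_pending_len);
        }
    }
    ```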
