Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

Supervision timeout following connection parameter updates on limited devices

We are using SDK 15.3.0 and S140 v6.1.1 SoftDevice. We are using an external LF crystal with tolerance of +/-20ppm, and are therefore using

#define NRF_SDH_CLOCK_LF_SRC 1
#define NRF_SDH_CLOCK_LF_ACCURACY 7
which correspond to NRF_CLOCK_LF_SRC_XTAL and NRF_CLOCK_LF_ACCURACY_20_PPM.
The nRF52840 is configured as a GAP peripheral and GATT server. It connects to iOS or Android devices.
In our application, we use two sets of connection parameters.
  • Set 1:
    • Min CI: 15ms
    • Max CI: 30ms
    • SL: 0
    • Supervision Timeout: 6s
  • Set 2:
    • Min CI: 30ms
    • Max CI: 45ms
    • SL: 30
    • Supervision Timeout: 6s
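
As a quick sanity check on the two sets above, the sketch below computes the longest time the peripheral may sleep between wake-ups, (SL + 1) x max CI, and checks the Bluetooth Core Specification rule that the supervision timeout must be larger than twice that value. The struct and function names are hypothetical helpers, not SDK APIs, and the calculation assumes the central picks the maximum CI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: connection parameters expressed in milliseconds. */
typedef struct {
    uint32_t min_ci_ms;
    uint32_t max_ci_ms;
    uint32_t slave_latency;   /* connection events the peripheral may skip */
    uint32_t sup_timeout_ms;
} conn_params_ms_t;

/* Longest time the peripheral may stay asleep: (SL + 1) * max CI. */
uint32_t effective_interval_ms(const conn_params_ms_t *p)
{
    assert(p != NULL);
    return (p->slave_latency + 1u) * p->max_ci_ms;
}

/* Core Specification rule: timeout > (1 + SL) * CI_max * 2. */
bool sup_timeout_is_valid(const conn_params_ms_t *p)
{
    return p->sup_timeout_ms > 2u * effective_interval_ms(p);
}

/* Number of full-latency wake-ups that fit inside one supervision timeout. */
uint32_t wakeups_before_timeout(const conn_params_ms_t *p)
{
    uint32_t interval_ms = effective_interval_ms(p);
    assert(interval_ms > 0u);
    return p->sup_timeout_ms / interval_ms;
}
```

For set 2 this gives 31 x 45ms = 1395ms between wake-ups, so the 6s timeout is valid but covers only four full-latency wake-ups. A peripheral that misses the central's packet on each of them would be disconnected about 6 seconds after the update.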

When first connecting, parameter set 1 is used. Set 2 is requested 30 seconds after ATT communication stops. For example, a phone connects to the nRF52, they exchange ATT commands for 10 seconds, then 30 seconds later, set 2 is requested.

We tested this setup on at least a dozen phones and a dozen nRF52s and it worked well. We even ran a test for hours that switched between set 1 and set 2 at random intervals, and that also worked without problems.

Recently, new testers have found that a disconnection due to a supervision timeout occurs 6 seconds (the supervision timeout) after set 2 is accepted. It seems to be timed from when the parameters are accepted rather than requested, because both the phone and the nRF52 log use of the new parameters. While we know that interference could cause this, for certain individuals it is 100% reproducible. They have tested in multiple physical locations (different interference profiles), and the issue can be reproduced on a variety of iOS and Android devices and a variety of nRF52 units. However, we have multiple individuals with the same model of phone, and only one of them reproduces the issue.

We tried lowering the slave latency for set 2, and it eventually resolves the issue, but the value at which the issue is resolved varies case-by-case. Sometimes SL=10 fixes the disconnects. Sometimes it needs to be as low as 5.

I also tried a special nRF52 build with NRF_SDH_CLOCK_LF_ACCURACY set to 500ppm. I know this is recommended (required by the SDK asserts) if using the internal RC clock source, but we're not. The value of 500ppm did indeed resolve a consistent disconnect with SL=30.
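
To illustrate why the declared accuracy matters, here is a rough sketch of the window widening formula from the Bluetooth Core Specification, ((centralSCA + peripheralSCA) / 1000000) x timeSinceLastAnchor. The function names, the `margin_us` parameter, and the drift check are hypothetical and simplified; the SoftDevice's actual receive window includes implementation-specific margins that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window widening in microseconds for a given time since the last anchor.
 * ppm * ms yields nanoseconds, so divide by 1000 to get microseconds. */
uint32_t window_widening_us(uint32_t central_ppm, uint32_t peripheral_ppm,
                            uint32_t since_anchor_ms)
{
    uint64_t sum_ppm = (uint64_t)central_ppm + peripheral_ppm;
    return (uint32_t)(sum_ppm * since_anchor_ms / 1000u);
}

/* Hypothetical check: does the real relative drift between the two clocks
 * still land inside the widened window (plus an assumed fixed margin)? */
bool anchor_fits_window(uint32_t actual_drift_ppm, uint32_t central_ppm,
                        uint32_t declared_peripheral_ppm, uint32_t margin_us,
                        uint32_t since_anchor_ms)
{
    uint32_t drift_us =
        (uint32_t)((uint64_t)actual_drift_ppm * since_anchor_ms / 1000u);
    uint32_t window_us = window_widening_us(central_ppm,
                                            declared_peripheral_ppm,
                                            since_anchor_ms) + margin_us;
    return drift_us <= window_us;
}
```

With set 2 at its maximum (1395ms between wake-ups) and a phone declaring 50ppm, a 20ppm setting widens the window by about 97us, while a 500ppm setting widens it by about 767us. If the real crystal drift were larger than declared, only the wider window would still catch the central's packet.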

Overall, it seems like we might be experiencing a clock tolerance issue, but I don't know how to prove this. We test in climate-controlled environments and our nRF52 board does not generate much heat. Unless the manufacturer is provided out 

We've also already released our nRF52 FW, and we are hoping to make changes only to the phone app (we have a way for the phone to request that the nRF52 use new connection parameters).

In summary, we have a small set of phones and nRF52s that consistently disconnect after a connection parameter update when the combined CI/SL duration exceeds a few hundred milliseconds. Most devices do not have an issue. Lowering the CI/SL duration resolves it, but the duration at which it resolves varies. We're curious whether there is any known issue with the connection parameter update procedure or RX window widening. Does this situation ring any obvious bells?

Parents
  • Hi

    Just to make sure I understand correctly: is this reproducible on some phones no matter what nRF52 device it's tested with? Or is it reproducible on some nRF52 devices no matter what phones they connect to? Or is it just with specific combinations of some nRF52 devices and phones? Can you specify which phone models you see this issue on? Cheaper models tend to have clock drift and a relaxed relationship to the BLE specification, which can unfortunately cause trouble, for example by disconnecting because of too high an accuracy.

    It could indeed be a problem with some of the LF crystals on your boards. They might have been damaged during soldering, have the wrong capacitor values mounted, or have a bad connection to the nRF52, for example. One way to check is to relax NRF_SDH_CLOCK_LF_ACCURACY slightly (that is, declare a somewhat larger ppm value), not all the way to 500ppm. Often it's enough to set it to 50ppm, for example, and you will still have good accuracy. If this fixes the issue, and accuracy is not critical in your application, that can be an okay fix.
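
For reference, a 50ppm setting would look roughly like the fragment below in sdk_config.h. The numeric values follow the NRF_CLOCK_LF_ACCURACY definitions in the SoftDevice's nrf_sdm.h; treat it as a sketch and confirm against the header shipped with your SoftDevice version.

```c
/* sdk_config.h: keep the external crystal, but declare 50 ppm instead of 20 ppm. */
#define NRF_SDH_CLOCK_LF_SRC 1       /* NRF_CLOCK_LF_SRC_XTAL */
#define NRF_SDH_CLOCK_LF_ACCURACY 5  /* NRF_CLOCK_LF_ACCURACY_50_PPM */
```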

    You can also try desoldering the LF crystal and soldering on a new one without changing anything else to see if it's a HW issue with the crystal itself.

    Best regards,

    Simon

Children
  • We've had a lot of activity in the past 24 hours regarding this issue. At first, we were struggling to identify which combination of phones and nRF52 devices would reproduce the issue, but now we finally got some traction.

    We determined that our nRF52 device is the problem. We can reproduce the issue with several phones on a specific nRF52 device, and the same phones with a different nRF52 device do not reproduce the issue.

    We've also determined that the issue is more likely to be reproduced when the phone's clock tolerance is narrower. By looking at the CONNECT_IND payload, a phone with a 31-50ppm tolerance has a higher failure rate than a phone with a 251-500ppm tolerance.
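
For anyone repeating this check from a sniffer trace, the sketch below decodes the central's sleep clock accuracy (SCA) from CONNECT_IND. It assumes you pass in the last byte of the LLData field, where the hop increment occupies the lower 5 bits and the SCA code the upper 3 bits. The function names are hypothetical; the ppm ranges are the ones defined by the Bluetooth Core Specification.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case ppm for each SCA code:
 * 0: 251-500, 1: 151-250, 2: 101-150, 3: 76-100,
 * 4: 51-75,   5: 31-50,   6: 21-30,   7: 0-20. */
static const uint16_t sca_max_ppm[8] = {500, 250, 150, 100, 75, 50, 30, 20};

/* Extract the 3-bit SCA code from the combined Hop/SCA byte. */
uint8_t sca_from_hop_sca_byte(uint8_t hop_sca_byte)
{
    return (uint8_t)(hop_sca_byte >> 5);
}

/* Upper bound of the ppm range that an SCA code declares. */
uint16_t sca_worst_case_ppm(uint8_t sca_code)
{
    assert(sca_code < 8u);
    return sca_max_ppm[sca_code];
}
```

A phone reporting 31-50ppm sends SCA code 5, and one reporting 251-500ppm sends code 0. Since the peripheral's receive window widens with the sum of both accuracies, a tighter phone value leaves less slack for an inaccurate peripheral crystal, which would be consistent with the higher failure rate seen above.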

    We also did as you recommended and incrementally increased the ppm value of NRF_SDH_CLOCK_LF_ACCURACY. With one device combination, we found that 150ppm failed but 250ppm was successful.

    This suggests potentially significant issues with our crystal, either damage during manufacturing or a capacitance issue. We'll be assessing the power impact of setting NRF_SDH_CLOCK_LF_ACCURACY to 250ppm instead of 20ppm, but it seems likely to be negligible. Apart from RX window widening, is there any other major factor affected by the value of NRF_SDH_CLOCK_LF_ACCURACY?
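
A back-of-envelope way to size that power impact is sketched below. It assumes the only change is that the receiver turns on earlier by the extra window widening, that the central's packet arrives at the expected time, and that the RX current is a value you supply yourself (the 6000uA used in the note is a placeholder, not a datasheet figure). The function names are hypothetical, and radio ramp-up and any SoftDevice margins are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Extra receiver-on time per wake-up, in microseconds, when the declared
 * accuracy is relaxed from old_ppm to new_ppm. ppm * ms = ns, so / 1000. */
uint32_t extra_rx_us_per_wakeup(uint32_t old_ppm, uint32_t new_ppm,
                                uint32_t since_anchor_ms)
{
    assert(new_ppm >= old_ppm);
    return (uint32_t)((uint64_t)(new_ppm - old_ppm) * since_anchor_ms / 1000u);
}

/* Average extra current in nanoamps. The extra on-time fraction equals the
 * ppm difference, so the result does not depend on the connection interval. */
uint32_t extra_avg_current_na(uint32_t old_ppm, uint32_t new_ppm,
                              uint32_t rx_current_ua)
{
    assert(new_ppm >= old_ppm);
    return (uint32_t)((uint64_t)(new_ppm - old_ppm) * rx_current_ua / 1000u);
}
```

Going from 20ppm to 250ppm with 1395ms between wake-ups adds about 320us of listening per wake-up. With the placeholder 6000uA RX current, that averages out to roughly 1.4uA under these assumptions.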
