BLE connection stability

Hello,

I would appreciate some guidance about BLE connection stability. Our product is a system of 3 peripherals (custom board with Laird's BL652 module, containing nRF52832) all simultaneously connected to our iOS app (central). The peripheral application is written with Nordic SDK 15.3 plus S112 v6.1.1, based initially on the ble_app_uart example. Our iOS app uses Apple's Core Bluetooth framework and currently supports iOS 9.3 or newer.

Each peripheral has a sensor and uses GPIOTE, PPI and TIMERs to timestamp events and send to the connected central via NUS. Events occur randomly, sometimes with a minute or two between events, and sometimes multiple events occur in quick succession. The clocks (TIMER2) on the three peripherals are synchronized using the Timeslot API based on a simplified version of this Nordic blog. The timeslot length is 1000 microseconds.

For higher accuracy, our peripheral app requests the high frequency clock (sd_clock_hfclk_request()), which I understand sets the clock source to XTAL. The BL652 has an integrated high accuracy 32 MHz (±10 ppm) crystal oscillator.

This is a mobile system that is set up with the peripherals at most 150 feet apart (open space, line-of-sight). Each peripheral is powered by 2 AA batteries. For testing, we set up three systems to test concurrent use of multiple 3-peripheral systems connected to multiple iOS devices. Each of the three systems (each comprising 3 peripherals) uses a different radio address for the time sync using the Timeslot API.

Connection parameters for the peripherals are:

#define MIN_CONN_INTERVAL MSEC_TO_UNITS(30, UNIT_1_25_MS)
#define MAX_CONN_INTERVAL MSEC_TO_UNITS(75, UNIT_1_25_MS)
#define SLAVE_LATENCY 0
#define CONN_SUP_TIMEOUT MSEC_TO_UNITS(4000, UNIT_10_MS)
#define FIRST_CONN_PARAMS_UPDATE_DELAY APP_TIMER_TICKS(5000)
#define NEXT_CONN_PARAMS_UPDATE_DELAY APP_TIMER_TICKS(30000)
#define MAX_CONN_PARAMS_UPDATE_COUNT 3
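
As a rough sanity check of these values, here is a minimal sketch that converts them to BLE units and tests them against Apple's connection parameter rules. The rules are written as I recall them from Apple's Accessory Design Guidelines, so treat them as an assumption and check the current revision. The struct and apple_params_ok() are hypothetical names for illustration, and the unit macros are redefined with the same arithmetic as the SDK's so the sketch builds without the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the SDK's MSEC_TO_UNITS(); redefined for a standalone build. */
#define UNIT_1_25_MS 1250u
#define UNIT_10_MS   10000u
#define MSEC_TO_UNITS(ms, res) (((ms) * 1000u) / (res))

/* Hypothetical helper type: requested connection parameters in milliseconds. */
typedef struct {
    uint32_t min_interval_ms;
    uint32_t max_interval_ms;
    uint32_t latency;        /* slave latency, in connection events */
    uint32_t sup_timeout_ms; /* supervision timeout */
} conn_params_ms_t;

/* Assumed Apple rules: verify against the current Accessory Design Guidelines. */
bool apple_params_ok(const conn_params_ms_t *p)
{
    assert(p != NULL);
    uint32_t effective_ms = p->max_interval_ms * (p->latency + 1u);

    /* Interval Min >= 15 ms and a multiple of 15 ms; Interval Max at least
     * 15 ms above Interval Min, or both exactly 15 ms. */
    bool interval_ok = (p->min_interval_ms >= 15u)
        && (p->min_interval_ms % 15u == 0u)
        && ((p->max_interval_ms >= p->min_interval_ms + 15u)
            || (p->min_interval_ms == 15u && p->max_interval_ms == 15u));

    return interval_ok
        && (p->latency <= 30u)
        && (p->sup_timeout_ms >= 2000u) && (p->sup_timeout_ms <= 6000u)
        && (effective_ms <= 2000u)
        && (p->sup_timeout_ms > effective_ms * 3u);
}
```

With the values above (30 ms, 75 ms, latency 0, 4000 ms) the check passes, and the macros give 24, 60 and 400 units. This only covers what the peripheral requests; the central still decides what is actually used.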

I understand these conform to Apple's requirements. We do NOT need high throughput and send only small amounts of data (<20 bytes) between central and peripherals. In sdk_config.h, NRF_SDH_BLE_GATT_MAX_MTU_SIZE is 23. In ble_evt_handler(), on BLE_GAP_EVT_CONNECTED, I set the transmit power level to 4 dBm:

err_code = sd_ble_gap_tx_power_set(BLE_GAP_TX_POWER_ROLE_CONN, m_conn_handle, 4);

Using a BLE sniffer, for iOS 13.5.1 on an iPhone SE (2020 model) we see the following CONNECT_REQ packet:

So we have 9 peripherals, 3 connected to one iOS device, 3 connected to another iOS device, and 3 connected to yet another iOS device. I'm noticing random disconnects. On the iOS side, centralManager(_:didDisconnectPeripheral:error:) is reporting error 6 which is "The connection has timed out.".

Have we bumped into some of the practical limits of BLE? I did a sniffer trace that captured a random disconnect. See attached (Wireshark with nRF52 DK and nRF Sniffer 3.0.0). I notice a lot of LL_CHANNEL_MAP_REQ packets, but I don't have much knowledge of this level of BLE. Is there anything we can do to increase connection stability? We request a 4-second supervision timeout, but the central chooses 720 milliseconds. Should we use a higher min and max connection interval? Our central app generally uses writeWithResponse when writing characteristic values.

Appreciate any information, guidance.

Many thanks,

Tim

5238.sniffer trace.pcapng

  • Hello Tim,

    I have looked at the sniffer log that you provided, and it looks like the nRF (the peripheral/slave) is not responding, and hence the disconnect, as you probably know.

    How does the nRF behave in this case? Have you tried to debug the application there? Do you get the disconnected event on the nRF as well, or does it for some reason reset? If you receive the disconnect event, what reason is it pointing to?

    You can find the disconnect reason in the BLE_GAP_EVT_DISCONNECTED event by reading p_ble_evt->evt.gap_evt.params.disconnected.reason.

    What does it say?

    Best regards,

    Edvin

  • Thanks Edvin. It will take some time to set up a debug environment in the field, but I can do that and find out if the BLE_GAP_EVT_DISCONNECTED event occurs on the peripheral and what the disconnect reason is.

    In the meantime, I would be glad for any general guidance on creating the most stable connection possible. Given that data throughput is not important, and that only small amounts of data are written (in either direction), what connection parameters would you suggest? Any other helpful considerations?

    In reading the Timeslot API documentation, I understand that using it can affect performance of the SoftDevice, so I wonder if it is a factor in what I am experiencing.

    I've attached another sniffer trace. It appears the disconnect occurs around time 103.733, as the peripheral/slave begins advertising again at 104.661. It seems that throughout the duration of the connection, occasional Empty PDU packets from the slave are not seen by the sniffer, for example at 15.408, 15.497, 15.588, etc. Prior to the disconnect at 103.733, dropped slave Empty PDU packets become more frequent. At 98.002 there are 3 missing slave Empty PDU packets. At 100.852 there are 8 dropped packets, then more and more dropped packets until the disconnect around 103.733. Would this suggest an interference issue? Maybe the clocks of other nearby peripherals drift, and at some point the interference becomes intolerable? Again, I have very little experience here, so this may be unhelpful speculation.

    Many thanks,

    Tim

    sniffer trace-2.pcapng

  • Hello Tim,

     

    Tim said:
    occasional Empty PDU packets from the slave are not seen by the sniffer. For example, at 15.408, 15.497, 15.588, etc. Prior to disconnect at 103.733

    Yes. Those are "normal". Since it is an over-the-air link, you will occasionally have some dropped packets. When these are missing from the sniffer trace, it means one of three things:

    1: The sniffer didn't pick up the packet, but the central did.

    2: The sniffer and the central didn't pick up the packet.

    3: The peripheral didn't pick up the previous packet from the central correctly (bad CRC), which means it doesn't have anything to reply to.

    In the case of packets nr 2001 and 2002 (15.408), I see that the NESN (next expected sequence number) in the central's packet doesn't change, which indicates that the central didn't pick up any packets from the peripheral in between these two.

    However, if you have a large distance between the two devices, you can expect some packet drops occasionally.

    The increased number of dropped packets is probably caused by the environment: an obstacle between the devices, a lot of radio noise in the area, or a bad (noisy) channel can all cause this. Since the peripheral does reply again after a while (even after 8 missed packets), the nRF is still running as expected. However, at the disconnect there are 24 missed packets. That can mean either that the nRF's application has "crashed" and reset, or that the radio conditions are bad (noise). This is why I asked for the debug information on the nRF side. If you get the disconnected event, it probably means that the radio conditions are bad from time to time. If you don't get this event, you are most likely seeing a reset caused by an application bug.
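
    The 24 missed packets line up with the 720 ms supervision timeout if the connection interval is 30 ms. That interval is an assumption here, since it is not stated in the thread. A minimal sketch of the arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole connection intervals that fit in the supervision timeout,
 * i.e. roughly how many consecutive connection events can be lost before
 * the link is declared dead. Assumes slave latency 0. */
uint32_t missed_events_until_timeout(uint32_t sup_timeout_ms,
                                     uint32_t conn_interval_ms)
{
    assert(conn_interval_ms > 0u);
    return sup_timeout_ms / conn_interval_ms;
}
```

    With 720 ms and an assumed 30 ms interval this gives 24 events. The requested 4000 ms timeout would tolerate 133 events at the same interval, which is why a longer supervision timeout makes a link ride out short bursts of noise.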

     

    Tim said:
    Maybe as clocks of other nearby peripherals drift, and at some point the interference becomes intolerable?

    That shouldn't be an issue. They sync up every time they receive a packet. When you miss more packets, the window in which the peripheral listens widens in proportion to the number of consecutive missed packets. You should, however, make sure you have the correct XTAL accuracy in the sdk_config.h file.
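
    To illustrate the widening, here is a small sketch of the window widening formula from the Bluetooth Core Specification, windowWidening = ((centralSCA + peripheralSCA) / 1000000) * timeSinceLastAnchor. The function name and the ppm values in the note below are illustrative assumptions, not values taken from the traces.

```c
#include <assert.h>
#include <stdint.h>

/* Window widening on each side of the expected anchor point, in nanoseconds.
 * (ppm / 1e6) * microseconds = ppm * us / 1e6 us = ppm * us / 1000 ns. */
uint64_t window_widening_ns(uint32_t central_sca_ppm,
                            uint32_t peripheral_sca_ppm,
                            uint64_t time_since_anchor_us)
{
    uint64_t total_ppm = (uint64_t)central_sca_ppm + peripheral_sca_ppm;
    return (total_ppm * time_since_anchor_us) / 1000u;
}
```

    Assuming a 50 ppm central, a 20 ppm peripheral and a 30 ms interval, the widening is 2100 ns after one interval and 18900 ns after 270 ms (8 missed events plus the next one). If the configured accuracy is better than the real crystal, the peripheral widens too little and can miss the central entirely, which is why the sdk_config.h value matters.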

  • Thanks Edvin. I was out testing today and was unable to log the intermittent disconnect. It happened only once on a peripheral I was not debug logging. I'll continue to confirm whether nRF gets the disconnect event.

    As I mentioned earlier, our application uses a simplified version of the wireless clock synchronization code described here. For each system of 3 peripherals connected to 1 central (iOS device), one of the peripherals constantly transmits time beacons at a constant frequency for a specific PREFIX0 and BASE0. Every 30 seconds, the other two peripherals listen for beacons (same PREFIX0 and BASE0) and once a beacon is received they stop listening and set a timer to start listening again in 30 seconds. Initially the transmit frequency was 50Hz and I was noticing that receipt of beacons at the listening peripherals could take up to 8 or 10 seconds. I increased the transmit frequency to 100Hz and the time to receive a beacon once listening started went down to 1 second or less. However, after increasing the transmit frequency to 100Hz, I noticed an unexpected disconnect of one of the peripherals. At 50Hz, the BLE connections appeared stable. The documentation for the Timeslot API (used by the timer sync code) states that its use can influence the performance of the SoftDevice. Makes me wonder if Timeslot API use is the cause of unexpected disconnects.
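
    A rough way to see what the rate change asks of the radio is to estimate how much time per second the beacon timeslots occupy. This sketch assumes one 1000 microsecond timeslot per beacon, which is my reading of the setup rather than something I have measured; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Share of each second spent inside beacon timeslots, in permille
 * (parts per thousand). Assumes one timeslot per transmitted beacon. */
uint32_t timeslot_duty_permille(uint32_t beacon_rate_hz, uint32_t timeslot_len_us)
{
    uint64_t busy_us_per_s = (uint64_t)beacon_rate_hz * timeslot_len_us;
    assert(busy_us_per_s <= 1000000u); /* cannot exceed one second per second */
    return (uint32_t)(busy_us_per_s / 1000u);
}
```

    At 50 Hz this is 50 permille (5%) and at 100 Hz it is 100 permille (10%). With a period of 10 ms between 1 ms slots, every 30 ms connection event has a timeslot request close to it, so the scheduler has more conflicts to resolve than at 50 Hz.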

    Would increasing the min and max connection interval and the supervision timeout help with the stability of the BLE connection?

    About clock source and accuracy, sdk_config.h has the following:

    Our custom board has this ±5ppm external crystal connected to XL1/XL2. Should I be setting NRF_SDH_CLOCK_LF_ACCURACY to 9?

    Thanks Edvin.

    Tim

  • Hello Tim,

    Tim said:
    As I mentioned earlier, our application uses a simplified version of the wireless clock synchronization code described here.

     That should not affect the timekeeping for the softdevice. These are completely separate.

    The accuracy of the XTAL is dependent on both the XTAL itself and the PCB layout (more precisely the capacitors between the LFXTAL and the nRF). I see that your LFXTAL has 12.5pF capacitance. What are the values of the caps between the XTAL and the nRF?
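
    As a sketch of why the cap values matter, the usual approximation treats the two load capacitors (each plus its stray pin and PCB capacitance) as being in series across the crystal. The stray value of 4 pF per pin used in the note below is an assumption for illustration; take the real figure from the nRF52832 product specification and your layout. Both function names are hypothetical.

```c
#include <assert.h>

/* Effective load seen by the crystal: (C1' * C2') / (C1' + C2'),
 * where Cx' = Cx + stray capacitance on that pin. All values in pF. */
double crystal_load_pf(double c1_pf, double c2_pf, double stray_pf)
{
    double a = c1_pf + stray_pf;
    double b = c2_pf + stray_pf;
    assert(a + b > 0.0);
    return (a * b) / (a + b);
}

/* Capacitor value (C1 == C2) needed to present a target load. */
double cap_for_load_pf(double target_load_pf, double stray_pf)
{
    return 2.0 * target_load_pf - stray_pf;
}
```

    With an assumed 4 pF of stray capacitance per pin, a 12.5 pF crystal would want roughly 21 pF capacitors, while 12 pF capacitors would present about 8 pF. A load below the specified value makes the crystal run fast, which eats into the accuracy budget configured in sdk_config.h.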

    However, I am not convinced that this is an XTAL issue. 

     

    Tim said:
    Should I be setting NRF_SDH_CLOCK_LF_ACCURACY to 9?

     If you want to check whether it is an XTAL issue, please try to set:

    NRF_SDH_CLOCK_LF_SRC 0
    NRF_SDH_CLOCK_LF_ACCURACY 1
    NRF_SDH_CLOCK_LF_RC_CTIV 16
    NRF_SDH_CLOCK_LF_RC_TEMP_CTIV 2

    This will enable the internal RC oscillator instead of the external XTAL. Since we know this should work, try to look for the disconnects with these settings.
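
    For reference, NRF_SDH_CLOCK_LF_ACCURACY is an index, not a ppm figure. The sketch below maps a worst-case ppm value to the tightest setting that still covers it. The table mirrors the NRF_CLOCK_LF_ACCURACY_* values in the SoftDevice header nrf_sdm.h; the function itself is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the NRF_SDH_CLOCK_LF_ACCURACY value for the tightest accuracy
 * class that still covers worst_case_ppm, or -1 if it exceeds 500 ppm.
 * Note that the index order is not monotonic: 0 is 250 ppm, 1 is 500 ppm. */
int lf_accuracy_setting(uint32_t worst_case_ppm)
{
    static const struct { uint32_t ppm; int value; } table[] = {
        {1u, 11}, {2u, 10}, {5u, 9}, {10u, 8}, {20u, 7}, {30u, 6},
        {50u, 5}, {75u, 4}, {100u, 3}, {150u, 2}, {250u, 0}, {500u, 1},
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (worst_case_ppm <= table[i].ppm) {
            return table[i].value;
        }
    }
    return -1;
}
```

    So 9 corresponds to 5 ppm and the value 1 above corresponds to 500 ppm. The ppm you pass in should be the total for the crystal on your board (tolerance, temperature, aging and load capacitor mismatch), not just the headline figure from the crystal datasheet. Declaring a looser accuracy than the real one only costs a little power; declaring a tighter one risks missed packets.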

    But first, I would try to look for the disconnect reason with your current settings.

    BR,

    Edvin

  • Thanks Edvin,

    Today I confirmed that when random disconnects occur, it is not because the nRF is crashing and resetting. I logged the following reason codes: 0x08, 0x3E, 0x28 with 0x08 being the most common. With 9 peripherals connected to 3 centrals (3 peripherals to each central (iOS device)), the disconnects occurred often for one particular peripheral (once every minute or two). The 3 peripherals connected to a central are separated by maximum 140 feet in open space. The centrals (iOS devices) were sitting at most 100 feet from the farthest peripheral. I noticed that often as I walked up to the centrals, a disconnect would occur. It seemed that when the space had other people (8-10 people) present, disconnects occurred more frequently.
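
    For anyone reading along, these reason codes are standard Bluetooth HCI error codes. A minimal decoder for logging, covering the three I saw plus a few other common ones, might look like this. The function name is hypothetical; the SDK's ble_hci.h defines BLE_HCI_* constants for the same values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps an HCI disconnect reason to its name in the Bluetooth Core
 * Specification error code list. Only a handful of codes are covered. */
const char *hci_reason_str(uint8_t reason)
{
    switch (reason) {
    case 0x08: return "Connection Timeout";
    case 0x13: return "Remote User Terminated Connection";
    case 0x16: return "Connection Terminated By Local Host";
    case 0x22: return "LL Response Timeout";
    case 0x28: return "Instant Passed";
    case 0x3E: return "Connection Failed to be Established";
    default:   return "Unknown";
    }
}
```

    0x08 means the supervision timeout expired with no valid packet received. 0x28 means a link-layer procedure with an instant, such as a channel map or connection parameter update, was only received after its instant had already passed. 0x3E means the first packets of a new connection were never received.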

    After logging the disconnect reasons and capturing a few more sniffer traces, I turned off 2 of the 3 sets of 3 peripherals, leaving 3 peripherals connected to 1 central. Disconnect rates went way down (only a couple of disconnects over a 20-minute period). I walked throughout the space carrying the central device and saw no disconnects, at probably a maximum distance of 190 feet between the central and any of the 3 peripherals.

    I've attached another sniffer trace with several disconnects captured.

    sniffer trace-3.pcapng

    I didn't have time to test using the internal RC. If still worthwhile after above information, I'll test today. We use 12pF caps on the external crystal. Here's the schematic:

    Here's a portion of the board layout:

    Thank you for your thoughts about this. If your sense now is that RF interference is the cause of the disconnects, do you have any suggestions on how to mitigate this? Adjust connection parameters? Clock settings?

    Generally, what are the common sources of RF interference that would impact BLE connection stability?

    Many thanks,

    Tim