
BLE connection stability

Hello,

I would appreciate some guidance about BLE connection stability. Our product is a system of 3 peripherals (custom board with Laird's BL652 module, containing nRF52832) all simultaneously connected to our iOS app (central). The peripheral application is written with Nordic SDK 15.3 plus S112 v6.1.1, based initially on the ble_app_uart example. Our iOS app uses Apple's Core Bluetooth framework and currently supports iOS 9.3 or newer.

Each peripheral has a sensor and uses GPIOTE, PPI and TIMERs to timestamp events and send them to the connected central via NUS (Nordic UART Service). Events occur randomly, sometimes with a minute or two between events, and sometimes multiple events occur in quick succession. The clocks (TIMER2) on the three peripherals are synchronized using the Timeslot API based on a simplified version of this Nordic blog. The timeslot length is 1000 microseconds.

For higher accuracy, our peripheral app requests the high frequency clock (sd_clock_hfclk_request()), which I understand sets the clock source to XTAL. The BL652 has an integrated high accuracy 32 MHz (±10 ppm) crystal oscillator.

This is a mobile system that is set up with each peripheral at most 150 feet apart (open space, line-of-sight). Each peripheral is powered by 2 AA batteries. For testing, we set up three systems to test concurrent use of multiple 3-peripheral systems connected to multiple iOS devices. Each of the three systems (each comprising 3 peripherals) uses a different radio address for the time sync using the Timeslot API.

Connection parameters for the peripherals are:

#define MIN_CONN_INTERVAL MSEC_TO_UNITS(30, UNIT_1_25_MS)
#define MAX_CONN_INTERVAL MSEC_TO_UNITS(75, UNIT_1_25_MS)
#define SLAVE_LATENCY 0
#define CONN_SUP_TIMEOUT MSEC_TO_UNITS(4000, UNIT_10_MS)
#define FIRST_CONN_PARAMS_UPDATE_DELAY APP_TIMER_TICKS(5000)
#define NEXT_CONN_PARAMS_UPDATE_DELAY APP_TIMER_TICKS(30000)
#define MAX_CONN_PARAMS_UPDATE_COUNT 3

I understand these conform to Apple's requirements. We do NOT need high throughput and send only small amounts of data (<20 bytes) between central and peripherals. In sdk_config.h, NRF_SDH_BLE_GATT_MAX_MTU_SIZE is 23. In ble_evt_handler(), on BLE_GAP_EVT_CONNECTED, I set the transmit power level to +4 dBm:

err_code = sd_ble_gap_tx_power_set(BLE_GAP_TX_POWER_ROLE_CONN, m_conn_handle, 4);
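
For reference, the unit arithmetic behind the connection parameter macros can be sketched in plain C. This is a minimal, self-contained sketch and not SDK code: msec_to_units() mirrors what the SDK's MSEC_TO_UNITS macro computes, and timeout_is_valid() is a hypothetical helper that checks the Bluetooth Core Specification rule that the supervision timeout must be larger than (1 + latency) * interval * 2. The constant and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unit sizes in microseconds (illustrative names, not SDK identifiers). */
#define CONN_INTERVAL_UNIT_US 1250u   /* connection interval: 1.25 ms units */
#define SUP_TIMEOUT_UNIT_US   10000u  /* supervision timeout: 10 ms units   */

/* Same arithmetic as MSEC_TO_UNITS: scale to microseconds, divide by the unit. */
static uint32_t msec_to_units(uint32_t ms, uint32_t unit_us)
{
    return (ms * 1000u) / unit_us;
}

/* Core spec rule: timeout must be larger than (1 + latency) * interval * 2. */
static bool timeout_is_valid(uint32_t interval_units, uint32_t latency,
                             uint32_t timeout_units)
{
    uint32_t interval_us = interval_units * CONN_INTERVAL_UNIT_US;
    uint32_t timeout_us  = timeout_units * SUP_TIMEOUT_UNIT_US;
    return timeout_us > (1u + latency) * interval_us * 2u;
}
```

With the values listed above, 30 ms and 75 ms become 24 and 60 interval units, and 4000 ms becomes 400 timeout units. This sketch does not check Apple's accessory guidelines, which add their own constraints on top of the spec rule.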

Using a BLE sniffer, for iOS 13.5.1 on an iPhone SE (2020 model) we see the following CONNECT_REQ packet:

Bluetooth Low Energy Link Layer
    Access Address: 0x8e89bed6
    Packet Header: 0x22c5 (PDU Type: CONNECT_REQ, ChSel: #1, TxAdd: Random, RxAdd: Random)
    Initator Address: 5c:d1:b4:78:4e:43 (5c:d1:b4:78:4e:43)
    Advertising Address: d5:88:07:32:7a:ad (d5:88:07:32:7a:ad)
    Link Layer Data
        Access Address: 0x50654a99
        CRC Init: 0x28ce8e
        Window Size: 3 (3.75 msec)
        Window Offset: 7 (8.75 msec)
        Interval: 24 (30 msec)
        Latency: 0
        Timeout: 72 (720 msec)
        Channel Map: ff07c0ff1f
            .... ...1 = RF Channel 1 (2404 MHz - Data - 0): True
            .... ..1. = RF Channel 2 (2406 MHz - Data - 1): True
            .... .1.. = RF Channel 3 (2408 MHz - Data - 2): True
            .... 1... = RF Channel 4 (2410 MHz - Data - 3): True
            ...1 .... = RF Channel 5 (2412 MHz - Data - 4): True
            ..1. .... = RF Channel 6 (2414 MHz - Data - 5): True
            .1.. .... = RF Channel 7 (2416 MHz - Data - 6): True
            1... .... = RF Channel 8 (2418 MHz - Data - 7): True
            .... ...1 = RF Channel 9 (2420 MHz - Data - 8): True
            .... ..1. = RF Channel 10 (2422 MHz - Data - 9): True
            .... .1.. = RF Channel 11 (2424 MHz - Data - 10): True
            .... 0... = RF Channel 13 (2428 MHz - Data - 11): False
            ...0 .... = RF Channel 14 (2430 MHz - Data - 12): False
            ..0. .... = RF Channel 15 (2432 MHz - Data - 13): False
            .0.. .... = RF Channel 16 (2434 MHz - Data - 14): False
            0... .... = RF Channel 17 (2436 MHz - Data - 15): False
            .... ...0 = RF Channel 18 (2438 MHz - Data - 16): False
            .... ..0. = RF Channel 19 (2440 MHz - Data - 17): False
            .... .0.. = RF Channel 20 (2442 MHz - Data - 18): False
            .... 0... = RF Channel 21 (2444 MHz - Data - 19): False
            ...0 .... = RF Channel 22 (2446 MHz - Data - 20): False
            ..0. .... = RF Channel 23 (2448 MHz - Data - 21): False
            .1.. .... = RF Channel 24 (2450 MHz - Data - 22): True
            1... .... = RF Channel 25 (2452 MHz - Data - 23): True
            .... ...1 = RF Channel 26 (2454 MHz - Data - 24): True
            .... ..1. = RF Channel 27 (2456 MHz - Data - 25): True
            .... .1.. = RF Channel 28 (2458 MHz - Data - 26): True
            .... 1... = RF Channel 29 (2460 MHz - Data - 27): True
            ...1 .... = RF Channel 30 (2462 MHz - Data - 28): True
            ..1. .... = RF Channel 31 (2464 MHz - Data - 29): True
            .1.. .... = RF Channel 32 (2466 MHz - Data - 30): True
            1... .... = RF Channel 33 (2468 MHz - Data - 31): True
            .... ...1 = RF Channel 34 (2470 MHz - Data - 32): True
            .... ..1. = RF Channel 35 (2472 MHz - Data - 33): True
            .... .1.. = RF Channel 36 (2474 MHz - Data - 34): True
            .... 1... = RF Channel 37 (2476 MHz - Data - 35): True
            ...1 .... = RF Channel 38 (2478 MHz - Data - 36): True
            ..0. .... = RF Channel 0 (2402 MHz - Reserved for Advertising - 37): False
            .0.. .... = RF Channel 12 (2426 MHz - Reserved for Advertising - 38): False
            0... .... = RF Channel 39 (2480 MHz - Reserved for Advertising - 39): False
        ...0 1111 = Hop: 15
        001. .... = Sleep Clock Accuracy: 151 ppm to 250 ppm (1)
    CRC: 0x419071
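
As a cross-check on the channel map in this CONNECT_REQ, the 5-byte value can be decoded with a few lines of C. This is a small sketch with illustrative function names: it assumes the usual link layer bit order, where bit n of the map (first byte first, least significant bit first) corresponds to data channel n.

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_CHANNEL_COUNT 37u

/* Bit n of the map (byte 0 first, LSB first) marks data channel n as used. */
static bool channel_is_used(const uint8_t map[5], unsigned channel)
{
    return ((map[channel / 8u] >> (channel % 8u)) & 1u) != 0u;
}

/* Count how many of the 37 data channels the central has enabled. */
static unsigned used_channel_count(const uint8_t map[5])
{
    unsigned count = 0;
    for (unsigned ch = 0; ch < DATA_CHANNEL_COUNT; ch++) {
        if (channel_is_used(map, ch)) {
            count++;
        }
    }
    return count;
}
```

For the map ff07c0ff1f this gives 26 of 37 data channels in use, with data channels 11 to 21 (2428 to 2448 MHz) excluded. That pattern would be consistent with the central avoiding a busy part of the band, although the trace alone does not show why those channels were dropped.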

So we have 9 peripherals, 3 connected to one iOS device, 3 connected to another iOS device, and 3 connected to yet another iOS device. I'm noticing random disconnects. On the iOS side, centralManager(_:didDisconnectPeripheral:error:) is reporting error 6, "The connection has timed out."

Have we bumped into some of the practical limits of BLE? I did a sniffer trace that captured a random disconnect. See attached (Wireshark with nRF52 DK and nRF Sniffer 3.0.0). I notice a lot of LL_CHANNEL_MAP_REQ packets, but I don't have much knowledge of this level of BLE. Is there anything we can do to increase connection stability? We request a 4 second supervision timeout but the central chooses 720 milliseconds. Should we use a higher min and max connection interval? Our central app generally uses writes with response (CBCharacteristicWriteType.withResponse) when writing characteristic values.

Appreciate any information, guidance.

Many thanks,

Tim

5238.sniffer trace.pcapng

  • Hello Tim,

    I have looked at the sniffer log that you provided, and it looks like the nRF (the peripheral/slave) is not responding, and hence the disconnect, as you probably know.

    How does the nRF behave in this case? Have you tried to debug the application there? Do you get the disconnected event on the nRF as well, or does it for some reason reset? If you receive the disconnect event, what reason is it pointing to?

    You can find the disconnect reason in the BLE_GAP_EVT_DISCONNECTED event by using:

    NRF_LOG_INFO("Disconnected, reason: 0x%02x", p_ble_evt->evt.gap_evt.params.disconnected.reason);

    What does it say?
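
    To make the logged value easier to read, the reason byte can be mapped to a short description. This is a minimal sketch with a hypothetical helper name; the numeric values are standard Bluetooth HCI error codes, and the strings are informal descriptions rather than SDK identifiers.

    ```c
    #include <stdint.h>

    /* Map an HCI disconnect reason to a human-readable description. */
    static const char *disconnect_reason_str(uint8_t reason)
    {
        switch (reason) {
        case 0x08: return "connection timeout (supervision timeout expired)";
        case 0x13: return "remote user terminated connection";
        case 0x16: return "connection terminated by local host";
        case 0x22: return "link layer response timeout";
        case 0x3E: return "connection failed to be established";
        default:   return "other (see the HCI error code list)";
        }
    }
    ```

    A reason of 0x08 on the peripheral would match the timeout the iOS side reports, while no event at all would point towards a reset.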

    Best regards,

    Edvin

  • Thanks Edvin. It will take some time to set up a debug environment in the field, but I can do that and find out if the BLE_GAP_EVT_DISCONNECTED event occurs on the peripheral and what the disconnect reason is.

    In the meantime, I would be glad for any general guidance on making the connection as stable as possible. Given that data throughput is not important, and that only small amounts of data are written (in either direction), what connection parameters would you suggest? Are there any other helpful considerations?

    In reading the Timeslot API documentation, I understand that using it can affect performance of the SoftDevice, so I wonder if it is a factor in what I am experiencing.

    I've attached another sniffer trace. It appears the disconnect occurs around time 103.733, as the peripheral/slave begins advertising again at 104.661. It seems that throughout the duration of the connection, occasional Empty PDU packets from the slave are not seen by the sniffer, for example at 15.408, 15.497, 15.588, etc. Prior to the disconnect at 103.733, dropped slave Empty PDU packets become more frequent. At 98.002 there are 3 missing slave Empty PDU packets. At 100.852 there are 8 dropped packets, then more and more dropped packets until the disconnect around 103.733. Would this suggest an interference issue? Maybe the clocks of other nearby peripherals drift, and at some point the interference becomes intolerable? Again, I have very little experience here, so this may be unhelpful speculation.

    Many thanks,

    Tim

    sniffer trace-2.pcapng

  • Hello Tim,

     

    Tim said:
    occasional Empty PDU packets from the slave are not seen by the sniffer. For example, at 15.408, 15.497, 15.588, etc. Prior to disconnect at 103.733

     Yes. Those are "normal". Since it is an on air link, you will occasionally have some dropped packets. When these are missing from the sniffer trace it means one of three things:

    1: The sniffer didn't pick up the packet, but the central did.

    2: The sniffer and the central didn't pick up the packet.

    3: The peripheral didn't pick up the previous packet from the central correctly (bad CRC), which means it doesn't have anything to reply to.

    In the case of packet nr 2001 and 2002 (15.408) I see that the NESN (next expected sequence number) in the central's packet doesn't change, which indicates that the central didn't pick up any packets from the peripheral in between these two.

    However, if you have a large distance between the two devices, you can expect some packet drops occasionally.

    The increased number of dropped packets is probably caused by the environment: something coming between the devices, a lot of radio noise in the area, a bad (noisy) channel or similar can all cause this. Since the peripheral does reply after a while (even after 8 missed packets), the nRF is still running as expected. However, at the disconnect there are 24 missed packets. That can mean either that the nRF's application has "crashed" and reset, or that the radio conditions are bad (noise). This is why I asked for the debug information on the nRF side. If you get the disconnected event, it probably means that the radio conditions are bad from time to time. If you don't get this event, you are most likely seeing a reset caused by an application bug.
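
    The 24 missed packets line up with the supervision timeout, which a short calculation can show. This is a sketch with a made-up helper name, and it assumes the connection in this trace used the same parameters as the CONNECT_REQ in the first post (interval 24 units of 1.25 ms, timeout 72 units of 10 ms).

    ```c
    #include <stdint.h>

    /* Number of whole connection intervals that fit in the supervision timeout. */
    static uint32_t events_before_timeout(uint32_t interval_units,
                                          uint32_t timeout_units)
    {
        uint32_t interval_us = interval_units * 1250u;   /* 1.25 ms units */
        uint32_t timeout_us  = timeout_units * 10000u;   /* 10 ms units   */
        return timeout_us / interval_us;
    }
    ```

    With a 30 ms interval and a 720 ms timeout the link is dropped after 24 consecutive missed events. If the central accepted the requested 4000 ms timeout, the same interval would allow 133.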

     

    Tim said:
    Maybe as clocks of other nearby peripherals drift, and at some point the interference becomes intolerable?

     That shouldn't be an issue. They sync up every time they receive a packet. When packets are missed, the window in which the peripheral listens widens in proportion to the number of consecutive missed packets. You should, however, make sure you have the correct XTAL accuracy in the sdk_config.h file.
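
    As a rough illustration of why the configured accuracy matters, the link layer window widening can be estimated as (central SCA + peripheral SCA) / 1,000,000 multiplied by the time since the last packet the peripheral synchronized on. The helper below is a sketch with a made-up name; the 250 ppm figure is the worst case of the range the central advertises in the CONNECT_REQ, and 20 ppm is an assumed peripheral accuracy.

    ```c
    #include <stdint.h>

    /* Widening on each side of the expected receive time, in nanoseconds.
     * ppm multiplied by microseconds gives picoseconds, so divide by 1000. */
    static uint64_t window_widening_ns(uint32_t central_sca_ppm,
                                       uint32_t peripheral_sca_ppm,
                                       uint64_t elapsed_us)
    {
        uint64_t total_ppm = (uint64_t)central_sca_ppm + peripheral_sca_ppm;
        return (total_ppm * elapsed_us) / 1000u;
    }
    ```

    With 250 ppm plus 20 ppm and a 30 ms interval this is about 8.1 microseconds per event. After 8 consecutive missed events (270 ms since the last sync) it grows to about 72.9 microseconds. If the configured accuracy is better than the crystal really is, the window can end up too narrow.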

  • Thanks Edvin. I was out testing today and was unable to log the intermittent disconnect. It happened only once on a peripheral I was not debug logging. I'll continue to confirm whether nRF gets the disconnect event.

    As I mentioned earlier, our application uses a simplified version of the wireless clock synchronization code described here. For each system of 3 peripherals connected to 1 central (iOS device), one of the peripherals constantly transmits time beacons at a constant frequency for a specific PREFIX0 and BASE0. Every 30 seconds, the other two peripherals listen for beacons (same PREFIX0 and BASE0) and once a beacon is received they stop listening and set a timer to start listening again in 30 seconds. Initially the transmit frequency was 50Hz and I was noticing that receipt of beacons at the listening peripherals could take up to 8 or 10 seconds. I increased the transmit frequency to 100Hz and the time to receive a beacon once listening started went down to 1 second or less. However, after increasing the transmit frequency to 100Hz, I noticed an unexpected disconnect of one of the peripherals. At 50Hz, the BLE connections appeared stable. The documentation for the Timeslot API (used by the timer sync code) states that its use can influence the performance of the SoftDevice. Makes me wonder if Timeslot API use is the cause of unexpected disconnects.
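
    To put the two beacon rates side by side, here is a rough estimate of how much radio time the transmitter's timeslots take. It is only a sketch: it assumes one 1000 microsecond timeslot per beacon and ignores any scheduling overhead, and the function name is made up for illustration.

    ```c
    #include <stdint.h>

    /* Share of each second spent in timeslots, in parts per thousand. */
    static uint32_t timeslot_share_permille(uint32_t slot_us, uint32_t rate_hz)
    {
        /* slot_us * rate_hz = microseconds used per second; /1000 = permille */
        return (slot_us * rate_hz) / 1000u;
    }
    ```

    Under those assumptions, 50 Hz takes about 5% of the radio time and 100 Hz about 10%, so doubling the rate also doubles the time during which the SoftDevice cannot use the radio on the transmitting peripheral.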

    Would increasing the min and max connection interval and the supervision timeout help with stability of the BLE connection?

    About clock source and accuracy, sdk_config.h has the following:

    //  NRFX_CLOCK_CONFIG_LF_SRC  - LF Clock Source
    // <0=> RC 
    // <1=> XTAL 
    // <2=> Synth 
    // <131073=> External Low Swing 
    // <196609=> External Full Swing 
    
    #ifndef NRFX_CLOCK_CONFIG_LF_SRC
    #define NRFX_CLOCK_CONFIG_LF_SRC 1
    #endif
    
    // NRF_SDH_CLOCK_LF_SRC - SoftDevice clock source.
    // <0=> NRF_CLOCK_LF_SRC_RC 
    // <1=> NRF_CLOCK_LF_SRC_XTAL 
    // <2=> NRF_CLOCK_LF_SRC_SYNTH 
    
    #ifndef NRF_SDH_CLOCK_LF_SRC
    #define NRF_SDH_CLOCK_LF_SRC 1
    #endif
    
    //  NRF_SDH_CLOCK_LF_ACCURACY  - External clock accuracy used in the LL to compute timing.
    // <0=> NRF_CLOCK_LF_ACCURACY_250_PPM 
    // <1=> NRF_CLOCK_LF_ACCURACY_500_PPM 
    // <2=> NRF_CLOCK_LF_ACCURACY_150_PPM 
    // <3=> NRF_CLOCK_LF_ACCURACY_100_PPM 
    // <4=> NRF_CLOCK_LF_ACCURACY_75_PPM 
    // <5=> NRF_CLOCK_LF_ACCURACY_50_PPM 
    // <6=> NRF_CLOCK_LF_ACCURACY_30_PPM 
    // <7=> NRF_CLOCK_LF_ACCURACY_20_PPM 
    // <8=> NRF_CLOCK_LF_ACCURACY_10_PPM 
    // <9=> NRF_CLOCK_LF_ACCURACY_5_PPM 
    // <10=> NRF_CLOCK_LF_ACCURACY_2_PPM 
    // <11=> NRF_CLOCK_LF_ACCURACY_1_PPM
    
    #ifndef NRF_SDH_CLOCK_LF_ACCURACY
    #define NRF_SDH_CLOCK_LF_ACCURACY 7
    #endif

    Our custom board has this ±5ppm external crystal connected to XL1/XL2. Should I be setting NRF_SDH_CLOCK_LF_ACCURACY to 9?

    Thanks Edvin.

    Tim

  • Hello Tim,

    Tim said:
    As I mentioned earlier, our application uses a simplified version of the wireless clock synchronization code described here.

     That should not affect the timekeeping for the softdevice. These are completely separate.

    The accuracy of the XTAL is dependent on both the XTAL itself and the PCB layout (more precisely the capacitors between the LFXTAL and the nRF). I see that your LFXTAL has 12.5pF capacitance. What is the value of the caps between the XTAL and the nRF?

    However, I am not convinced that this is an XTAL issue. 

     

    Tim said:
    Should I be setting NRF_SDH_CLOCK_LF_ACCURACY to 9?

     If you want to check whether it is an XTAL issue, please try to set:

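    #define NRF_SDH_CLOCK_LF_SRC 0           // NRF_CLOCK_LF_SRC_RC
    #define NRF_SDH_CLOCK_LF_ACCURACY 1      // NRF_CLOCK_LF_ACCURACY_500_PPM
    #define NRF_SDH_CLOCK_LF_RC_CTIV 16      // calibration interval in 0.25 s units (4 s)
    #define NRF_SDH_CLOCK_LF_RC_TEMP_CTIV 2  // calibrate at least every 2 intervals, even if the temperature is unchanged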

    This will enable the internal RC oscillator instead of the external XTAL. Since we know this should work, try to look for the disconnects with these settings.

    But first, I would try to look for the disconnect reason with your current settings.

    BR,

    Edvin
