BLE connection stability

Hello,

I would appreciate some guidance about BLE connection stability. Our product is a system of 3 peripherals (custom board with Laird's BL652 module, containing nRF52832) all simultaneously connected to our iOS app (central). The peripheral application is written with Nordic SDK 15.3 plus S112 v6.1.1, based initially on the ble_app_uart example. Our iOS app uses Apple's Core Bluetooth framework and currently supports iOS 9.3 or newer.

Each peripheral has a sensor and uses GPIOTE, PPI and TIMERs to timestamp events and send them to the connected central via NUS (Nordic UART Service). Events occur randomly, sometimes with a minute or two between events, and sometimes multiple events occur in quick succession. The clocks (TIMER2) on the three peripherals are synchronized using the Timeslot API based on a simplified version of this Nordic blog. The timeslot length is 1000 microseconds.

For higher accuracy, our peripheral app requests the high frequency clock (sd_clock_hfclk_request()), which I understand sets the clock source to XTAL. The BL652 has an integrated high accuracy 32 MHz (±10 ppm) crystal oscillator.

This is a mobile system that is set up with the peripherals at most 150 feet apart (open space, line-of-sight). Each peripheral is powered by 2 AA batteries. For testing, we set up three systems to test concurrent use of multiple 3-peripheral systems connected to multiple iOS devices. Each of the three systems (each comprising 3 peripherals) uses a different radio address for the time sync using the Timeslot API.

Connection parameters for the peripherals are:

#define MIN_CONN_INTERVAL MSEC_TO_UNITS(30, UNIT_1_25_MS)
#define MAX_CONN_INTERVAL MSEC_TO_UNITS(75, UNIT_1_25_MS)
#define SLAVE_LATENCY 0
#define CONN_SUP_TIMEOUT MSEC_TO_UNITS(4000, UNIT_10_MS)
#define FIRST_CONN_PARAMS_UPDATE_DELAY APP_TIMER_TICKS(5000)
#define NEXT_CONN_PARAMS_UPDATE_DELAY APP_TIMER_TICKS(30000)
#define MAX_CONN_PARAMS_UPDATE_COUNT 3
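
For reference, here is a minimal standalone sketch of what these requested values become in link-layer units, plus the Bluetooth Core Specification rule that the supervision timeout must be larger than (1 + latency) * max interval * 2. The helper names (msec_to_units, sup_timeout_is_valid) and the *_US constants are made up for illustration; they only mirror the idea of the SDK's MSEC_TO_UNITS macro and are not SDK code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unit sizes in microseconds (hypothetical names, same idea as the SDK units). */
#define UNIT_1_25_MS_US 1250u
#define UNIT_10_MS_US   10000u

/* Convert milliseconds to link-layer units of the given size. */
static uint32_t msec_to_units(uint32_t ms, uint32_t unit_us)
{
    assert(unit_us != 0u);
    return (ms * 1000u) / unit_us;
}

/* Core spec rule: timeout must exceed (1 + latency) * max interval * 2. */
static bool sup_timeout_is_valid(uint32_t max_interval_ms,
                                 uint32_t latency,
                                 uint32_t timeout_ms)
{
    return timeout_ms > (1u + latency) * max_interval_ms * 2u;
}
```

With the values above, the 30 ms and 75 ms intervals map to 24 and 60 units of 1.25 ms, and the 4000 ms timeout maps to 400 units of 10 ms. This only checks the generic spec rule; any additional platform-specific rules on the central side are outside this sketch.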

I understand these conform to Apple's requirements. We do NOT need high throughput and send only small amounts of data (<20 bytes) between central and peripherals. In sdk_config.h, NRF_SDH_BLE_GATT_MAX_MTU_SIZE is 23. In ble_evt_handler(), on BLE_GAP_EVT_CONNECTED, I set the transmit power level to +4 dBm:

err_code = sd_ble_gap_tx_power_set(BLE_GAP_TX_POWER_ROLE_CONN, m_conn_handle, 4);

Using a BLE sniffer, for iOS 13.5.1 on an iPhone SE (2020 model) we see the following CONNECT_REQ packet:

Bluetooth Low Energy Link Layer
    Access Address: 0x8e89bed6
    Packet Header: 0x22c5 (PDU Type: CONNECT_REQ, ChSel: #1, TxAdd: Random, RxAdd: Random)
    Initator Address: 5c:d1:b4:78:4e:43 (5c:d1:b4:78:4e:43)
    Advertising Address: d5:88:07:32:7a:ad (d5:88:07:32:7a:ad)
    Link Layer Data
        Access Address: 0x50654a99
        CRC Init: 0x28ce8e
        Window Size: 3 (3.75 msec)
        Window Offset: 7 (8.75 msec)
        Interval: 24 (30 msec)
        Latency: 0
        Timeout: 72 (720 msec)
        Channel Map: ff07c0ff1f
            .... ...1 = RF Channel 1 (2404 MHz - Data - 0): True
            .... ..1. = RF Channel 2 (2406 MHz - Data - 1): True
            .... .1.. = RF Channel 3 (2408 MHz - Data - 2): True
            .... 1... = RF Channel 4 (2410 MHz - Data - 3): True
            ...1 .... = RF Channel 5 (2412 MHz - Data - 4): True
            ..1. .... = RF Channel 6 (2414 MHz - Data - 5): True
            .1.. .... = RF Channel 7 (2416 MHz - Data - 6): True
            1... .... = RF Channel 8 (2418 MHz - Data - 7): True
            .... ...1 = RF Channel 9 (2420 MHz - Data - 8): True
            .... ..1. = RF Channel 10 (2422 MHz - Data - 9): True
            .... .1.. = RF Channel 11 (2424 MHz - Data - 10): True
            .... 0... = RF Channel 13 (2428 MHz - Data - 11): False
            ...0 .... = RF Channel 14 (2430 MHz - Data - 12): False
            ..0. .... = RF Channel 15 (2432 MHz - Data - 13): False
            .0.. .... = RF Channel 16 (2434 MHz - Data - 14): False
            0... .... = RF Channel 17 (2436 MHz - Data - 15): False
            .... ...0 = RF Channel 18 (2438 MHz - Data - 16): False
            .... ..0. = RF Channel 19 (2440 MHz - Data - 17): False
            .... .0.. = RF Channel 20 (2442 MHz - Data - 18): False
            .... 0... = RF Channel 21 (2444 MHz - Data - 19): False
            ...0 .... = RF Channel 22 (2446 MHz - Data - 20): False
            ..0. .... = RF Channel 23 (2448 MHz - Data - 21): False
            .1.. .... = RF Channel 24 (2450 MHz - Data - 22): True
            1... .... = RF Channel 25 (2452 MHz - Data - 23): True
            .... ...1 = RF Channel 26 (2454 MHz - Data - 24): True
            .... ..1. = RF Channel 27 (2456 MHz - Data - 25): True
            .... .1.. = RF Channel 28 (2458 MHz - Data - 26): True
            .... 1... = RF Channel 29 (2460 MHz - Data - 27): True
            ...1 .... = RF Channel 30 (2462 MHz - Data - 28): True
            ..1. .... = RF Channel 31 (2464 MHz - Data - 29): True
            .1.. .... = RF Channel 32 (2466 MHz - Data - 30): True
            1... .... = RF Channel 33 (2468 MHz - Data - 31): True
            .... ...1 = RF Channel 34 (2470 MHz - Data - 32): True
            .... ..1. = RF Channel 35 (2472 MHz - Data - 33): True
            .... .1.. = RF Channel 36 (2474 MHz - Data - 34): True
            .... 1... = RF Channel 37 (2476 MHz - Data - 35): True
            ...1 .... = RF Channel 38 (2478 MHz - Data - 36): True
            ..0. .... = RF Channel 0 (2402 MHz - Reserved for Advertising - 37): False
            .0.. .... = RF Channel 12 (2426 MHz - Reserved for Advertising - 38): False
            0... .... = RF Channel 39 (2480 MHz - Reserved for Advertising - 39): False
        ...0 1111 = Hop: 15
        001. .... = Sleep Clock Accuracy: 151 ppm to 250 ppm (1)
    CRC: 0x419071
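
To make the dump above easier to read, here is a small standalone sketch that decodes the channel map bytes (ff 07 c0 ff 1f, bit 0 of the first byte being data channel 0) and the Timeout field (units of 10 ms). The function and constant names are made up for illustration and are not part of any SDK or of Wireshark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNEL_MAP_LEN   5u
#define NUM_DATA_CHANNELS 37u

/* Channel map bytes as shown in the CONNECT_REQ above. */
static const uint8_t trace_channel_map[CHANNEL_MAP_LEN] = {
    0xff, 0x07, 0xc0, 0xff, 0x1f
};

/* True if the given data channel index (0..36) is enabled in the map. */
static bool channel_is_used(const uint8_t *map, unsigned channel)
{
    assert(map != NULL && channel < NUM_DATA_CHANNELS);
    return ((map[channel / 8u] >> (channel % 8u)) & 1u) != 0u;
}

/* Number of data channels the central allows for hopping. */
static unsigned used_channel_count(const uint8_t *map)
{
    unsigned count = 0;
    for (unsigned ch = 0; ch < NUM_DATA_CHANNELS; ch++) {
        if (channel_is_used(map, ch)) {
            count++;
        }
    }
    return count;
}

/* The Timeout field counts units of 10 ms. */
static unsigned timeout_field_to_ms(unsigned field)
{
    return field * 10u;
}
```

For this trace the map enables 26 of the 37 data channels: data channels 11 to 21 (2428 to 2448 MHz) are excluded by the central. The Timeout field of 72 corresponds to the 720 ms supervision timeout.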

So we have 9 peripherals, 3 connected to one iOS device, 3 connected to another iOS device, and 3 connected to yet another iOS device. I'm noticing random disconnects. On the iOS side, centralManager(_:didDisconnectPeripheral:error:) is reporting error 6 which is "The connection has timed out.".

Have we bumped into some of the practical limits of BLE? I did a sniffer trace that captured a random disconnect. See attached (Wireshark with nRF52 DK and nRF Sniffer 3.0.0). I notice a lot of LL_CHANNEL_MAP_REQ packets, but I don't have much knowledge of this level of BLE. Is there anything we can do to increase connection stability? We request a 4 second supervision timeout but the central chooses 720 milliseconds. Should we use a higher min and max connection interval? Our central app generally uses writeWithResponse when writing characteristic values.

Appreciate any information, guidance.

Many thanks,

Tim

5238.sniffer trace.pcapng

  • Thanks Edvin,

    Today I confirmed that when random disconnects occur, it is not because the nRF is crashing and resetting. I logged the following reason codes: 0x08, 0x3E, 0x28 with 0x08 being the most common. With 9 peripherals connected to 3 centrals (3 peripherals to each central (iOS device)), the disconnects occurred often for one particular peripheral (once every minute or two). The 3 peripherals connected to a central are separated by maximum 140 feet in open space. The centrals (iOS devices) were sitting at most 100 feet from the farthest peripheral. I noticed that often as I walked up to the centrals, a disconnect would occur. It seemed that when the space had other people (8-10 people) present, disconnects occurred more frequently.

    After logging the disconnect reasons and capturing a few more sniffer traces, I turned off 2 of the 3 sets of 3 peripherals leaving 3 peripherals connected to 1 central. Disconnect rates went way down (only a couple disconnects over a 20-minute period). I walked throughout the space carrying the central device and no disconnects. Probably maximum 190 feet distance between central and any of the 3 peripherals.

    I've attached another sniffer trace with several disconnects captured.

    sniffer trace-3.pcapng

    I didn't have time to test using the internal RC. If still worthwhile after above information, I'll test today. We use 12pF caps on the external crystal. Here's the schematic:

    Here's a portion of the board layout:

    Thank you for your thoughts about this. If your sense now is that RF interference is the cause of the disconnects, do you have any suggestions on how to mitigate this? Adjust connection parameters? Clock settings?

    Generally, what are the common sources of RF interference that would impact BLE connection stability?

    Many thanks,

    Tim

  • Ok, so the disconnect reasons are:

    0x08: BLE_HCI_CONNECTION_TIMEOUT
    0x3E: BLE_HCI_CONN_FAILED_TO_BE_ESTABLISHED
    0x28: BLE_HCI_INSTANT_PASSED

    0x3E means that the devices failed to connect, which can also be caused by packet loss.

    0x28 means that some change that was supposed to happen at a specific instant (e.g. a channel map update or a connection parameter update) was not acknowledged before the instant at which it was supposed to take effect.

    All of these can occur if you struggle with packet loss.
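
As a small illustration of the list above, a peripheral's logging could translate the reason byte into a readable name and flag the codes that, per the explanation above, can result from packet loss. This is a standalone sketch: the two function names are hypothetical, and only the three codes from this thread are covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Map a disconnect reason byte to a name (only codes seen in this thread). */
static const char *disconnect_reason_name(uint8_t reason)
{
    switch (reason) {
    case 0x08: return "BLE_HCI_CONNECTION_TIMEOUT";
    case 0x3E: return "BLE_HCI_CONN_FAILED_TO_BE_ESTABLISHED";
    case 0x28: return "BLE_HCI_INSTANT_PASSED";
    default:   return "OTHER";
    }
}

/* True for the codes that, per the discussion above, can follow from packet loss. */
static bool reason_suggests_packet_loss(uint8_t reason)
{
    return reason == 0x08 || reason == 0x3E || reason == 0x28;
}
```

Counting how often each code shows up per peripheral makes it easier to compare test runs, for example before and after a PHY or layout change.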

    Regarding the PCB layout. The capacitors C6 and C7 should have the value:

    C = 2*Cl - Cpin - Cpcb

    where Cl = the load capacitance of the LFXTAL (Y1), Cpin = the capacitance of the pin, which is 4pF on the nRF52832, and Cpcb = the capacitance of the PCB trace, typically between 0 and 1pF.

    From the datasheet of your XTAL, it has a load capacitance of 12.5pF, so in your case the capacitors should be: C = 2*12.5 - 4 - 0 = 21pF. Standard capacitor values are 20 or 22pF, so you should go with 20pF capacitors (allowing 1pF for the PCB trace). Regarding R10 (10MOhm), I am not sure. It is not present in the reference layout. I see it has a large resistance, but I don't know what effect it has. Do you know why it was added?
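
The arithmetic above can be written out as a short standalone sketch. Both function names are made up for illustration; the second one simply inverts the same formula to estimate the load the crystal sees for a given fitted capacitor, which is an assumption that follows from the formula rather than a measured value.

```c
#include <assert.h>

/* C = 2*Cl - Cpin - Cpcb, all values in pF. */
static double load_cap_value_pf(double cl_pf, double cpin_pf, double cpcb_pf)
{
    assert(cl_pf > 0.0);
    return 2.0 * cl_pf - cpin_pf - cpcb_pf;
}

/* Same formula solved for Cl: effective load for a fitted capacitor value. */
static double effective_load_pf(double c_pf, double cpin_pf, double cpcb_pf)
{
    assert(c_pf > 0.0);
    return (c_pf + cpin_pf + cpcb_pf) / 2.0;
}
```

With Cl = 12.5pF and Cpin = 4pF this gives 21pF for Cpcb = 0 and 20pF for Cpcb = 1pF. Going the other way, 12pF capacitors with Cpcb = 0 would present roughly 8pF to a crystal specified for 12.5pF.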

    As I can see you are not sending a lot of packets, and you are using 30ms connection interval. If you only have 3 connections that should be plenty of time.

    I did however notice that you are using 2MBPS. This is not great for longer ranges. I suggest that you set 1MBPS. Depending on what SDK you are using, this is set a bit differently.

    Try changing:

    case BLE_GAP_EVT_PHY_UPDATE_REQUEST:
    {
        NRF_LOG_DEBUG("PHY update request.");
        ble_gap_phys_t const phys =
        {
            .rx_phys = BLE_GAP_PHY_AUTO,
            .tx_phys = BLE_GAP_PHY_AUTO,
        };
        err_code = sd_ble_gap_phy_update(p_ble_evt->evt.gap_evt.conn_handle, &phys);
        APP_ERROR_CHECK(err_code);
    } break;

    to:

    case BLE_GAP_EVT_PHY_UPDATE_REQUEST:
    {
        NRF_LOG_DEBUG("PHY update request.");
        ble_gap_phys_t const phys =
        {
            .rx_phys = BLE_GAP_PHY_1MBPS,
            .tx_phys = BLE_GAP_PHY_1MBPS,
        };
        err_code = sd_ble_gap_phy_update(p_ble_evt->evt.gap_evt.conn_handle, &phys);
        APP_ERROR_CHECK(err_code);
    } break;

    2MBPS has a shorter range than 1MBPS, and since you are not sending a lot of payload data, 1MBPS may be the better choice in your case if you have long distances.

     

    Tim said:
    With 9 peripherals connected to 3 centrals (3 peripherals to each central (iOS device)), the disconnects occurred often for one particular peripheral (once every minute or two). The 3 peripherals connected to a central are separated by maximum 140 feet in open space. The centrals (iOS devices) were sitting at most 100 feet from the farthest peripheral. I noticed that often as I walked up to the centrals, a disconnect would occur. It seemed that when the space had other people (8-10 people) present, disconnects occurred more frequently.

    Reasons for packet loss:

    - Too long a distance, so the signal strength is too low.

    - Obstacles, such as walls and people (the human body is mostly water, which absorbs 2.4GHz radio signals).

    - Noise (a little bit from other BLE connections, but not much, as they are using different channels. Wi-Fi causes more noise).

    Try changing to 1MBPS, and see if that helps. Are all the devices in the same room, or are there many walls in between the devices?

  • Thanks Edvin. I will try changing to 1MBPS. I presume I should also change to 1MBPS for the radio configuration of the time sync code that uses Timeslot API to sync clocks?

    Yes, all devices are in the same large room (160 x 120 feet), no interior walls. For my current testing, all devices are near one wall.

    I’ll try 1MBPS next. Would it also be worthwhile to increase the minimum connection interval? The supervision timeout? 3 devices max is typical, but sometimes there can be 6 or even 9.

    Could caps of too low a value (12 pF instead of 20 pF) cause the issue? We will change to 20 pF. I’ll check with our engineer about R10.

    Many thanks Edvin,

    Tim

  • You can increase the supervision timeout, but consider what this really does. All it does is change how long the link may stay silent before the state changes from connected to disconnected. If that works in your case, you may try it. But the reason for your timeouts is that the devices haven't been able to communicate for 720 ms. Setting the supervision timeout to e.g. 4 seconds will increase the time it takes to disconnect by timeout, but it still means that you lose packets. Disconnecting and reconnecting isn't really an issue in itself, but if it soothes your mind that the devices stay connected at all times when you increase the supervision timeout, then go ahead.

    NB: By default, all the examples in the SDK have a supervision timeout of 4 seconds.
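
To put the two timeouts side by side, a rough standalone sketch can estimate how many consecutive connection events would have to fail before the link is declared lost. The function name is made up, and the result is only an approximation (timeout divided by interval, rounded down).

```c
#include <assert.h>
#include <stdint.h>

/* Approximate number of connection intervals that fit in the timeout. */
static uint32_t events_before_timeout(uint32_t timeout_ms, uint32_t interval_ms)
{
    assert(interval_ms != 0u);
    return timeout_ms / interval_ms;
}
```

At a 30 ms connection interval, the 720 ms timeout chosen by the central corresponds to about 24 consecutive missed events, while a 4 second timeout would correspond to about 133. A longer timeout tolerates longer gaps, but the packets in those gaps are still lost.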

     

    Tim said:
    3 devices max is typical, but sometimes there can be 6 or even 9.

     If you intend to have 9 connected devices, I would look into increasing the connection interval. One connection event takes a couple of ms, so 3 connections per 30ms is not a problem, but if the central spends over half the time on the radio supporting all the links, then you may encounter more packet loss, due to timeslot collisions. Please check out how long the central spends on one connection per connection interval using the online power profiler:

    https://devzone.nordicsemi.com/nordic/power/w/opp/2/online-power-profiler-for-ble
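
As a back-of-the-envelope version of the reasoning above, the share of each connection interval that a central spends on the radio can be estimated as links * event duration / interval. The sketch below is standalone; the function name is made up, and the 2 ms event duration used in the example is only an assumption standing in for "a couple of ms", so the profiler figure should be used instead when available.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of the connection interval spent serving all links. */
static uint32_t radio_busy_percent(uint32_t links,
                                   uint32_t event_us,
                                   uint32_t interval_us)
{
    assert(interval_us != 0u);
    return (links * event_us * 100u) / interval_us;
}
```

Under that 2 ms assumption, 3 links at a 30 ms interval occupy about 20% of the time, 9 links at 30 ms occupy about 60% (over half), and 9 links at a 75 ms interval drop back to about 24%.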

    Best regards,

    Edvin

  • Thanks Edvin. I tried using 1 MBPS and it reduced disconnect errors to zero. Sniffer trace (attached) shows very few dropped packets. This was with same test scenario: 9 peripherals, sets of 3 connected to 3 central devices. I understand 2 MBPS came with BLE 5.0. Only one of the central (iOS) devices supports BLE 5.0. The other two support 4.2. So in the previous test (many disconnect errors), 3 of 9 peripheral links were 2 MBPS and 6 of the 9 were 1 MBPS.

    sniffer trace-4.pcapng

    Is it guaranteed that ble_evt_handler(...) will be called with the BLE_GAP_EVT_PHY_UPDATE_REQUEST event? Should I also call sd_ble_gap_phy_update(...) when the BLE_GAP_EVT_CONNECTED event occurs? Can the central refuse to honour the sd_ble_gap_phy_update(...) request?

    The initial test (today) was with min/max connection interval set to 30/75ms (zero disconnect errors). I increased to 60/120ms and got the same good results. I don't know how to choose optimal min/max connection intervals given our use case. Data throughput is not important, and only small amounts of data (<18 bytes) are sent in either direction. Are there any advantages, other than data throughput, of low connection intervals? BLE needs to play well with the Timeslot API. I realize that connection parameters can be requested by the peripheral but the central decides which parameters to use. In the attached sniffer trace, the connection interval is 30ms, even though the min/max connection interval were 60/120ms.

    Thanks for your explanation about supervision timeout. I understand.

    Many thanks,

    Tim
