How to reduce errors for frequent BLE GATT connections


I'm working on a Xamarin app for Android 11 that makes a lot of GATT connections, and I'm looking for suggestions on how to improve reliability and reduce the frequency of errors.

A quick overview of what the app does:

- Run frequent scans to find potential GATT targets
  - Scan for 12 sec, stop for a short period, scan for 12 sec, repeat...
  - This is odd, but it helps us avoid having the Android OS downgrade our scans to opportunistic

- On an interval, attempt up to 5 simultaneous GATT connections
  - For each connection: discover services, get the service we want, read a characteristic, and finally disconnect & close the GATT client as quickly as we can

The overall flow works decently, except we seem to encounter a higher rate of errors than we'd expect, and end up having to retry connections.

------------------

The most frequent error we see is status code 133 (GATT_ERROR) in onConnectionStateChange.

We call ConnectGatt(context, false, gattCallback, BluetoothTransports.Le) to initiate the connection (see MS Docs).

If we see the 133 response, we immediately call Close() on the client, and wait ~1sec before trying a fresh connection to that device.
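The close-wait-retry handling described above can be sketched roughly as follows. This is plain Python for illustration only, not Xamarin code: connect_once and close are hypothetical stand-ins for the platform calls, and the attempt cap and random jitter are assumptions added here rather than anything the app is known to do.

```python
import random
import time
from typing import Callable

GATT_SUCCESS = 0
GATT_ERROR = 133  # generic status seen in onConnectionStateChange


def connect_with_retry(
    connect_once: Callable[[], int],
    close: Callable[[], None],
    max_attempts: int = 3,      # assumption: cap on tries per device
    base_delay_s: float = 1.0,  # the ~1 sec wait described above
    jitter_s: float = 0.25,     # assumption: small random spread
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the 1-based attempt that connected, or 0 if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        status = connect_once()
        if status == GATT_SUCCESS:
            return attempt
        # Release the failed client before creating a fresh one.
        close()
        if attempt < max_attempts:
            # Jitter keeps several failing devices from retrying in lockstep.
            sleep(base_delay_s + random.uniform(0.0, jitter_s))
    return 0
```

Logging the returned attempt number per device would also make it easier to see whether the 133s cluster on particular targets.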

I understand this is a generic, catch-all error code, but our app seems to hit it far more often than nRF Connect. I've tried using nRF Sniffer for BLE to find any other details of what might be going wrong, but the only thing that stands out is a malformed SCAN_REQ packet:

9551 -42 dBm Apr 18, 2023 10:02:49.422582000 LE 1M LE LL 0 SCAN_REQ[Malformed Packet] 73:73:61:4d:01:79 0x3

0000 05 17 00 03 76 a3 02 0a 00 25 2a 00 00 22 2f 1e
0010 40 d6 be 89 8e c3 04 79 01 4d 61 73 73 48

I noticed this happened near the timeframe of 2 different status 133 responses that our app received, but SCAN_REQ doesn't seem like the right message type to affect GATT connections. The other packets within 2 seconds of the error just look like normal ADV_IND & SCAN_REQ/RSP.
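One way to look closer at a packet like this is to decode the link-layer header by hand. The sketch below is plain Python, not part of the app. It assumes that whatever precedes the access address in the dump is capture metadata added by the sniffer, so it simply searches for the advertising access address 0x8E89BED6 and reads the two header bytes that follow. decode_adv_header is a made-up helper name.

```python
# Advertising-channel access address 0x8E89BED6, little-endian as sent on air.
ADV_ACCESS_ADDRESS = bytes.fromhex("d6be898e")

PDU_TYPES = {
    0x0: "ADV_IND", 0x1: "ADV_DIRECT_IND", 0x2: "ADV_NONCONN_IND",
    0x3: "SCAN_REQ", 0x4: "SCAN_RSP", 0x5: "CONNECT_IND", 0x6: "ADV_SCAN_IND",
}
SCAN_REQ_PAYLOAD_LEN = 12  # ScanA (6 bytes) + AdvA (6 bytes)


def decode_adv_header(raw: bytes) -> dict:
    """Decode the 2-byte advertising PDU header that follows the access address."""
    start = raw.find(ADV_ACCESS_ADDRESS)
    if start < 0 or len(raw) < start + 6:
        raise ValueError("no advertising access address / header found")
    header = raw[start + 4]
    length = raw[start + 5]  # treated as an 8-bit length field
    pdu_type = PDU_TYPES.get(header & 0x0F, "unknown")
    return {
        "pdu_type": pdu_type,
        "tx_add": (header >> 6) & 1,  # 1 = random address
        "rx_add": (header >> 7) & 1,
        "declared_length": length,
        "bytes_after_header": len(raw) - (start + 6),
        "length_ok": pdu_type != "SCAN_REQ" or length == SCAN_REQ_PAYLOAD_LEN,
    }


dump = bytes.fromhex(
    "05 17 00 03 76 a3 02 0a 00 25 2a 00 00 22 2f 1e"
    "40 d6 be 89 8e c3 04 79 01 4d 61 73 73 48"
)
print(decode_adv_header(dump))
```

For the dump above this reports a SCAN_REQ with a declared length of 4, whereas a valid SCAN_REQ payload is always 12 bytes, which is presumably why it is flagged as malformed. That would be consistent with the sniffer having received a corrupted packet rather than the phone having sent a bad one, though that is a guess.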

I'm pretty new to using the BLE Sniffer tool, so any advice on how best to use it to dig deeper would be appreciated.

------------------

We also see issues with:
- Service discovery sometimes takes an excessive amount of time (we call Close() on the client after 6 seconds)
- Service discovery sometimes returns the wrong number of services (i.e. we get 9, but nRF Connect mobile shows 10)

Our process for service discovery is:
- After getting a connection established, we:
  - Request connection priority of High
  - Wait 600ms
  - Call DiscoverServices()
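The sequence above, together with the 6 second cutoff, can be modelled roughly like this. It is a plain Python asyncio sketch rather than Xamarin code: FakeGattClient and its methods are invented stand-ins for the platform client, and it assumes the 6 seconds are counted from the discovery call, which may not match how the app measures it.

```python
import asyncio
from typing import List, Optional


class FakeGattClient:
    """Invented stand-in for a platform GATT client; not a real API."""

    def __init__(self, discovery_time_s: float, services: List[str]):
        self.discovery_time_s = discovery_time_s
        self.services = services
        self.closed = False

    async def request_high_priority(self) -> None:
        pass  # the real call would request the High connection priority

    async def discover_services(self) -> List[str]:
        await asyncio.sleep(self.discovery_time_s)
        return self.services

    def close(self) -> None:
        self.closed = True


async def discover_with_timeout(
    client: FakeGattClient, settle_s: float = 0.6, timeout_s: float = 6.0
) -> Optional[List[str]]:
    """Request priority, wait, then discover; close the client on timeout."""
    await client.request_high_priority()
    await asyncio.sleep(settle_s)  # the 600ms wait
    try:
        return await asyncio.wait_for(client.discover_services(), timeout=timeout_s)
    except asyncio.TimeoutError:
        client.close()
        return None
```

Keeping the wait and the timeout as parameters makes it easy to log how long discovery actually takes per device and to try other values.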

  • The overall flow works decently, except we seem to encounter a higher rate of errors than we'd expect, and end up having to retry connections.

    Are you able to quantify this number? Are we talking a few % (e.g. <5%) or tens of % (e.g. >10%) of attempts?

    I'm asking because packet loss is quite normal in the 2.4GHz band, and connection attempts may be particularly exposed: the connection request packet is long, and it may, for instance, be sent at the same time as a second central sends a scan request packet. The two packets then overlap on-air, and the advertiser will very likely fail to receive either one, in which case the connection is not established. Instead, after 6 connection interval periods the central will give up and likely throw an error. I would expect that placing the phone and advertiser very close together (for a test) and moving other phones further away will reduce the problem, if this is indeed the issue. Maybe you can give this a go, and also report the number of attempts that fail. I would expect <5% to be quite normal, and likely more in a crowded environment.
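To put a number on the "6 connection interval periods" mentioned above: the connection interval is expressed in units of 1.25 ms, so the give-up window can be worked out as below. This is a small Python sketch; the interval values used are examples only, since the thread does not say which interval the phone actually requests.

```python
CONN_INTERVAL_UNIT_MS = 1.25  # connInterval is expressed in 1.25 ms units


def establishment_window_ms(conn_interval_units: int) -> float:
    """Time the central waits (6 connection intervals) before giving up."""
    if not 6 <= conn_interval_units <= 3200:  # valid range: 7.5 ms to 4 s
        raise ValueError("connInterval out of range")
    return 6 * conn_interval_units * CONN_INTERVAL_UNIT_MS


# Example values only: 24 units = 30 ms interval, 6 units = 7.5 ms interval.
print(establishment_window_ms(24))
print(establishment_window_ms(6))
```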

    Kenneth

  • Thanks for the response.

    To quantify a bit based on a recent log file:

    • 80 GATT connection attempts
      • 27 GATT_ERROR (133) responses
      • 53 connections established (taking anywhere from 350ms - 2200ms)

    So, in that instance, a 33.75% failure rate. This is to various devices, so some would have taken multiple tries to connect, others may have connected first try.
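The arithmetic behind that figure, plus a rough estimate of what it implies for retries, is sketched below in Python. The retry estimates assume every attempt fails independently with the same probability, which is a simplification: as noted, some devices connected first try and others needed several tries.

```python
attempts = 80
failures = 27
successes = attempts - failures  # 53 connections established

failure_rate = failures / attempts  # 0.3375, i.e. 33.75%

# Assumption: independent attempts with a constant failure probability.
expected_attempts_per_connection = 1 / (1 - failure_rate)  # equals 80 / 53


def prob_all_fail(p_fail: float, tries: int) -> float:
    """Probability that `tries` consecutive attempts all fail."""
    return p_fail ** tries


print(f"failure rate: {failure_rate:.2%}")
print(f"expected attempts per connection: {expected_attempts_per_connection:.2f}")
print(f"chance that 3 tries in a row fail: {prob_all_fail(failure_rate, 3):.2%}")
```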

    Anecdotally, even when a target device is very close (within 1ft) to the scanning device that initiates the connection attempt, we still see some errors.

    Given your description of packet interference, do you think we may be hurting ourselves by having a scan running while also trying to establish GATT connections?

  • Hello,

    The 133 usually happens for a reason, but it can be anything. Have you tried different phones, including Pixels, or different OS versions?

    In nRF Connect we do stop scanning before connecting, but that should not matter, I think.

    Regarding scanning for 12 sec with a short break: that's very power hungry. From Android 8 it's possible to scan with a PendingIntent, which is battery efficient, but perhaps doesn't meet your timing requirements.

    Maybe you can take HCI Snoop Logs from the phone; perhaps they show a reason for the 133s?

    I don't consider this a specific Nordic issue, so I suggest trying the Android support/community.

    Best regards,
    Kenneth

  • Thanks - I'll try other communities as well. Figured I'd start here as there seems to be more BLE expertise than most places.

    Regarding power consumption, you're correct, our focus is on performance and timing.

    For the HCI snoop logs, would that be from the scanning device, or the target device?

    And do you think those HCI logs would be more useful than observing with the BLE Sniffer tool?
