BLE pairing with Linux - zero distribution flags

Hi!

We are having problems pairing an nRF52840 with Linux (Ubuntu 25.04). Pairing fails instantly, before any passkey is shown on the host, and returns error code 9.

Short log is here:

```
<dbg> bt_smp: bt_smp_recv: Received SMP code 0x01 len 6
<dbg> bt_smp: smp_pairing_req: req: io_capability 0x04, oob_flag 0x00, auth_req 0x2D, max_key_size 0x10, init_key_dist 0x0D, resp_key_dist 0x0F
<dbg> bt_smp: smp_init: prnd 849156b688e37e12ad063cb072baf74f
<dbg> bt_smp: smp_pairing_req: rsp: io_capability 0x02, oob_flag 0x00, auth_req 0x0D, max_key_size 0x10, init_key_dist 0x00, resp_key_dist 0x00
<dbg> bt_smp: bt_smp_recv: Received SMP code 0x0c len 64
<dbg> bt_smp: smp_public_key:
<inf> Bt: Received passkey pairing inquiry.
<inf> Bt: Type `uhk passkey xxxxxx` to pair, or `uhk passkey -1` to reject
<wrn> bt_conn: conn 0x20010498: not connected
<dbg> bt_smp: bt_smp_disconnected: chan 0x20010b9c cid 0x0006
<dbg> bt_smp: smp_pairing_complete: got status 0x8
<dbg> bt_smp: bt_smp_encrypt_change: chan 0x20010b9c conn 0x20010498 handle 1 encrypt 0x00 hci status 0x1f
<wrn> Bt: Bt security failed: n/a (n/a, 98:5f:41:d2:92:3a), level 1, err 9, disconnecting
<wrn> Bt: The connection (n/a (n/a, 98:5f:41:d2:92:3a)) isn't even connected! Ignoring.
<wrn> Bt: Pairing of auth conn failed because of 9
<wrn> Bt: Pairing failed: n/a (n/a, 98:5f:41:d2:92:3a), reason 9
```


Some back-and-forth with AI assistants suggests that the host PC cancels pairing because it receives zero key distribution flags in the pairing response:
```
<dbg> bt_smp: smp_pairing_req: rsp: io_capability 0x02, oob_flag 0x00, auth_req 0x0D, max_key_size 0x10, init_key_dist 0x00, resp_key_dist 0x00
```

Detailed logs (attaching files here doesn't work, so I am linking externally):
- Zephyr with hci_core and smp logging: http://ktweb.cz/upload/logs2/right.log
- the same, but filtered: http://ktweb.cz/upload/logs2/right_filtered.log
- btmon log: http://ktweb.cz/upload/logs2/btmon.log

Any ideas what the problem might be and where to look further?

EDIT: here are Wireshark logs (paired from the GUI this time):

- this one fails on GATT instead: http://ktweb.cz/upload/logs2/wireshark_linux_pairing.pcapng (This is probably our problem, since GATT is handled by an external service.)
- this one actually contains a pairing attempt similar to the key distribution problem: http://ktweb.cz/upload/logs2/wireshark_linux_pairing2.pcapng

EDIT2: my updated conclusions and hypotheses:
- I think that the distribution flags are correct and that (one of the) troubles is that Zephyr doesn't reply with its own keys within the next 400 ms or so.
- It looks like key generation takes too long on the Zephyr side. Before it finishes, the connection gets terminated.
- The reason for the termination seems to be related to GATT and the LL buffers. Right before the connection is determined to be disconnected, I see the check below fail in conn.c. Increasing the number doesn't fix the issue, but results in more calls to bt_conn_tx_processor before the failure (which is where the disconnect is detected):

```
static bool should_stop_tx(struct bt_conn *conn)
{
	...
	/* Keep sending while fewer than 3 packets are in flight at the link layer. */
	if (atomic_get(&conn->in_ll) < 3) {
		...
		return false;
	}
	return true;
}
```

- My guess is that bt_conn_tx_processor is triggered by receiving the GATT requests.
- Successful pairing against Android: http://ktweb.cz/upload/logs2/wireshark_android_pairing.pcapng
- There is a minor difference between the Linux and Android flows. I think it is not important, but it should be noted that we are trying to achieve security level 4:

```
Initiator Key Distribution: 0x0d, Link Key, Signature Key (CSRK), Encryption Key (LTK)
    0000 .... = Reserved: 0x0
    .... 1... = Link Key: True
    .... .1.. = Signature Key (CSRK): True
    .... ..0. = Id Key (IRK): False         // This is false from linux, but true from android
    .... ...1 = Encryption Key (LTK): True
```

- Failed pairing with the threshold in the should_stop_tx excerpt above increased from 3 to 12, along with the buffer count: http://ktweb.cz/upload/logs2/wireshark_linux_pairing3.pcapng (flood of GATT packets from central to peripheral)
- This makes me think that either Linux simply overwhelms Zephyr with GATT requests, or there is some problem with our GATT provider (which is a custom implementation from c2usb). I would still be grateful for any thoughts on the subject.
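
For anyone comparing the `init_key_dist` / `resp_key_dist` bytes from the Zephyr log against the Wireshark dissection, here is a minimal sketch of a decoder. The bit positions are taken from the Wireshark excerpt above; the macro names and the helper `smp_key_dist_to_str` are made up for illustration and are not part of Zephyr's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as shown in the Wireshark excerpt (names are illustrative). */
#define SMP_DIST_ENC_KEY  0x01u /* Encryption Key (LTK) */
#define SMP_DIST_ID_KEY   0x02u /* Id Key (IRK) */
#define SMP_DIST_SIGN     0x04u /* Signature Key (CSRK) */
#define SMP_DIST_LINK_KEY 0x08u /* Link Key */

/*
 * Hypothetical helper: writes a comma-separated list of the flags set in
 * `dist` into `buf` (truncating if needed) and returns how many are set.
 */
static size_t smp_key_dist_to_str(uint8_t dist, char *buf, size_t len)
{
	static const struct {
		uint8_t mask;
		const char *name;
	} flags[] = {
		{ SMP_DIST_LINK_KEY, "LinkKey" },
		{ SMP_DIST_SIGN, "CSRK" },
		{ SMP_DIST_ID_KEY, "IRK" },
		{ SMP_DIST_ENC_KEY, "LTK" },
	};
	size_t used = 0;
	size_t count = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if ((dist & flags[i].mask) == 0) {
			continue;
		}
		count++;
		if (used + 1 < len) {
			int n = snprintf(buf + used, len - used, "%s%s",
					 used ? "," : "", flags[i].name);
			if (n > 0) {
				used += (size_t)n;
				if (used > len - 1) {
					used = len - 1; /* output was truncated */
				}
			}
		}
	}
	return count;
}
```

Called with `0x0D` (the initiator value from Linux) and a 64-byte buffer, this returns 3 and fills the buffer with `LinkKey,CSRK,LTK`; with `0x00` (both fields in the response) it returns 0 and leaves the buffer empty.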

  • Well:

    - (Earlier, I tried increasing various buffer counts, including the ACL TX ones, without any success.)
    - I have traced the disconnect to an exhaustion of ACL buffers.
    - After seeing a 69-byte message get chunked into 3 packets, I realized that with the default LE payload length of 27 bytes, running out of buffers isn't all that surprising.
    - I tried enabling the data length extension right after the connection callback kicks in.
    - This did the trick.

    I am still not sure what a proper solution should be, or which part of the stack or of the development process is actually to blame.

    But I wonder, wouldn't it be nice if Zephyr logged a warning (or maybe an error!) in situations like these - when it decides to drop a connection because of an unexpected failure?
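
The arithmetic behind the chunking observation can be sketched as below. The 27-byte default comes from the post above and 251 bytes is the maximum link-layer payload with data length extension; the macro names and the helper `ll_fragments_needed` are assumptions for illustration only. The value actually negotiated with the Linux host may be lower than 251.

```c
#include <assert.h>
#include <stddef.h>

#define LL_PAYLOAD_DEFAULT 27u  /* default LE data payload length in bytes */
#define LL_PAYLOAD_DLE_MAX 251u /* maximum payload with data length extension */

/*
 * Hypothetical helper: number of link-layer packets needed to carry a
 * message of `msg_len` bytes when each packet holds `ll_payload` bytes.
 * This is a ceiling division.
 */
static size_t ll_fragments_needed(size_t msg_len, size_t ll_payload)
{
	assert(ll_payload > 0);
	return (msg_len + ll_payload - 1) / ll_payload;
}
```

For the 69-byte message, `ll_fragments_needed(69, LL_PAYLOAD_DEFAULT)` gives 3, matching the three packets seen in the capture, while `ll_fragments_needed(69, LL_PAYLOAD_DLE_MAX)` gives 1. Each outgoing message therefore ties up roughly a third as many buffers once the longer payload is in effect.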
