Matter: mDNS error when pairing immediately after unpair

I'm using

  • the nRF52840 DK and the Matter light_bulb example, SDK v2.7.0.
  • a raspberry pi with the nRF dongle as a thread border router
  • the chip-tool running and compiled for macos (but with the repository version from the SDK)

When trying to pair directly after the device has been unpaired, meaning

  • chip-tool pairing ble-thread <node-id> hex:<dataset active -x> 20202021 3840
  • chip-tool pairing unpair <node-id>

then the pairing process takes significantly longer due to some reported mDNS error and a lookup for some address that doesn't seem to belong to my device (there are no other nodes on the fabric). The "offending" log in the chip-tool is the following (comparing compared to an OK pairing sequence):

[1731047068526] [4486:2075296] [DIS] Checking node lookup status after 202 ms
[1731047071245] [4486:2075296] [DIS] Mdns: OnResolve flags: 0, interface: 11, fullname: BC457DD5B6258255-0000000000000045._matter._tcp.local., hostname: 1E6BC32C13681711.local., port: 5540
[1731047071245] [4486:2075296] [DIS] Mdns : OnNewInterface hostname:1E6BC32C13681711.local. fullname:BC457DD5B6258255-0000000000000045._matter._tcp.local. interface: 11 port: 5540 TXT:"SII=2000,SAI=2000,SAT=4000"
[1731047073328] [4486:2075295] [DIS] Mdns: OnGetAddrInfo flags: 3, interface: 0, hostname: 0E38BB10063D903E.local.
[1731047073328] [4486:2075295] [DIS] OnGetAddrInfo (kDNSServiceErr_Timeout)
[1731047073328] [4486:2075295] [DIS] Mdns: OnGetAddrInfo flags: 2, interface: 0, hostname: 0E38BB10063D903E.local.
[1731047073328] [4486:2075295] [DIS] OnGetAddrInfo (kDNSServiceErr_Timeout)
[1731047073328] [4486:2075295] [DIS] Mdns: Resolve failure (kDNSServiceErr_BadState)
[1731047113327] [4486:2075296] [DIS] Checking node lookup status after 45002 ms
[1731047113327] [4486:2075296] [DIS] No longer need resolve for BC457DD5B6258255-0000000000000045
[1731047113327] [4486:2075296] [DIS] OperationalSessionSetup[1:0000000000000045]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
[1731047113327] [4486:2075296] [DIS] Retrying operational DNS-SD discovery. Attempts remaining: 2
[1731047113327] [4486:2075296] [DIS] Resolving BC457DD5B6258255:0000000000000045 ...
[1731047113327] [4486:2075296] [DIS] Resolve type=_matter._tcp name=BC457DD5B6258255-0000000000000045 interface=0
[1731047113328] [4486:2075296] [CTL] Session establishment failed for <0000000000000045, 1>, error: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout.  Next retry expected to get a response to Sigma1 or fail within 60 seconds
[1731047113328] [4486:2075296] [CTL] Arming failsafe (120 seconds)
[1731047113328] [4486:2075296] [DMG] ICR moving to [AddingComm]
[1731047113328] [4486:2075296] [DMG] ICR moving to [AddedComma]
[1731047113329] [4486:2075296] [EM] <<< [E:10437i S:63980 M:221767263] (S) Msg TX to 0:FFFFFFFB00000000 [0000] [BLE] --- Type 0001:08 (IM:InvokeCommandRequest)
[1731047113329] [4486:2075296] [DMG] ICR moving to [AwaitingRe]
[1731047113329] [4486:2075296] [DIS] Mdns: OnResolve flags: 0, interface: 11, fullname: BC457DD5B6258255-0000000000000045._matter._tcp.local., hostname: 1E6BC32C13681711.local., port: 5540
[1731047113329] [4486:2075296] [DIS] Mdns : OnNewInterface hostname:1E6BC32C13681711.local. fullname:BC457DD5B6258255-0000000000000045._matter._tcp.local. interface: 11 port: 5540 TXT:"SII=2000,SAI=2000,SAT=4000"
[1731047113331] [4486:2075296] [DIS] Mdns: OnGetAddrInfo flags: 1073741826, interface: 11, hostname: 1E6BC32C13681711.local.
[1731047113331] [4486:2075296] [DIS] Mdns: OnNewAddress interface: 11 ip:fd21:ad93:14a0:1:d117:fae1:8f33:f012
[1731047113331] [4486:2075296] [DIS] Mdns: Resolve success on interface 11
[1731047113331] [4486:2075296] [DIS] Node ID resolved for BC457DD5B6258255:0000000000000045
[1731047113331] [4486:2075296] [DIS] 	Hostname: 1E6BC32C13681711
[1731047113331] [4486:2075296] [DIS] 	IP Address #1: fd21:ad93:14a0:1:d117:fae1:8f33:f012
[1731047113331] [4486:2075296] [DIS] 	Port: 5540
[1731047113331] [4486:2075296] [DIS] 	Mrp Interval idle: 2000 ms
[1731047113332] [4486:2075296] [DIS] 	Mrp Interval active: 2000 ms
[1731047113332] [4486:2075296] [DIS] 	Mrp Active Threshold: 4000 ms
[1731047113332] [4486:2075296] [DIS] 	TCP Supported: 0
[1731047113332] [4486:2075296] [DIS] 	ICD: not present
[1731047113332] [4486:2075296] [DIS] Lookup clearing interface for non LL address
[1731047113332] [4486:2075296] [DIS] UDP:[fd21:ad93:14a0:1:d117:fae1:8f33:f012%en0]:5540: new best score: 3
[1731047113332] [4486:2075296] [DIS] Checking node lookup status after 5 ms
[1731047113332] [4486:2075296] [DIS] Keeping DNSSD lookup active
[1731047113503] [4486:2075295] [EM] >>> [E:10437i S:63980 M:22159] (S) Msg RX from 0:FFFFFFFB00000000 [0000] --- Type 0001:09 (IM:InvokeCommandResponse)
[1731047113503] [4486:2075295] [EM] Found matching exchange: 10437i, Delegate: 0x14b504258
[1731047113503] [4486:2075295] [DMG] ICR moving to [ResponseRe]
[1731047113503] [4486:2075295] [DMG] InvokeResponseMessage =
[1731047113503] [4486:2075295] [DMG] {
[1731047113503] [4486:2075295] [DMG] 	suppressResponse = false,
[1731047113503] [4486:2075295] [DMG] 	InvokeResponseIBs =
[1731047113503] [4486:2075295] [DMG] 	[
[1731047113503] [4486:2075295] [DMG] 		InvokeResponseIB =
[1731047113503] [4486:2075295] [DMG] 		{
[1731047113503] [4486:2075295] [DMG] 			CommandDataIB =
[1731047113503] [4486:2075295] [DMG] 			{
[1731047113503] [4486:2075295] [DMG] 				CommandPathIB =
[1731047113503] [4486:2075295] [DMG] 				{
[1731047113503] [4486:2075295] [DMG] 					EndpointId = 0x0,
[1731047113503] [4486:2075295] [DMG] 					ClusterId = 0x30,
[1731047113503] [4486:2075295] [DMG] 					CommandId = 0x1,
[1731047113503] [4486:2075295] [DMG] 				},
[1731047113503] [4486:2075295] [DMG]
[1731047113503] [4486:2075295] [DMG] 				CommandFields =
[1731047113503] [4486:2075295] [DMG] 				{
[1731047113504] [4486:2075295] [DMG] 					0x0 = 0 (unsigned),
[1731047113504] [4486:2075295] [DMG] 					0x1 = "" (0 chars),
[1731047113504] [4486:2075295] [DMG] 				},
[1731047113504] [4486:2075295] [DMG] 			},
[1731047113504] [4486:2075295] [DMG]
[1731047113504] [4486:2075295] [DMG] 		},
[1731047113504] [4486:2075295] [DMG]
[1731047113504] [4486:2075295] [DMG] 	],
[1731047113504] [4486:2075295] [DMG]
[1731047113504] [4486:2075295] [DMG] 	InteractionModelRevision = 11
[1731047113504] [4486:2075295] [DMG] },
[1731047113504] [4486:2075295] [DMG] Received Command Response Data, Endpoint=0 Cluster=0x0000_0030 Command=0x0000_0001
[1731047113504] [4486:2075295] [CTL] Status of extending fail-safe for CASE retry: 0

The important parts seem to be

  • OnGetAddrInfo (kDNSServiceErr_Timeout)
  • Mdns: Resolve failure (kDNSServiceErr_BadState)
  • OperationalSessionSetup[1:0000000000000045]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
  • Session establishment failed for <0000000000000045, 1>, error: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout.  Next retry expected to get a response to Sigma1 or fail within 60 seconds

I'm not an expert on how mDNS and the translation to DNS-SD works, so I'm really struggling to understand what goes wrong here. My best guess is that maybe a "goodbye" message went missing and therefore the mDNS cache at the border router keeps a now invalid entry. This would explain why the pairing works without problems if you wait for ~2 minutes after unpairing.

Even with the above error, the pairing itself is successful 90% of the time - it just takes significantly longer. Arbitrarily I observed that the sequence failed but (after a reboot of the nRF52840 DK) I could pair successfully.

Is there anything I'm missing? I only found the following tickets on the connectedhomeip project, but they didn't really seem to match my problem

I'm also attaching a screenshot where the left side is the log of an OK pairing sequence and the right side is a failed sequence.

Parents Reply
  • Just a status update from me. Apologies for the long response time. I'm waiting for some input from the relevant teams. I've reached out to them again to get an ETA for a reply.

    For now, what I understand is that you're onto the reason for why it is as you see, i.e there are some artifacts that needs to be removed before the re-pairing process does not use too much time. A disclaimer, I'm not 100% sure, so I'll update this reply when I hear back from the team.

    Kind regards,
    Andreas

Children
Related