Matter: mDNS error when pairing immediately after unpair

I'm using

  • the nRF52840 DK and the Matter light_bulb example, SDK v2.7.0.
  • a raspberry pi with the nRF dongle as a thread border router
  • the chip-tool running and compiled for macos (but with the repository version from the SDK)

When trying to pair directly after the device has been unpaired, meaning

  • chip-tool pairing ble-thread <node-id> hex:<dataset active -x> 20202021 3840
  • chip-tool pairing unpair <node-id>

then the pairing process takes significantly longer due to some reported mDNS error and a lookup for some address that doesn't seem to belong to my device (there are no other nodes on the fabric). The "offending" log in the chip-tool is the following (comparing compared to an OK pairing sequence):

[1731047068526] [4486:2075296] [DIS] Checking node lookup status after 202 ms
[1731047071245] [4486:2075296] [DIS] Mdns: OnResolve flags: 0, interface: 11, fullname: BC457DD5B6258255-0000000000000045._matter._tcp.local., hostname: 1E6BC32C13681711.local., port: 5540
[1731047071245] [4486:2075296] [DIS] Mdns : OnNewInterface hostname:1E6BC32C13681711.local. fullname:BC457DD5B6258255-0000000000000045._matter._tcp.local. interface: 11 port: 5540 TXT:"SII=2000,SAI=2000,SAT=4000"
[1731047073328] [4486:2075295] [DIS] Mdns: OnGetAddrInfo flags: 3, interface: 0, hostname: 0E38BB10063D903E.local.
[1731047073328] [4486:2075295] [DIS] OnGetAddrInfo (kDNSServiceErr_Timeout)
[1731047073328] [4486:2075295] [DIS] Mdns: OnGetAddrInfo flags: 2, interface: 0, hostname: 0E38BB10063D903E.local.
[1731047073328] [4486:2075295] [DIS] OnGetAddrInfo (kDNSServiceErr_Timeout)
[1731047073328] [4486:2075295] [DIS] Mdns: Resolve failure (kDNSServiceErr_BadState)
[1731047113327] [4486:2075296] [DIS] Checking node lookup status after 45002 ms
[1731047113327] [4486:2075296] [DIS] No longer need resolve for BC457DD5B6258255-0000000000000045
[1731047113327] [4486:2075296] [DIS] OperationalSessionSetup[1:0000000000000045]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
[1731047113327] [4486:2075296] [DIS] Retrying operational DNS-SD discovery. Attempts remaining: 2
[1731047113327] [4486:2075296] [DIS] Resolving BC457DD5B6258255:0000000000000045 ...
[1731047113327] [4486:2075296] [DIS] Resolve type=_matter._tcp name=BC457DD5B6258255-0000000000000045 interface=0
[1731047113328] [4486:2075296] [CTL] Session establishment failed for <0000000000000045, 1>, error: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout.  Next retry expected to get a response to Sigma1 or fail within 60 seconds
[1731047113328] [4486:2075296] [CTL] Arming failsafe (120 seconds)
[1731047113328] [4486:2075296] [DMG] ICR moving to [AddingComm]
[1731047113328] [4486:2075296] [DMG] ICR moving to [AddedComma]
[1731047113329] [4486:2075296] [EM] <<< [E:10437i S:63980 M:221767263] (S) Msg TX to 0:FFFFFFFB00000000 [0000] [BLE] --- Type 0001:08 (IM:InvokeCommandRequest)
[1731047113329] [4486:2075296] [DMG] ICR moving to [AwaitingRe]
[1731047113329] [4486:2075296] [DIS] Mdns: OnResolve flags: 0, interface: 11, fullname: BC457DD5B6258255-0000000000000045._matter._tcp.local., hostname: 1E6BC32C13681711.local., port: 5540
[1731047113329] [4486:2075296] [DIS] Mdns : OnNewInterface hostname:1E6BC32C13681711.local. fullname:BC457DD5B6258255-0000000000000045._matter._tcp.local. interface: 11 port: 5540 TXT:"SII=2000,SAI=2000,SAT=4000"
[1731047113331] [4486:2075296] [DIS] Mdns: OnGetAddrInfo flags: 1073741826, interface: 11, hostname: 1E6BC32C13681711.local.
[1731047113331] [4486:2075296] [DIS] Mdns: OnNewAddress interface: 11 ip:fd21:ad93:14a0:1:d117:fae1:8f33:f012
[1731047113331] [4486:2075296] [DIS] Mdns: Resolve success on interface 11
[1731047113331] [4486:2075296] [DIS] Node ID resolved for BC457DD5B6258255:0000000000000045
[1731047113331] [4486:2075296] [DIS] 	Hostname: 1E6BC32C13681711
[1731047113331] [4486:2075296] [DIS] 	IP Address #1: fd21:ad93:14a0:1:d117:fae1:8f33:f012
[1731047113331] [4486:2075296] [DIS] 	Port: 5540
[1731047113331] [4486:2075296] [DIS] 	Mrp Interval idle: 2000 ms
[1731047113332] [4486:2075296] [DIS] 	Mrp Interval active: 2000 ms
[1731047113332] [4486:2075296] [DIS] 	Mrp Active Threshold: 4000 ms
[1731047113332] [4486:2075296] [DIS] 	TCP Supported: 0
[1731047113332] [4486:2075296] [DIS] 	ICD: not present
[1731047113332] [4486:2075296] [DIS] Lookup clearing interface for non LL address
[1731047113332] [4486:2075296] [DIS] UDP:[fd21:ad93:14a0:1:d117:fae1:8f33:f012%en0]:5540: new best score: 3
[1731047113332] [4486:2075296] [DIS] Checking node lookup status after 5 ms
[1731047113332] [4486:2075296] [DIS] Keeping DNSSD lookup active
[1731047113503] [4486:2075295] [EM] >>> [E:10437i S:63980 M:22159] (S) Msg RX from 0:FFFFFFFB00000000 [0000] --- Type 0001:09 (IM:InvokeCommandResponse)
[1731047113503] [4486:2075295] [EM] Found matching exchange: 10437i, Delegate: 0x14b504258
[1731047113503] [4486:2075295] [DMG] ICR moving to [ResponseRe]
[1731047113503] [4486:2075295] [DMG] InvokeResponseMessage =
[1731047113503] [4486:2075295] [DMG] {
[1731047113503] [4486:2075295] [DMG] 	suppressResponse = false,
[1731047113503] [4486:2075295] [DMG] 	InvokeResponseIBs =
[1731047113503] [4486:2075295] [DMG] 	[
[1731047113503] [4486:2075295] [DMG] 		InvokeResponseIB =
[1731047113503] [4486:2075295] [DMG] 		{
[1731047113503] [4486:2075295] [DMG] 			CommandDataIB =
[1731047113503] [4486:2075295] [DMG] 			{
[1731047113503] [4486:2075295] [DMG] 				CommandPathIB =
[1731047113503] [4486:2075295] [DMG] 				{
[1731047113503] [4486:2075295] [DMG] 					EndpointId = 0x0,
[1731047113503] [4486:2075295] [DMG] 					ClusterId = 0x30,
[1731047113503] [4486:2075295] [DMG] 					CommandId = 0x1,
[1731047113503] [4486:2075295] [DMG] 				},
[1731047113503] [4486:2075295] [DMG]
[1731047113503] [4486:2075295] [DMG] 				CommandFields =
[1731047113503] [4486:2075295] [DMG] 				{
[1731047113504] [4486:2075295] [DMG] 					0x0 = 0 (unsigned),
[1731047113504] [4486:2075295] [DMG] 					0x1 = "" (0 chars),
[1731047113504] [4486:2075295] [DMG] 				},
[1731047113504] [4486:2075295] [DMG] 			},
[1731047113504] [4486:2075295] [DMG]
[1731047113504] [4486:2075295] [DMG] 		},
[1731047113504] [4486:2075295] [DMG]
[1731047113504] [4486:2075295] [DMG] 	],
[1731047113504] [4486:2075295] [DMG]
[1731047113504] [4486:2075295] [DMG] 	InteractionModelRevision = 11
[1731047113504] [4486:2075295] [DMG] },
[1731047113504] [4486:2075295] [DMG] Received Command Response Data, Endpoint=0 Cluster=0x0000_0030 Command=0x0000_0001
[1731047113504] [4486:2075295] [CTL] Status of extending fail-safe for CASE retry: 0

The important parts seem to be

  • OnGetAddrInfo (kDNSServiceErr_Timeout)
  • Mdns: Resolve failure (kDNSServiceErr_BadState)
  • OperationalSessionSetup[1:0000000000000045]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout
  • Session establishment failed for <0000000000000045, 1>, error: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:119: CHIP Error 0x00000032: Timeout.  Next retry expected to get a response to Sigma1 or fail within 60 seconds

I'm not an expert on how mDNS and the translation to DNS-SD works, so I'm really struggling to understand what goes wrong here. My best guess is that maybe a "goodbye" message went missing and therefore the mDNS cache at the border router keeps a now invalid entry. This would explain why the pairing works without problems if you wait for ~2 minutes after unpairing.

Even with the above error, the pairing itself is successful 90% of the time - it just takes significantly longer. Arbitrarily I observed that the sequence failed but (after a reboot of the nRF52840 DK) I could pair successfully.

Is there anything I'm missing? I only found the following tickets on the connectedhomeip project, but they didn't really seem to match my problem

I'm also attaching a screenshot where the left side is the log of an OK pairing sequence and the right side is a failed sequence.

Parents Reply Children
  • Hi,

    I'm running the `chip-tool` on macos so I'm pretty sure it's using Bonjour (e.g., I couldn't run the `chip-ota-provider-app` within Docker on macos host since I'd have to pass avahi into the container, which I didn't manage).

    Yeah so it's not too far off, might simply be a client-side issue. I'm using `ClearAllSrpHostAndServices` in my fabric removal - and it's also part of the default fabric delegate used by the light bulb example. Since both applications fail I was quite confident I'm not doing anything wrong.

    Maybe I'll try to run the `chip-tool` on the OTBR and see if it behaves differently, but currently I'm in the middle of testing so I'd like to keep the setup as it is for now.

    Thank you very much for the feedback!

    Cheers.M

  • Noted,

    I'll relay that to the discussion and say that it will be idle for now w.r.t waiting for a more suiting time to check if it behaves similar when running chip-tool on the OTBR on your side.

    For now, feel free to let me know when you have an update or if you have additional questions to the topic in this discussion, and I'll look into them.

    Kind regards,
    Andreas

Related