CONFIG_BT_EXT_ADV and scan timeout - ncs 2.9.0 breaking change

m5k8 5 months ago

I'm migrating code from ncs 2.7 to 2.9.

I am using CONFIG_BT_EXT_ADV together with the scan timeout callback; the scan timeout requires CONFIG_BT_EXT_ADV (see RE: nRF Connect SDK Scan Timeout).

It was working fine, but now it no longer does. Starting the scan returns an error, and I traced that error to the following code in valid_le_scan_param() in ncs/zephyr/subsys/bluetooth/host/scan.c:

	if (IS_ENABLED(CONFIG_BT_PRIVACY) &&
	    param->type == BT_LE_SCAN_TYPE_ACTIVE &&
	    param->timeout != 0) {
		/* This is marked as not supported as a stopgap until the (scan,
		 * adv, init) roles are reworked into proper state machines.
		 *
		 * Having proper state machines is necessary to be able to
		 * suspend all roles that use the (resolvable) private address,
		 * update the RPA and resume them again with the right
		 * parameters.
		 *
		 * Else we lower the privacy of the device as either the RPA
		 * update will fail or the scanner will not use the newly
		 * generated RPA.
		 */
		return false;
	}

As can be seen, with CONFIG_BT_PRIVACY enabled, an active scan with timeout != 0 is rejected.

This is a new condition, added somewhere between ncs 2.7 and 2.9, and I would call it a regression and a breaking change.

So, what's the expected path forward?

Einar Thorsrud 5 months ago

Hi,

This change was introduced in this pull request, fixing this issue. The change is not a regression, and the API documentation was updated in the same commit. However, it is a breaking API change that removes a feature. I suggest implementing the scan timeout yourself in this case, using a k_timer, a delayed work item, or similar, from which you call bt_le_scan_stop().

m5k8 5 months ago in reply to Einar Thorsrud
Einar Thorsrud said:
I suggest implementing the scan timeout yourself in this case, using a k_timer

I tried this. Calling bt_le_scan_stop() from the k_timer callback results in a kernel oops:

	[00:00:29.872,009] <inf> ble: stopping scan
	ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:430
		Controller unresponsive, command opcode 0x200c timeout with err -11
	[00:00:29.872,131] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000006
	[00:00:29.872,161] <err> os: r3/a4: 0x00000003 r12/ip: 0x00000010 r14/lr: 0x0002579b
	[00:00:29.872,161] <err> os: xpsr: 0x01000021
	[00:00:29.872,192] <err> os: Faulting instruction address (r15/pc): 0x000257aa
	[00:00:29.872,222] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
	[00:00:29.872,222] <err> os: Fault during interrupt handling

A timeout is mentioned, but no time passes between stopping the scan (the first log line, printed by my app) and the failure.

But I can see there is also a "Fault during interrupt handling" message, so I tried offloading the scan restart to a separate helper thread: the k_timer callback only gives a semaphore, and the helper thread waits on that semaphore and restarts the scan.

That works.

So it looks like the timer expiry handler, which is called from interrupt context, is not up to the job of doing anything advanced, such as waiting for the HCI command that bt_le_scan_stop() sends.
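
The helper-thread workaround described above can be modelled on a host as follows. This is a sketch in portable C with pthreads, not Zephyr code: struct deferral, timer_expired(), helper_thread(), restart_scan_model() and run_deferral_model() are hypothetical names. The comments note the assumed Zephyr counterparts (a k_timer expiry function that only calls k_sem_give(), and a helper thread blocking in k_sem_take() before calling the Bluetooth scan API). What it illustrates is that the callback itself never does the slow, blocking work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Host-side model of "the timer callback only signals, a helper thread does
 * the work". Assumed Zephyr mapping: timer_expired() ~ k_timer expiry
 * function calling k_sem_give(); helper_thread() ~ thread looping on
 * k_sem_take() and then calling the scan API. All names are hypothetical.
 */
struct deferral {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int pending;  /* unconsumed signals, like a semaphore count */
	bool shutdown;
	unsigned int restarts; /* times the helper did the slow work */
};

/* Stand-in for the blocking part, e.g. stopping and restarting the scan. */
static void restart_scan_model(struct deferral *d)
{
	d->restarts++;
}

/* Kept short: it only records the request and wakes the helper. A real ISR
 * cannot take a mutex; in Zephyr, k_sem_give() is the ISR-safe equivalent. */
static void timer_expired(struct deferral *d)
{
	pthread_mutex_lock(&d->lock);
	d->pending++;
	pthread_cond_signal(&d->cond);
	pthread_mutex_unlock(&d->lock);
}

static void *helper_thread(void *arg)
{
	struct deferral *d = arg;

	pthread_mutex_lock(&d->lock);
	for (;;) {
		while (d->pending == 0 && !d->shutdown) {
			pthread_cond_wait(&d->cond, &d->lock);
		}
		if (d->pending == 0) {
			break; /* shutdown requested and nothing left to do */
		}
		d->pending--;
		pthread_mutex_unlock(&d->lock);
		restart_scan_model(d); /* slow work runs without the lock held */
		pthread_mutex_lock(&d->lock);
	}
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Fires the "timer" n times, lets the helper drain every request, then
 * returns how many times it did the slow work (-1 on thread-creation error). */
static int run_deferral_model(unsigned int n)
{
	struct deferral d = { .pending = 0, .shutdown = false, .restarts = 0 };
	pthread_t tid;

	pthread_mutex_init(&d.lock, NULL);
	pthread_cond_init(&d.cond, NULL);
	if (pthread_create(&tid, NULL, helper_thread, &d) != 0) {
		pthread_cond_destroy(&d.cond);
		pthread_mutex_destroy(&d.lock);
		return -1;
	}
	for (unsigned int i = 0; i < n; i++) {
		timer_expired(&d);
	}
	pthread_mutex_lock(&d.lock);
	d.shutdown = true;
	pthread_cond_signal(&d.cond);
	pthread_mutex_unlock(&d.lock);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&d.cond);
	pthread_mutex_destroy(&d.lock);
	return (int)d.restarts;
}
```

Because the model counts signals like a semaphore, every expiry is eventually handled by the helper thread; none of the handling runs in the caller of timer_expired().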

Einar Thorsrud 5 months ago in reply to m5k8

Hi,

I was too quick writing my reply. The callback from the k_timer runs at too high a priority (it is called from interrupt context), so another method needs to be used, for instance a work queue, or a message to a thread that handles it.

Edit: a simple approach could be to use a k_timer that submits an item to a work queue, as shown here.

m5k8 5 months ago in reply to Einar Thorsrud

I got it working, kinda, somehow.

Previously the flow was clear: the controller reported either a successful connection or a timeout.

Now one thread tries to connect, while another one (the timer plus a helper) decides when to stop waiting.

I can get both a connection and a timeout, in random order, depending on how the threads interleave. It looks like a race condition. I got it working, but the code is messy and difficult to test.
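
One common way to make such a race deterministic is a first-wins guard: whichever of "connected" and "timed out" claims the attempt first handles it, and the other path becomes a no-op. The sketch below is host-side C11 with pthreads, not Zephyr code; enum attempt_outcome, struct attempt, attempt_finish() and race_once() are hypothetical names. In a Zephyr build, the compare-and-swap is assumed to map to atomic_cas() from <zephyr/sys/atomic.h>. It does not replace the pending rework; it only makes the outcome of the race well-defined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical outcome codes for one connection attempt. */
enum attempt_outcome {
	OUTCOME_NONE = 0,
	OUTCOME_CONNECTED,
	OUTCOME_TIMED_OUT,
};

struct attempt {
	atomic_int outcome; /* OUTCOME_NONE until one path claims it */
};

static void attempt_init(struct attempt *a)
{
	atomic_init(&a->outcome, OUTCOME_NONE);
}

/*
 * Called from both the "connected" path and the "timeout" path. Exactly one
 * caller wins the compare-and-swap and should handle the result; the loser
 * gets false and must not act on the attempt.
 */
static bool attempt_finish(struct attempt *a, enum attempt_outcome outcome)
{
	int expected = OUTCOME_NONE;

	assert(outcome != OUTCOME_NONE);
	return atomic_compare_exchange_strong(&a->outcome, &expected, (int)outcome);
}

struct racer {
	struct attempt *a;
	enum attempt_outcome outcome;
	bool won;
};

static void *racer_main(void *arg)
{
	struct racer *r = arg;

	r->won = attempt_finish(r->a, r->outcome);
	return NULL;
}

/* Races a "connected" thread against a "timed out" thread and returns how
 * many of them claimed the attempt (-1 if a thread could not be created). */
static int race_once(void)
{
	struct attempt a;
	struct racer conn = { &a, OUTCOME_CONNECTED, false };
	struct racer tmo = { &a, OUTCOME_TIMED_OUT, false };
	pthread_t t1, t2;

	attempt_init(&a);
	if (pthread_create(&t1, NULL, racer_main, &conn) != 0) {
		return -1;
	}
	if (pthread_create(&t2, NULL, racer_main, &tmo) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return (int)conn.won + (int)tmo.won;
}
```

Regardless of interleaving, exactly one path wins. In the real application the winner would also cancel the other path (for example, stopping the timer when the connection wins); that part is not modelled here.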

The commit says: "This is marked as not supported as a stopgap until the (scan, adv, init) roles are reworked into proper state machines."

So I am waiting for this pending rework.