CONFIG_BT_EXT_ADV and scan timeout - ncs 2.9.0 breaking change

I'm migrating code from ncs 2.7 to 2.9.

I am using CONFIG_BT_EXT_ADV together with the scan timeout callback. The timeout requires CONFIG_BT_EXT_ADV, see: RE: nRF Connect SDK Scan Timeout

This used to work fine, but it no longer does. Starting the scan now returns an error, which boils down to the following code in valid_le_scan_param(), located in ncs/zephyr/subsys/bluetooth/host/scan.c:

	if (IS_ENABLED(CONFIG_BT_PRIVACY) &&
	    param->type == BT_LE_SCAN_TYPE_ACTIVE &&
	    param->timeout != 0) {
		/* This is marked as not supported as a stopgap until the (scan,
		 * adv, init) roles are reworked into proper state machines.
		 *
		 * Having proper state machines is necessary to be able to
		 * suspend all roles that use the (resolvable) private address,
		 * update the RPA and resume them again with the right
		 * parameters.
		 *
		 * Else we lower the privacy of the device as either the RPA
		 * update will fail or the scanner will not use the newly
		 * generated RPA.
		 */
		return false;
	}

As can be seen, when CONFIG_BT_PRIVACY is enabled, an active scan with timeout != 0 is rejected.

This is a new condition, added somewhere between ncs 2.7 and 2.9. I would call it a regression and a breaking change.

So, what's the expected path forward?

  • Hi,

    This change was introduced in this pull request, fixing this issue. The change is not a regression, and the API doc was updated in the same commit. However, it is a breaking API change that removes a feature. In this case I suggest implementing the scan timeout yourself, using a k_timer, a delayed work queue item or similar, from which you call bt_le_scan_stop().

  • I suggest implementing scan timeout yourself in this case, using a k_timer

    I tried this. Calling bt_le_scan_stop() results in a kernel oops:

    [00:00:29.872,009] <inf> ble: stopping scan
    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:430
    	Controller unresponsive, command opcode 0x200c timeout with err -11
    [00:00:29.872,131] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000006
    [00:00:29.872,161] <err> os: r3/a4:  0x00000003 r12/ip:  0x00000010 r14/lr:  0x0002579b
    [00:00:29.872,161] <err> os:  xpsr:  0x01000021
    [00:00:29.872,192] <err> os: Faulting instruction address (r15/pc): 0x000257aa
    [00:00:29.872,222] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    [00:00:29.872,222] <err> os: Fault during interrupt handling

    A timeout is mentioned, but no time passes between stopping the scan (the first log line, from my app) and the failure.

    But I can also see the message "Fault during interrupt handling", so I offloaded the scan restart to a separate helper thread. The thread waits on a semaphore, and the k_timer callback only signals that semaphore.

    That works.

    So it looks like the timer expiry handler is called from interrupt context and is not up to the job of doing anything advanced.

  • Hi,

    I was too quick writing my reply. The k_timer callback runs at too high a priority, so another method needs to be used (a work queue, or you could message a thread that handles it, for instance).

    Edit: a simple approach could be to use a k_timer to post in a work queue as shown here.
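
A minimal sketch of that idea, assuming Zephyr's delayable work API is used in place of a separate k_timer (the effect is the same: the handler runs in the system work queue thread instead of interrupt context). The names scan_timeout_work, scan_cb, app_scan_start_with_timeout and the 10 second value are hypothetical and not taken from this thread:

```c
#include <zephyr/kernel.h>
#include <zephyr/sys/printk.h>
#include <zephyr/bluetooth/bluetooth.h>

/* Hypothetical timeout value, pick what the application needs. */
#define SCAN_TIMEOUT K_SECONDS(10)

/* Runs in the system work queue thread, so calling into the
 * Bluetooth host (which may block on an HCI command) is allowed here.
 */
static void scan_timeout_work_handler(struct k_work *work)
{
	int err = bt_le_scan_stop();

	if (err) {
		printk("bt_le_scan_stop failed (err %d)\n", err);
	}
}

static K_WORK_DELAYABLE_DEFINE(scan_timeout_work, scan_timeout_work_handler);

/* Hypothetical scan callback, application-specific handling goes here. */
static void scan_cb(const bt_addr_le_t *addr, int8_t rssi, uint8_t adv_type,
		    struct net_buf_simple *buf)
{
}

int app_scan_start_with_timeout(void)
{
	/* BT_LE_SCAN_ACTIVE leaves param->timeout at 0, so the check in
	 * valid_le_scan_param() is not triggered.
	 */
	int err = bt_le_scan_start(BT_LE_SCAN_ACTIVE, scan_cb);

	if (err) {
		return err;
	}

	/* The application-side timeout replaces the removed feature. */
	k_work_schedule(&scan_timeout_work, SCAN_TIMEOUT);
	return 0;
}
```

If the scan ends early for another reason, calling k_work_cancel_delayable(&scan_timeout_work) should keep the stale timeout from firing later.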

  • I got it working, kinda, somehow.

    Previously the path was clear: the controller returned either a successful connection or a timeout.

    Now one thread tries to connect, while the other one (timer + something) tries to determine when to stop waiting.

    I can get both a connection and a timeout, in random order, depending on how the threads interleave. It looks like a race condition. I got it working, but the code is messy and difficult to test.

    Commit says: "This is marked as not supported as a stopgap until the (scan, adv, init) roles are reworked into proper state machines."

    So I'm waiting for this pending rework.
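
One possible way to tame the connection-versus-timeout race described above is to let both paths compete for a single atomic "outcome" flag, so that only the first one to claim it acts and the other backs off. The sketch below uses plain C11 atomics so it can be tried on a host; the names scan_outcome, claim_outcome and reset_outcome are hypothetical. In a Zephyr application the same pattern could presumably be built on atomic_cas() from <zephyr/sys/atomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum scan_outcome {
	OUTCOME_PENDING,   /* neither path has finished yet */
	OUTCOME_CONNECTED, /* the connection path won */
	OUTCOME_TIMED_OUT, /* the timeout path won */
};

atomic_int outcome = OUTCOME_PENDING;

/* Try to claim the final outcome. Returns true only for the first
 * caller after a reset; every later caller gets false and should do
 * nothing (e.g. the timeout handler skips its cleanup if a connection
 * was already reported).
 */
bool claim_outcome(enum scan_outcome wanted)
{
	int expected = OUTCOME_PENDING;

	return atomic_compare_exchange_strong(&outcome, &expected, wanted);
}

/* Call before starting a new scan or connection attempt. */
void reset_outcome(void)
{
	atomic_store(&outcome, OUTCOME_PENDING);
}
```

The connected callback would call claim_outcome(OUTCOME_CONNECTED) and the timeout handler claim_outcome(OUTCOME_TIMED_OUT); whichever returns true owns the follow-up work. This only decides which path is reported, it does not replace the state machine rework mentioned in the commit.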
