CONFIG_BT_EXT_ADV and scan timeout - ncs 2.9.0 breaking change

I'm migrating code from ncs 2.7 to 2.9.

I am using CONFIG_BT_EXT_ADV and the scan timeout callback. The timeout requires CONFIG_BT_EXT_ADV, see: RE: nRF Connect SDK Scan Timeout

It was working fine. Now it no longer works. Scan start returns an error, and this error boils down to the following code in the function valid_le_scan_param(), located in ncs/zephyr/subsys/bluetooth/host/scan.c:

	if (IS_ENABLED(CONFIG_BT_PRIVACY) &&
	    param->type == BT_LE_SCAN_TYPE_ACTIVE &&
	    param->timeout != 0) {
		/* This is marked as not supported as a stopgap until the (scan,
		 * adv, init) roles are reworked into proper state machines.
		 *
		 * Having proper state machines is necessary to be able to
		 * suspend all roles that use the (resolvable) private address,
		 * update the RPA and resume them again with the right
		 * parameters.
		 *
		 * Else we lower the privacy of the device as either the RPA
		 * update will fail or the scanner will not use the newly
		 * generated RPA.
		 */
		return false;
	}

As can be seen, with CONFIG_BT_PRIVACY enabled and an active scan, timeout != 0 ends in an error.

It's a new condition, added somewhere between ncs 2.7 and 2.9, and I would call it a regression and a breaking change.

So, what's the expected path forward?

  • Hi,

    This change was introduced in this pull request, fixing this issue. The change is not a regression, and the API doc was updated in the same commit. However, it is a breaking API change that removes a feature. I suggest implementing the scan timeout yourself in this case, using a k_timer, a delayed work queue item or similar, where you call bt_le_scan_stop().

  • I suggest implementing the scan timeout yourself in this case, using a k_timer

    I tried this. Calling bt_le_scan_stop() results in a kernel oops:

    [00:00:29.872,009] <inf> ble: stopping scan
    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:430
    	Controller unresponsive, command opcode 0x200c timeout with err -11
    [00:00:29.872,131] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000006
    [00:00:29.872,161] <err> os: r3/a4:  0x00000003 r12/ip:  0x00000010 r14/lr:  0x0002579b
    [00:00:29.872,161] <err> os:  xpsr:  0x01000021
    [00:00:29.872,192] <err> os: Faulting instruction address (r15/pc): 0x000257aa
    [00:00:29.872,222] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    [00:00:29.872,222] <err> os: Fault during interrupt handling

    A timeout is mentioned, but no time passes between stopping the scan (the first log line, from my app) and the failure.

    But I can see there's also a message about "Fault during interrupt handling", so I tried offloading the scan restart to a separate helper thread. The thread waits on a semaphore, and the k_timer callback only signals this semaphore.

    That works.

    So it looks like the timer expiry handler is called from interrupt context, which is not up to the job of doing anything advanced.
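
    The helper-thread arrangement described above might look roughly like the sketch below. It is not the exact code from this post: the names scan_timer, scan_timeout_sem and scan_helper, the 1024-byte stack and the priority 7 are assumptions made for illustration, and the scan restart is left as a comment.

    ```c
    #include <errno.h>
    #include <zephyr/kernel.h>
    #include <zephyr/sys/printk.h>
    #include <zephyr/bluetooth/bluetooth.h>

    /* Hypothetical name: binary semaphore signalled when the timeout fires. */
    K_SEM_DEFINE(scan_timeout_sem, 0, 1);

    static void scan_timer_expired(struct k_timer *timer)
    {
        ARG_UNUSED(timer);
        /* Interrupt context: only signal here, do not call Bluetooth APIs. */
        k_sem_give(&scan_timeout_sem);
    }

    K_TIMER_DEFINE(scan_timer, scan_timer_expired, NULL);

    static void scan_helper_entry(void *p1, void *p2, void *p3)
    {
        ARG_UNUSED(p1);
        ARG_UNUSED(p2);
        ARG_UNUSED(p3);

        for (;;) {
            /* Thread context: blocking on the semaphore and on HCI is fine. */
            k_sem_take(&scan_timeout_sem, K_FOREVER);

            int err = bt_le_scan_stop();

            if (err && err != -EALREADY) {
                printk("bt_le_scan_stop failed (err %d)\n", err);
            }
            /* Restart the scan here if the application needs to. */
        }
    }

    /* Stack size and priority are assumptions; tune them for the application. */
    K_THREAD_DEFINE(scan_helper, 1024, scan_helper_entry, NULL, NULL, NULL, 7, 0, 0);

    /* Call after a successful bt_le_scan_start() to arm a one-shot timeout. */
    void scan_timeout_arm(k_timeout_t timeout)
    {
        k_timer_start(&scan_timer, timeout, K_NO_WAIT);
    }
    ```

    With this split the expiry function does nothing that can block, and the HCI command behind bt_le_scan_stop() is issued from a regular thread.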

  • Hi,

    I was too quick writing my reply. The callback from the k_timer has too high a priority, so another method needs to be used (a work queue, or you could message a thread handling it, for instance).

    Edit: a simple approach could be to use a k_timer to post to a work queue, as shown here.
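
    A minimal sketch of the timer-plus-work-queue idea, for illustration only and not the example linked above. The names scan_stop_work, scan_timer and scan_start_with_timeout are hypothetical, and the system workqueue is assumed to be an acceptable place to run bt_le_scan_stop().

    ```c
    #include <errno.h>
    #include <zephyr/kernel.h>
    #include <zephyr/sys/printk.h>
    #include <zephyr/bluetooth/bluetooth.h>

    static void scan_stop_work_handler(struct k_work *work)
    {
        ARG_UNUSED(work);

        /* Runs in the system workqueue thread, where blocking is allowed. */
        int err = bt_le_scan_stop();

        if (err && err != -EALREADY) {
            printk("bt_le_scan_stop failed (err %d)\n", err);
        }
    }

    K_WORK_DEFINE(scan_stop_work, scan_stop_work_handler);

    static void scan_timer_expired(struct k_timer *timer)
    {
        ARG_UNUSED(timer);
        /* Interrupt context: hand the job over to the workqueue. */
        k_work_submit(&scan_stop_work);
    }

    K_TIMER_DEFINE(scan_timer, scan_timer_expired, NULL);

    /* Hypothetical wrapper: param->timeout is expected to be 0 here. */
    int scan_start_with_timeout(const struct bt_le_scan_param *param,
                                bt_le_scan_cb_t cb, k_timeout_t timeout)
    {
        int err = bt_le_scan_start(param, cb);

        if (err) {
            return err;
        }

        /* One-shot timer: fires once after 'timeout', no period. */
        k_timer_start(&scan_timer, timeout, K_NO_WAIT);
        return 0;
    }
    ```

    A struct k_work_delayable scheduled with k_work_schedule() should give the same effect without a separate k_timer, since the delay is then handled by the work item itself.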

  • I got it working, kinda, somehow.

    Previously the path was clear: the controller returned either a successful connection or a timeout.

    Now one thread tries to connect, while the other one (timer + something) tries to determine when to stop waiting.

    I can get both a connection and a timeout, in random order, depending on how the threads interleave. It looks like a race condition. I got it working, but the code is messy and difficult to test.

    The commit says: "This is marked as not supported as a stopgap until the (scan, adv, init) roles are reworked into proper state machines."

    So I am waiting for this pending rework.
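
    One possible way to tame the connection-versus-timeout race described above is to let both paths compete for a single atomic state, so that only the first outcome is accepted. The sketch below is plain C11 and not taken from this thread; the state names and the functions attempt_begin(), on_connected() and on_timeout() are hypothetical.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical outcome states for one connection attempt. */
    enum attempt_state {
        ATTEMPT_PENDING,
        ATTEMPT_CONNECTED,
        ATTEMPT_TIMED_OUT,
    };

    static atomic_int attempt_state = ATTEMPT_PENDING;

    /* Re-arm before starting the scan and the timeout for a new attempt. */
    static void attempt_begin(void)
    {
        atomic_store(&attempt_state, ATTEMPT_PENDING);
    }

    /* Try to claim the outcome; succeeds only for the first caller. */
    static bool attempt_claim(enum attempt_state outcome)
    {
        int expected = ATTEMPT_PENDING;

        return atomic_compare_exchange_strong(&attempt_state, &expected, outcome);
    }

    /* To be called from the connected callback. */
    static bool on_connected(void)
    {
        return attempt_claim(ATTEMPT_CONNECTED);
    }

    /* To be called from the timeout handler running in thread context. */
    static bool on_timeout(void)
    {
        return attempt_claim(ATTEMPT_TIMED_OUT);
    }
    ```

    The path that gets false back knows it lost and can drop its event: a late timeout does nothing, and a late connection can be disconnected or kept, depending on the application. In Zephyr code, atomic_cas() from <zephyr/sys/atomic.h> could play the role of the compare-exchange.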
