Usage Fault following bt_le_per_adv_sync_delete

Question

Hello.

I'm looking for some advice debugging what I think is a race condition around bt_le_per_adv_sync_delete.

After calling bt_le_per_adv_sync_delete, we're sometimes seeing an Illegal load of EXC_RETURN into PC fault.

[00:05:18.404,205] <err> os: ***** USAGE FAULT *****
[00:05:18.404,205] <err> os:   Illegal load of EXC_RETURN into PC
[00:05:18.404,235] <err> os: r0/a1:  0x00000000  r1/a2:  0x0006ca19  r2/a3:  0x00000091
[00:05:18.404,235] <err> os: r3/a4:  0x00000003 r12/ip:  0x00000002 r14/lr:  0x2000764c
[00:05:18.404,266] <err> os:  xpsr:  0x0003ee00
[00:05:18.404,266] <err> os: s[ 0]:  0x2000fbc8  s[ 1]:  0x2000fbc8  s[ 2]:  0x20018068  s[ 3]:  0x000a39e9
[00:05:18.404,296] <err> os: s[ 4]:  0x000a39d7  s[ 5]:  0x0008be7b  s[ 6]:  0xffffffff  s[ 7]:  0xffffffff
[00:05:18.404,296] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x0008bde9  s[11]:  0x00000000
[00:05:18.404,327] <err> os: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x0009b273
[00:05:18.404,327] <err> os: fpscr:  0x00000000
[00:05:18.404,357] <err> os: r4/v1:  0x00000000  r5/v2:  0x00000000  r6/v3:  0xffffffff
[00:05:18.404,357] <err> os: r7/v4:  0xffffffff  r8/v5:  0x00000000  r9/v6:  0x00000000
[00:05:18.404,388] <err> os: r10/v7: 0x00000000  r11/v8: 0x00000000    psp:  0x20022518
[00:05:18.404,388] <err> os: EXC_RETURN: 0xfffffffd
[00:05:18.404,388] <err> os: Faulting instruction address (r15/pc): 0x20019e24
[00:08:00.736,022] <err> os: >>> ZEPHYR FATAL ERROR 34: Unknown error on CPU 0
[00:08:00.736,053] <err> os: Current thread: 0x2000fbc8 (MPSL Work)
[00:08:01.752,288] <err> fatal_error: Resetting system

It's difficult to trigger, but when I have achieved it, traces are showing that it's happening somewhere around `pending_cancels` on the MPSL workqueue. Unfortunately I don't seem to have debug symbols to be able to look further into the fault.

Vidar Berg · Accepted Answer

I'm glad to hear that you seem to have found the problem. Unfortunately, I'm not sure I understand what you mean when you ask if a semaphore is being stored in the user_data or how that relates to the usage fault exception. In any case, I asked internally for input on this, and one of the developers commented this from the SDCs point of view: 
 Setting subevent data is kind of special for the controller, since it is timing dependent.When the Host set subevent response data, the controller must send the data on air before we give command complete. 
 Figure 3.13: Using a response slot in PAwR: 
 
 There is special handling to make sure we reject any other HCI command until this has occurred: 
 bluetooth: Fix assert when handling illegal response data cmd · nrfconnect/sdk-nrf@d257b1d

Usage Fault following bt_le_per_adv_sync_delete

Top Replies