MPU FAULT during GATT service discovery

I'm trying to connect to 15 BLE devices. I've followed the code at https://github.com/NordicMatt/multi-NUS/ but have modified it to filter by device name and then write to a specific characteristic once it has discovered the services. I'm using an nRF52840 DK and the nRF Connect SDK v2.4.1.

This works great up to about 12 devices. However, when I try to connect to more than 12 devices, I intermittently get this error:

[00:00:13.189,636] <err> os: ***** MPU FAULT *****
[00:00:13.189,636] <err> os: Instruction Access Violation
[00:00:13.189,666] <err> os: r0/a1: 0x200027e8 r1/a2: 0x00000000 r2/a3: 0x2000d918
[00:00:13.189,697] <err> os: r3/a4: 0x20011efc r12/ip: 0x00000013 r14/lr: 0x0003043f
[00:00:13.189,697] <err> os: xpsr: 0x60000000
[00:00:13.189,727] <err> os: Faulting instruction address (r15/pc): 0x20011efc
[00:00:13.189,758] <err> os: >>> ZEPHYR FATAL ERROR 20: Unknown error on CPU 0
[00:00:13.189,788] <err> os: Current thread: 0x200021c0 (BT RX)
[00:00:13.320,098] <err> os: Halting system

The error doesn't occur right after the 12th connection, however. Judging by logging at different points in my application, it occurs right after I call bt_gatt_dm_start but before my discovery_complete callback is called, and at a random connection (one time it occurs after the 15th connection, another time after the 3rd). However, the error only occurs when I'm trying to connect to more than 12 devices (i.e. when I call bt_scan_filter_add 12 or more times).

I saw some other issues like this and took steps to address the issues from them:

Following "RE: MPU FAULT occurs when discovering attributes of client", I enabled CONFIG_THREAD_NAME=y and saw that the error occurs in the BT RX thread (Current thread: 0x200021c0 (BT RX)). I then added CONFIG_BT_RX_STACK_SIZE=4096 to my prj.conf.

Following "MPU Fault with NCS Zephyr on nrf52840s", I looked at the thread stack usage, but everything seems normal (this is with CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5):

Thread analyze:
BT CTLR ECDH : STACK: unused 176 usage 728 / 904 (80 %); CPU: 0 %
: Total CPU cycles used: 849
BT RX : STACK: unused 2680 usage 1416 / 4096 (34 %); CPU: 0 %
: Total CPU cycles used: 733
BT TX : STACK: unused 760 usage 776 / 1536 (50 %); CPU: 0 %
: Total CPU cycles used: 519
thread_analyzer : STACK: unused 592 usage 432 / 1024 (42 %); CPU: 0 %
: Total CPU cycles used: 213
sysworkq : STACK: unused 3320 usage 776 / 4096 (18 %); CPU: 0 %
: Total CPU cycles used: 109
MPSL Work : STACK: unused 500 usage 524 / 1024 (51 %); CPU: 0 %
: Total CPU cycles used: 1761
BT_LW_WQ : STACK: unused 640 usage 664 / 1304 (50 %); CPU: 0 %
: Total CPU cycles used: 35
logging : STACK: unused 120 usage 648 / 768 (84 %); CPU: 8 %
: Total CPU cycles used: 27144
idle : STACK: unused 224 usage 96 / 320 (30 %); CPU: 90 %
: Total CPU cycles used: 299607
ISR0 : STACK: unused 936 usage 1112 / 2048 (54 %)

I'm new to the nRF52840, so any help with ways I can further debug this would be appreciated.

Here's my prj.conf as a reference:

# Enable the BLE stack with GATT Client configuration
CONFIG_BT=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_SMP=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_MAX_CONN=15
CONFIG_BT_CONN_CTX=y
# Enable BLE modules
CONFIG_BT_SCAN=y
CONFIG_BT_SCAN_FILTER_ENABLE=y
CONFIG_BT_GATT_DM=y
CONFIG_HEAP_MEM_POOL_SIZE=2048
CONFIG_BT_SCAN_NAME_CNT=15
CONFIG_BT_GATT_DM_DATA_PRINT=y
CONFIG_BT_RX_STACK_SIZE=4096
# Logging
CONFIG_LOG=y
# Handle faults better

I also saw "RE: MPU Fault when attempting to subscribe to BLE notifications", and in my zephyr.map I found that the faulting instruction address (0x20011efc) is:

0x0000000020011efc 0x60 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(conn.c.obj)

The surrounding area looks like:

*fill* 0x0000000020011dde 0x2
.noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/conn.c".2
0x0000000020011de0 0x40 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(conn.c.obj)
.noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/conn.c".1
0x0000000020011e20 0xdb zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(conn.c.obj)
*fill* 0x0000000020011efb 0x1
.noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/conn.c".0
0x0000000020011efc 0x60 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(conn.c.obj)
.noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/att.c".k_mem_slab_buf_att_slab
0x0000000020011f5c 0x294 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(att.c.obj)
0x0000000020011f5c _k_mem_slab_buf_att_slab
.noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/att.c".k_mem_slab_buf_req_slab
0x00000000200121f0 0x54 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(att.c.obj)
0x00000000200121f0 _k_mem_slab_buf_req_slab
*(SORT_BY_ALIGNMENT(.kernel_noinit.*))
0x0000000020012244 _image_ram_end = .
0x0000000020012244 _end = .
0x0000000020040000 __kernel_ram_end = 0x20040000
0x000000000003f408 __kernel_ram_size = (__kernel_ram_end - __kernel_ram_start)

For reference here is my code

// Kernel
#include <zephyr/kernel.h>
// Bluetooth
#include <bluetooth/conn_ctx.h>
#include <bluetooth/gatt_dm.h>
#include <bluetooth/scan.h>
#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/bluetooth/conn.h>
#include <zephyr/bluetooth/gatt.h>
// Logging
#include <zephyr/debug/thread_analyzer.h>
#include <zephyr/logging/log.h>
// Clients
#include "ble/sphero_client.h"
/*Note that UUIDs have Least Significant Byte ordering */
#define SPHERO_SERVICE_UUID 0x21, 0x21, 0x6F, 0x72, 0x65, 0x68, 0x70, 0x53, 0x20, 0x4F, 0x4F, 0x57, 0x01, 0x00, 0x01, 0x00
#define BT_SPHERO_SERVICE_UUID BT_UUID_DECLARE_128(SPHERO_SERVICE_UUID)
#define SPHERO_CHARACTERISTIC_UUID 0x21, 0x21, 0x6F, 0x72, 0x65, 0x68, 0x70, 0x53, 0x20, 0x4F, 0x4F, 0x57, 0x02, 0x00, 0x01, 0x00
#define BT_SPHERO_CHARACTERISTIC_UUID BT_UUID_DECLARE_128(SPHERO_CHARACTERISTIC_UUID)

Update 03/08/23

Following "Understanding a FATAL ERROR in my application", I tried running the code with the debugger. However, it didn't break when the fault occurred; instead I had to pause it manually after seeing the fault logged.

The call stack is interesting. While the log says the fault occurred in the BT RX thread, the call stack says it occurred in the BT TX thread (see image above). When I comment out my bt_gatt_write call, the error stops. However, I don't know how to fix the error while still being able to write to the characteristics.

  • Make struct bt_gatt_write_params write_params global, because the params must be kept valid after bt_gatt_write returns. That will solve the problem.
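
To illustrate the lifetime issue behind that suggestion, here is a minimal sketch in plain C. It does not use the Zephyr API: fake_write_params, fake_gatt_write and fake_process_responses are hypothetical stand-ins for struct bt_gatt_write_params, bt_gatt_write and the Bluetooth thread that later calls params->func. Using one slot per connection instead of a single global is my own assumption, made because up to 15 writes may be in flight at once. It has not been run against the original application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CONN 15 /* mirrors CONFIG_BT_MAX_CONN=15 in the prj.conf above */

/* Hypothetical stand-in for struct bt_gatt_write_params. */
struct fake_write_params {
	void (*func)(uint8_t conn_index, uint8_t err);
	uint16_t handle;
	const void *data;
	uint16_t length;
};

/* Stand-in for the stack: it stores only the pointer and completes later. */
static struct fake_write_params *pending[MAX_CONN];

static int fake_gatt_write(uint8_t conn_index, struct fake_write_params *params)
{
	if (conn_index >= MAX_CONN || pending[conn_index] != NULL) {
		return -1;
	}
	pending[conn_index] = params;
	return 0;
}

/* Stand-in for the Bluetooth thread delivering write responses later. */
static int fake_process_responses(void)
{
	int delivered = 0;

	for (uint8_t i = 0; i < MAX_CONN; i++) {
		struct fake_write_params *p = pending[i];

		if (p != NULL) {
			pending[i] = NULL;
			p->func(i, 0); /* used long after the caller has returned */
			delivered++;
		}
	}
	return delivered;
}

/* Static storage: one params slot and one payload buffer per connection,
 * so they stay valid after start_write() returns.
 */
static struct fake_write_params write_params[MAX_CONN];
static uint8_t payload[MAX_CONN][4];
static bool in_flight[MAX_CONN];
static int completed;

static void on_written(uint8_t conn_index, uint8_t err)
{
	(void)err;
	in_flight[conn_index] = false; /* the slot may be reused from now on */
	completed++;
}

static int start_write(uint8_t conn_index, uint16_t handle)
{
	assert(conn_index < MAX_CONN);
	if (in_flight[conn_index]) {
		return -1; /* do not touch params that the stack still references */
	}

	struct fake_write_params *p = &write_params[conn_index];

	memset(payload[conn_index], conn_index, sizeof(payload[conn_index]));
	p->func = on_written;
	p->handle = handle;
	p->data = payload[conn_index];
	p->length = sizeof(payload[conn_index]);

	in_flight[conn_index] = true;
	return fake_gatt_write(conn_index, p);
}
```

In the real application the slot index would presumably come from bt_conn_index(conn), and the call would be bt_gatt_write(conn, &write_params[index]). If write_params were instead a local variable in the function that starts the write, the stack would later read the callback pointer from memory that has already been reused, which would be consistent with the faulting PC pointing into RAM.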
