MPU Fault when attempting to subscribe to BLE notifications

Hi!

I am attempting to configure an nRF5340 DK as a central to receive BLE notifications. The code is based on the central_hr sample.

The notifications I am attempting to receive are 20 bytes of accelerometer data sent every 1000 ms from another development kit. They can be read without problems using an iOS app and STM32WB CubeMonRF.

I consistently get this error after gatt_write_ccc is executed (including thread analyzer output):

I have tried adjusting the stack sizes of multiple threads to various values, which was the solution to similar problems I found, but with no luck.

Modifications from the central_hr sample are: 

- Devices are filtered by name, and the central connects to the device with a matching complete name

- Discovery is skipped, and subscribe_params are defined directly in the code (defined based on this)

prj.conf:

CONFIG_BT=y
CONFIG_BT_DEBUG_LOG=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_SMP=y
CONFIG_BT_GATT_CLIENT=y
#CONFIG_BT_RX_STACK_SIZE=2048

CONFIG_BT_AUTO_PHY_UPDATE=y
CONFIG_BT_AUTO_DATA_LEN_UPDATE=y

CONFIG_BT_BUF_ACL_RX_SIZE=502
CONFIG_BT_ATT_PREPARE_COUNT=2
CONFIG_BT_CONN_TX_MAX=10
CONFIG_BT_L2CAP_TX_BUF_COUNT=10
CONFIG_BT_L2CAP_TX_MTU=498
CONFIG_BT_BUF_ACL_TX_SIZE=502
CONFIG_BT_L2CAP_DYNAMIC_CHANNEL=y

CONFIG_DK_LIBRARY=y
CONFIG_DK_LIBRARY_DYNAMIC_BUTTON_HANDLERS=y

#CONFIG_MAIN_STACK_SIZE=1024
CONFIG_HEAP_MEM_POOL_SIZE=2048

# This example requires more workqueue stack
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048


CONFIG_MAIN_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_LOG_MODE_MINIMAL=n
CONFIG_LOG_BACKEND_UART=y


# GATT debug messages
CONFIG_BT_DEBUG_GATT=y

CONFIG_THREAD_NAME=y
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_ANALYZER_USE_PRINTK=y
CONFIG_THREAD_ANALYZER_AUTO=y
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5

hci_rpmsg.conf:

CONFIG_BT_DEBUG_LOG=y
#CONFIG_BT_RX_STACK_SIZE=2048

#CONFIG_MAIN_STACK_SIZE=1024
#CONFIG_HEAP_MEM_POOL_SIZE=2048

# This example requires more workqueue stack
#CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
#CONFIG_BT_BUF_ACL_TX_SIZE=251
#CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
#CONFIG_BT_BUF_ACL_RX_SIZE=251

#
# Copyright (c) 2021 Nordic Semiconductor
#
# SPDX-License-Identifier: LicenseRef-Nordic-5-Clause
#

# From throughput example
CONFIG_BT_CTLR_SDC_MAX_CONN_EVENT_LEN_DEFAULT=4000000

CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_BUF_ACL_RX_SIZE=502
CONFIG_BT_BUF_ACL_TX_SIZE=502

CONFIG_BT_MAX_CONN=2
CONFIG_BT_RX_STACK_SIZE=8192

CONFIG_THREAD_NAME=y
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_ANALYZER_USE_PRINTK=y
CONFIG_THREAD_ANALYZER_AUTO=y
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5

Best Regards, 

Tor Egil

  • Hello,

    If there were a stack overflow, it should have been caught by the stack limit checker (in HW) before your program had a chance to try executing code from RAM and trigger the MPU fault. Could you please try to look up the faulting address (0x20005fb8) in your *.map file to see what variable you have in that address range?

    Also, just to confirm, is the subscribe_params struct initialized to zero?

    Best regards,

    Vidar

  • Hi Vidar,

    In my zephyr.map (address at line 15):

     .noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/long_wq.c".0
                    0x0000000020003550      0x518 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(long_wq.c.obj)
                    0x0000000020003550                bt_lw_stack_area
     .noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c".0
                    0x0000000020003a68     0x1000 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(hci_core.c.obj)
     .noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c".1
                    0x0000000020004a68      0x400 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(hci_core.c.obj)
     .noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/att.c".k_mem_slab_buf_chan_slab
                    0x0000000020004e68      0x150 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(att.c.obj)
                    0x0000000020004e68                _k_mem_slab_buf_chan_slab
     .noinit."WEST_TOPDIR/zephyr/kernel/init.c".0
                    0x0000000020004fb8     0x1000 zephyr/kernel/libkernel.a(init.c.obj)
                    0x0000000020004fb8                z_main_stack
     .noinit."WEST_TOPDIR/zephyr/kernel/init.c".1
                    0x0000000020005fb8      0x140 zephyr/kernel/libkernel.a(init.c.obj)
     .noinit."WEST_TOPDIR/zephyr/kernel/init.c".2
                    0x00000000200060f8      0x800 zephyr/kernel/libkernel.a(init.c.obj)
                    0x00000000200060f8                z_interrupt_stacks
     .noinit."WEST_TOPDIR/zephyr/kernel/system_work_q.c".0
                    0x00000000200068f8      0x800 zephyr/kernel/libkernel.a(system_work_q.c.obj)
     .noinit."WEST_TOPDIR/zephyr/kernel/mempool.c".kheap_buf__system_heap
                    0x00000000200070f8      0x800 zephyr/kernel/libkernel.a(mempool.c.obj)
                    0x00000000200070f8                kheap__system_heap
     .noinit."WEST_TOPDIR/zephyr/subsys/bluetooth/host/buf.c".0

    subscribe_params is initialized to zero and passed to bt_gatt_subscribe, as shown in the second screenshot:

    I would also like to note that the BT RX thread is usually at 80%-99% CPU in the thread analyzer output before the MPU fault, although it was not in the first picture I posted.

  • Hi Tor Egil,

    Thanks for confirming and for providing the additional information. 0x20005fb8 is pointing to the idle thread's stack if I'm reading the map file correctly, which doesn't make much sense, at least not to me. I guess I was expecting a more "random" address.
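For anyone repeating this lookup, the manual search through the map file can be sketched as a small script. This is only an illustrative helper, not part of the SDK tooling: the function name `find_section` and the embedded excerpt are assumptions made for the example, and the parser only handles the layout shown in this thread (section name on one line, address and size on the next).

```python
import re

# Excerpt copied from the zephyr.map listing posted above.
MAP_EXCERPT = """\
 .noinit."WEST_TOPDIR/zephyr/kernel/init.c".0
                0x0000000020004fb8     0x1000 zephyr/kernel/libkernel.a(init.c.obj)
                0x0000000020004fb8                z_main_stack
 .noinit."WEST_TOPDIR/zephyr/kernel/init.c".1
                0x0000000020005fb8      0x140 zephyr/kernel/libkernel.a(init.c.obj)
 .noinit."WEST_TOPDIR/zephyr/kernel/init.c".2
                0x00000000200060f8      0x800 zephyr/kernel/libkernel.a(init.c.obj)
                0x00000000200060f8                z_interrupt_stacks
"""

# Matches "<address> <size> <object>" lines. Symbol lines carry no size
# field, so they do not match and are skipped.
ENTRY = re.compile(r"^\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s*$")


def find_section(map_text, addr):
    """Return (section, start, size, obj) for the entry containing addr, else None."""
    section = None
    for line in map_text.splitlines():
        match = ENTRY.match(line)
        if match:
            start = int(match.group(1), 16)
            size = int(match.group(2), 16)
            if start <= addr < start + size:
                return section, start, size, match.group(3)
        elif line.strip().startswith("."):
            # Remember the most recent section name line.
            section = line.strip()
    return None


if __name__ == "__main__":
    hit = find_section(MAP_EXCERPT, 0x20005FB8)
    if hit:
        name, start, size, obj = hit
        print(f"{name}: {start:#x} + {size:#x} in {obj}")
```

With the faulting address 0x20005fb8, the lookup lands on the unnamed 0x140-byte `.noinit` entry from init.c that starts exactly where z_main_stack ends, which is the region read above as the idle thread's stack.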

    To troubleshoot this further, I would suggest that you re-build the project with CONFIG_ARM_MPU=n. This will allow code execution from RAM without triggering the MPU fault you saw earlier. You should then be able to place a breakpoint at the faulting RAM address and (hopefully) find out exactly where the branch to this invalid address occurred in the first place. 

    How to set a breakpoint at an arbitrary address in VS code

    teover said:
    I would also like to note that the BT RX thread is usually at 80%-99% CPU in the thread analyzer output before the MPU fault, although it was not in the first picture I posted.

    It seems a bit high, but I haven't tried to profile the CPU usage in this scenario, so I'm not sure if it is to be expected or not. 

  • Thanks, Vidar!

    This helped me trace the faulting code, which was an incorrectly defined exchange_mtu function.
