MPU Fault with NCS Zephyr on nRF52840s

I'm getting the following intermittent MPU fault when reading data over BLE. 

Both the BLE transmitter and receiver use an nRF52840, though the receiver is a Laird MG100.

[01:39:56.263,153] <err> os: mem_manage_fault: ***** MPU FAULT *****
[01:39:56.270,446] <err> os: mem_manage_fault:   Data Access Violation
[01:39:56.277,923] <err> os: mem_manage_fault:   MMFAR Address: 0x0
[01:39:56.285,156] <err> os: esf_dump: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x00000040
[01:39:56.295,074] <err> os: esf_dump: r3/a4:  0x20000b14 r12/ip:  0x200010a0 r14/lr:  0x00061195
[01:39:56.304,992] <err> os: esf_dump:  xpsr:  0x610f0000
[01:39:56.311,401] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00015924
[01:39:56.320,495] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[01:39:56.329,925] <err> os: z_fatal_error: Current thread: 0x20003cd0 (main)
[01:39:56.338,073] <err> fatal_error: k_sys_fatal_error_handler: Resetting system

I'm using Zephyr OS build v3.2.99-ncs2.

I've traced r14/lr to `/nordic/v2.3.0/zephyr/kernel/sched.c:884`:

struct k_thread *z_unpend_first_thread(_wait_q_t *wait_q)
{
	struct k_thread *thread = NULL;

	LOCKED(&sched_spinlock) {
		thread = _priq_wait_best(&wait_q->waitq);

		if (thread != NULL) {                  /* <---------- THIS IS THE OFFENDING LINE */
			unpend_thread_no_timeout(thread);
			(void)z_abort_thread_timeout(thread);
		}
	}

	return thread;
}


I've traced the faulting instruction address (r15/pc) to `/nordic\toolchains\v2.3.0\opt\zephyr-sdk\arm-zephyr-eabi\arm-zephyr-eabi\sys-include\ssp/string.h:86`:
...
__ssp_bos_icheck3(memset, void *, int)
...
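
One possible reading of the register dump, offered as a sketch and not a diagnosis: under the Arm procedure call standard the first three arguments of a call travel in r0, r1 and r2, so if the stacked registers still hold the call arguments at the time of the fault, the dump would match `memset(NULL, 0, 0x40)`, which also agrees with the MMFAR address of 0x0. Separately, bit 0 of a Cortex-M code address only marks Thumb state, so the return address behind r14/lr is 0x00061194. The names `fault_regs`, `code_address` and `looks_like_memset_to_null` below are hypothetical helpers written for illustration; they are not part of Zephyr or NCS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values copied by hand from the esf_dump lines in the log above. */
struct fault_regs {
	uint32_t r0, r1, r2;
	uint32_t lr, pc;
	uint32_t mmfar;
};

static const struct fault_regs logged_fault = {
	.r0 = 0x00000000, .r1 = 0x00000000, .r2 = 0x00000040,
	.lr = 0x00061195, .pc = 0x00015924, .mmfar = 0x00000000,
};

/* Bit 0 of a Cortex-M code address flags Thumb state; clear it before
 * looking the address up in the ELF file. */
static uint32_t code_address(uint32_t reg)
{
	return reg & ~UINT32_C(1);
}

/* Hypothetical check: true when the registers are consistent with
 * memset(NULL, c, n) for n > 0. This ASSUMES r0..r2 still hold the call
 * arguments (dest, fill value, length) when the fault was taken. */
static bool looks_like_memset_to_null(const struct fault_regs *f)
{
	return f->r0 == 0 && f->mmfar == 0 && f->r2 != 0;
}
```

If that reading holds, the next thing to look for would be a caller on the main thread that passes an unchecked pointer together with a 64-byte length to `memset`.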


I'm unsure how to further debug this, as this appears to pertain to code working at a much lower level than I'm really used to.

Can anyone provide guidance?

  • Hi,

    We recently changed settings related to TLS connections, and this seems to have changed the nature of the problem. 

    I no longer get MPU faults pointing to sched.c. Instead, the issue now appears to originate in libc-hooks.c and uart_nrfx_uart.c. This new issue is regular: it repeats at roughly the same interval as the original issue, and the offending line has been the same across multiple tests.

    I'm thinking that the original issue can be put on hold while I investigate the new issue.

    I'm wondering if I should create a new ticket?

    With thanks,

    S.
