Application FW register stomped by nrf_oberon

Hi, we are seeing some very strange behavior and would like some help clearing it up.

Recently we noticed a very weird hardfault triggered by a data bus error. At first it looked like a stack overflow, but on closer examination we found that the variable being corrupted was stored in the S16 VFP register. The disassembled function looks something like the following.

void initial_enable_func(struct context*c, struct data1*d1, struct data2 d2, struct data3 d3) {

0: stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
4: vpush {d8}
8: vmov s16, r3
c: ldr r3, [pc, #700] 

.....

28b28: vmov r1, s16
28b2c: mov r0, r4
28b2e: bl 2887c (Calling function here with d3, which arrived in R3 and was kept in S16)

.....

28bf8: add sp, #12
28bfa: vpop {d8}
28bfe: ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
28c02: nop

As far as we can judge, this looks like correct compiler output, even though using S16 as a temporary register is a little unorthodox. According to the ARM procedure call standard here: https://developer.arm.com/documentation/ihi0042/j/?lang=en#vfp-and-simd-vector-register-arguments

Registers s16-s31 (d8-d15, q4-q7) must be preserved across subroutine calls; registers s0-s15 (d0-d7, q0-q3) do not need to be preserved (and can be used for passing arguments or returning results in standard procedure-call variants). Registers d16-d31 (q8-q15), if present, do not need to be preserved.

With this in mind, we did a binary search for where the S16 register gets corrupted, using the following inline assembly pattern:

uintptr_t dummy = 0x12345678;
__ASM volatile("vmov s16, %0" ::"r"(dummy));

....code....

__ASM volatile("vmov %0, s16" : "=r"(dummy));
if (dummy != 0x12345678) {
panic();
}
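
The same check can be wrapped in small helpers so it is easier to move around during the binary search. This is only a minimal sketch, not what we actually ran: the names s16_write, s16_read, s16_canary_arm, s16_canary_intact, s16_preserved_across and suspect_noop are made up for illustration, and the non-ARM branch is a plain variable standing in for S16 so the snippet also builds on a host. On target, the "s16" clobber tells GCC the register is modified, so the enclosing function saves and restores it as the calling convention requires.

```c
#include <stdbool.h>
#include <stdint.h>

#define S16_CANARY 0x12345678u

#if defined(__arm__) && defined(__ARM_FP)
/* On target: move a value into / out of the callee-saved register s16. */
static inline void s16_write(uint32_t v)
{
    __asm volatile("vmov s16, %0" : : "r"(v) : "s16");
}
static inline uint32_t s16_read(void)
{
    uint32_t v;
    __asm volatile("vmov %0, s16" : "=r"(v));
    return v;
}
#else
/* Off target (assumption for illustration): a variable stands in for s16. */
static uint32_t fake_s16;
static inline void s16_write(uint32_t v) { fake_s16 = v; }
static inline uint32_t s16_read(void) { return fake_s16; }
#endif

/* Place the canary in s16 before the code under suspicion. */
static inline void s16_canary_arm(void) { s16_write(S16_CANARY); }

/* True if s16 still holds the canary, i.e. nothing in between clobbered it. */
static inline bool s16_canary_intact(void) { return s16_read() == S16_CANARY; }

/* Hypothetical placeholder for the call being bisected. */
static inline void suspect_noop(void) { }

/* Run fn with the canary armed and report whether s16 survived the call. */
static inline bool s16_preserved_across(void (*fn)(void))
{
    s16_canary_arm();
    fn();
    return s16_canary_intact();
}
```

In the firmware this would be used as if (!s16_preserved_across(some_call)) { panic(); }, where some_call is whatever wrapper is currently being bisected.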

We were quickly able to narrow it down to the nrf_oberon library via nrf_ble_lesc_key_keypair_generate(), which is called through pm_init(). Our FW uses the peer_manager component with LESC enabled, and we use nrf_oberon as the crypto backend.

A quick disassembly of liboberon_3.0.1.a does indeed show a few functions that use S16 without preserving its state across the function call. This looks like a bug in liboberon, or an ABI incompatibility with it. It would be great if we could be told how liboberon 3.0.1 is compiled, i.e. the compiler options and compiler version.

Has anyone else run into this issue? What is the recommended solution? It would be great if someone could confirm our findings so we know our analysis is sane.

FWIW we are compiling our FW with following options:

-mcpu=cortex-m4 -mfloat-abi=hard -mthumb -mabi=aapcs -mfpu=fpv4-sp-d16

Using arm-none-eabi-gcc version 9-2020-q2-update 9.3.1

Running NRFSDK 16, with liboberon 3.0.1
