I have a fairly mature product based on the nrf51422_aa with SoftDevice S310 version 1, built with the GCC toolchain. I recently ported the design to the latest SDK (9.0), the latest SoftDevice (S310 version 3) and the AC variant of the processor, and I have run into a perplexing bug.
When I did the port, I updated my toolchain to the one referenced in "Makefile.Windows" in the SDK, "4.9 2015q1". After completing the port, I had a repeatable hard fault that I could trigger with a particular BLE command.
The hard fault was caused by an illegal memory access past the end of RAM, and that access was caused by something corrupting the contents of register R7. Either I have wild pointers corrupting the stack, or something is not obeying the ARM calling convention (AAPCS), which requires a called function to preserve R7 as one of the callee-saved registers.
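One way to narrow this down is to look at the exception frame the core pushes automatically on entry to HardFault. The sketch below only decodes the eight stacked words; it assumes a separate handler (not shown) that passes in the MSP or PSP value, selected by bit 2 of EXC_RETURN in LR. The names `exception_frame_t`, `decode_frame` and `print_frame` are hypothetical. Note that R7 is not part of this frame, so the stacked PC and LR mainly tell you which instruction faulted and where it was called from, which you can then match against the disassembly of each build.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Cortex-M exception stack frame: the eight words pushed by hardware on
 * exception entry, lowest address first. R4-R11 (including R7) are NOT
 * stacked automatically. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* return address: the faulting instruction for a HardFault */
    uint32_t xpsr;
} exception_frame_t;

/* Hypothetical helper: copy the stacked words into a named struct.
 * On target, `stacked` would be the MSP/PSP value captured by a small
 * HardFault shim (assumption, not shown here). */
static exception_frame_t decode_frame(const uint32_t *stacked)
{
    exception_frame_t f;
    assert(stacked != NULL);
    f.r0   = stacked[0];
    f.r1   = stacked[1];
    f.r2   = stacked[2];
    f.r3   = stacked[3];
    f.r12  = stacked[4];
    f.lr   = stacked[5];
    f.pc   = stacked[6];
    f.xpsr = stacked[7];
    return f;
}

/* Print the registers most useful for locating the fault in a listing. */
static void print_frame(const exception_frame_t *f)
{
    printf("PC=0x%08lx LR=0x%08lx xPSR=0x%08lx R0=0x%08lx\n",
           (unsigned long)f->pc, (unsigned long)f->lr,
           (unsigned long)f->xpsr, (unsigned long)f->r0);
}
```

Looking up the stacked PC in the `arm-none-eabi-objdump -d` output of the 4.9 build should show which load or store used the corrupted pointer, and the stacked LR shows the caller.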
I have now discovered that if I switch the toolchain from "4.9 2015q1" back to "4.8 2014q3", which I used for most of my other development, the hard fault goes away, because that compiler puts a critical variable in R6 rather than R7. I doubt it is a compiler bug, but I suppose it is possible. In limited testing, the code appears to behave perfectly with the old compiler but crashes repeatably with the new one.
So I went back and recompiled previous code, which had been thoroughly tested and rolled out to customers, with the "4.9 2015q1" compiler. That code did not hard fault, but it immediately started misbehaving on the simplest tests.
While troubleshooting, a coworker pointed out a potential issue with nrf_svc.h when used with GCC. Line 52 uses the "naked" attribute:
```c
#ifdef SVCALL_AS_NORMAL_FUNCTION
#define SVCALL(number, return_type, signature) return_type signature
#else
#ifndef SVCALL
#if defined (__CC_ARM)
#define SVCALL(number, return_type, signature) return_type __svc(number) signature
#elif defined (__GNUC__)
#define SVCALL(number, return_type, signature) \
  _Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
  _Pragma("GCC diagnostic push") \
  _Pragma("GCC diagnostic ignored \"-Wreturn-type\"") \
  __attribute__((naked)) static return_type signature \
  { \
    __asm( \
        "svc %0\n" \
        "bx r14" : : "I" (number) : "r0" \
    ); \
  } \
  _Pragma("GCC diagnostic pop")
#elif defined (__ICCARM__)
```
According to the GCC documentation:
This attribute allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.
The colons in the asm statement indicate that extended asm is being used (see the GCC documentation on Extended Asm).
It appears to me that Nordic is violating the explicit warning about the naked attribute. Is that a bug, or is there another explanation?
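For comparison, here is a minimal sketch of what a GCC variant of SVCALL restricted to basic asm might look like. The macro names `SVC_STR`, `SVC_INSN` and `SVCALL_BASIC_ASM` are hypothetical and not part of the SDK. The catch is that basic asm cannot take operands, so the SVC number has to be pasted into the instruction text by the preprocessor, which only works when `number` is an integer literal or a macro expanding to one. As far as I can tell the SDK's SVC numbers are enum constants, which the preprocessor cannot see, and that is presumably why nrf_svc.h passes them through an "I" constraint instead.

```c
/* Two-level stringification so that macro arguments are expanded first. */
#define SVC_STR_(x) #x
#define SVC_STR(x)  SVC_STR_(x)

/* Instruction text for a given SVC number, e.g. SVC_INSN(0x60) -> "svc 0x60\n".
 * Assumes `number` is an integer literal or a macro expanding to one;
 * enum constants will NOT work here. */
#define SVC_INSN(number) "svc " SVC_STR(number) "\n"

#if defined(__arm__) && defined(__GNUC__)
/* Hypothetical basic-asm-only variant of SVCALL: no operands, no clobbers,
 * so nothing the GCC docs mark as unsupported inside a naked function.
 * The SVC handler leaves its result in r0, and "bx lr" returns it. */
#define SVCALL_BASIC_ASM(number, return_type, signature)  \
    _Pragma("GCC diagnostic push")                         \
    _Pragma("GCC diagnostic ignored \"-Wreturn-type\"")    \
    __attribute__((naked)) static return_type signature    \
    {                                                      \
        __asm volatile (SVC_INSN(number) "bx lr\n");       \
    }                                                      \
    _Pragma("GCC diagnostic pop")
#endif
```

On an ARM target it would be declared like SVCALL, e.g. `SVCALL_BASIC_ASM(0x60, uint32_t, hypothetical_call(void));` with a literal number. On a host build only the string helpers are defined, which is enough to check the text the preprocessor produces.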
Any ideas on how to isolate the root cause of the R7 corruption and the compiler sensitivity?
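Since one hypothesis is a wild pointer or a stack overflow, a cheap check is stack painting: fill the unused stack with a known word at startup, exercise the failing BLE command, then measure how much of the pattern survives. The sketch below assumes the caller supplies the stack's lowest address and a word count (on target these would come from linker-script symbols, which vary by project); `STACK_PAINT_WORD`, `stack_paint` and `stack_unused_words` are hypothetical names. As I understand it, the SoftDevice runs on the same stack as the application, so its usage shows up in the measurement too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern (assumption; any distinctive value works). */
#define STACK_PAINT_WORD 0xDEADBEEFu

/* Fill `words` 32-bit words starting at `base` with the pattern.
 * On target, only paint the region BELOW the current SP, from a function
 * with minimal stack usage, or live frames will be overwritten. */
static void stack_paint(uint32_t *base, size_t words)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT_WORD;
    }
}

/* The stack grows downward on Cortex-M, so never-touched words remain
 * at the low end. Count them from the bottom to get the headroom left
 * at the deepest point the stack has reached. */
static size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    assert(base != NULL);
    while (n < words && base[n] == STACK_PAINT_WORD) {
        n++;
    }
    return n;
}
```

If the headroom is at or near zero after the failing command, stack overflow is a strong candidate; if plenty of pattern survives, that points back toward a register-preservation problem, and comparing the disassembly around the faulting PC in both compiler builds becomes the next step.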