
Register R7 not being correctly preserved / possible bug in nrf_svc.h

I have a fairly mature product based on the nrf51422_aa with soft device s310 version 1, built with the GCC toolchain. I recently ported the design to the latest SDK (9.0), soft device s310 version 3, and the ac variant of the processor. I have run into a perplexing bug.

When I did the port, I updated my toolchain to the same one referenced in "Makefile.Windows" in the SDK, "4.9 2015q1". After completing the port, I had a repeatable hard fault that I could trigger with a particular BLE command.

The hard fault was caused by an illegal memory access past the end of RAM. The illegal memory access was caused by something corrupting the contents of register R7. Either I have wild pointers corrupting the stack or something is not obeying the ARM calling conventions to preserve register R7.
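One way to narrow down a fault like this is to look at the exception frame the core pushes on entry to the hard fault handler. The following is a minimal sketch, not code from the SDK: `fault_decode`, `fault_info_t` and the `FRAME_*` names are hypothetical, and it assumes the standard Cortex-M frame layout (r0-r3, r12, lr, pc, xPSR). Note that r4-r11, including r7, are not part of that hardware frame, so a corrupted r7 has to be traced through the code that saves and restores it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets of the registers the Cortex-M core stacks automatically on
 * exception entry, lowest address first. r4-r11 are NOT in this frame. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Hypothetical summary of the two values that usually matter most. */
typedef struct {
    uint32_t pc; /* address of the instruction that faulted */
    uint32_t lr; /* return address of the function that was executing */
} fault_info_t;

/* Hypothetical helper: stacked_sp is the stack pointer that was active when
 * the fault was taken, as a hard fault handler would pass it in. */
static fault_info_t fault_decode(const uint32_t *stacked_sp)
{
    assert(stacked_sp != NULL);
    fault_info_t info = { stacked_sp[FRAME_PC], stacked_sp[FRAME_LR] };
    return info;
}
```

On target, the handler would typically store the result somewhere that survives a reset or print it over a debug channel. The stacked pc can then be looked up in the map file or disassembly to find the function whose saved r7 went bad.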

Now I have discovered that if I switch the toolchain from "4.9 2015q1" back to "4.8 2014q3", which I used for most of my other development, the hard fault goes away because the compiler puts a critical variable in R6 instead of R7. I doubt it is a compiler bug, but I suppose it is possible. In limited testing, the code appears to behave perfectly with the old compiler but repeatably crashes with the new compiler.

So I tried going back and recompiling, with the "4.9 2015q1" compiler, previous code that had been thoroughly tested and rolled out to customers. That code did not hard fault, but it immediately started misbehaving on the simplest tests.

In the course of troubleshooting, a coworker pointed out a potential issue with nrf_svc.h when used with gcc. Line 52 uses the "naked" keyword:

#ifdef SVCALL_AS_NORMAL_FUNCTION
#define SVCALL(number, return_type, signature) return_type signature
#else

#ifndef SVCALL
#if defined (__CC_ARM)
#define SVCALL(number, return_type, signature) return_type __svc(number) signature
#elif defined (__GNUC__)
#define SVCALL(number, return_type, signature) \
  _Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
  _Pragma("GCC diagnostic push") \
  _Pragma("GCC diagnostic ignored \"-Wreturn-type\"") \
  __attribute__((naked)) static return_type signature \
  { \
    __asm( \
        "svc %0\n" \
        "bx r14" : : "I" (number) : "r0" \
    ); \
  }    \
  _Pragma("GCC diagnostic pop")
#elif defined (__ICCARM__)

According to the gcc documentation for the "naked" attribute:

This attribute allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.

The colons in the assembly statement indicate that extended asm is being used (see the gcc docs on extended asm).

It appears to me that Nordic is violating the explicit warning about the naked keyword. Is that a bug or is there another explanation?

Any ideas on how to isolate my root issue with R7 corruption / compiler sensitivity?

  • That code's fine. The warning is really telling you that you're on your own if you do this. The extended syntax tells the compiler that you're clobbering r0, but it's not going to emit a prologue to preserve it, which it wouldn't do anyway as r0 isn't required to be preserved. Nor does r7 need preserving, as the svc call stacks that. If that code, which is used everywhere, ruined the registers, not much would work.

    So the problem must be elsewhere. Does the code which now uses r7 where it used to use r6 save it and restore it properly? If you know where this is crashing, you can put a breakpoint at that return and check the stack to see what's just about to be restored to r7.

    I'd also take a look at any stack variables in that routine and make sure you're not writing past the end of an array or something similar. It's possible that the new compiler reorders the variables on the stack and the stored contents of r7 are now in harm's way.

    And your stack's large enough, right? I get caught out by that on occasion, especially as my build environment unwisely puts the stack right after the heap, which I ought to fix.
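The stack-size question can be checked with the common "stack painting" technique. This is a minimal sketch under stated assumptions: `stack_paint`, `stack_unused_words` and the fill pattern `STACK_PAINT` are hypothetical names, not part of the Nordic SDK, and the stack is assumed to grow downward as it does on Cortex-M. The region is filled with a known pattern early on, and later the number of words still holding the pattern shows how much of the stack was never touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern (an assumption; any unlikely value would do). */
#define STACK_PAINT 0xDEADBEEFu

/* Fill a stack region with the pattern. `lowest` is the lowest address of
 * the region and `words` its length in 32-bit words. */
static void stack_paint(uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_PAINT;
    }
}

/* Count the words, starting from the lowest address, that still hold the
 * pattern. With a downward-growing stack these words were never used, so a
 * result of 0 suggests the stack has been exhausted or overrun. */
static size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    size_t unused = 0;
    while (unused < words && lowest[unused] == STACK_PAINT) {
        unused++;
    }
    return unused;
}
```

On target, painting would have to happen before the region is in use, or cover only the part below the current stack pointer, with the region bounds taken from whatever symbols the linker script provides. Comparing the unused count between the two compiler versions would show whether the newer toolchain simply needs more stack.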
