This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Using Clang and the S110 (issues with supervisor calls to the SoftDevice)

Hi everyone,

I've been experimenting compiling my project with Clang and have run into some issues with the supervisor calls and their wrapper in nrf_svc.h.

Without optimizations, the calls seem to work fine. See the following asm generated by Clang at -O0 that loads r0 and r1 with the parameters for the SVC:

/* uint32_t  rc = sd_softdevice_enable(NRF_CLOCK_LFCLKSRC_XTAL_75_PPM, assertion_handler); */
ldr r1, 0x000188A8 // second parameter <= &assertion_handler()
movs r0, #5 // first parameter <= LFCLK configuration
bl 0x000188B8 <sd_softdevice_enable>
ldr r1, [sp, #16]  // Put address of 'rc' into r1
str r0, [r1] // Store return value from SVC into 'rc'

/* ASSERT(rc == NRF_SUCCESS); */
cmp r0, #0 // Check 'rc == NRF_SUCCESS'
beq 0x00018810 // All good, don't assert
b 0x00018802 // Assert, rc != NRF_SUCCESS

<sd_softdevice_enable>
add r2, sp, #8
mov r3, r1
mov r4, r0
str r0, [r2, #4]
str r1, [r2]

svc #16
bx lr

There are some unnecessary moves and stores in <sd_softdevice_enable> (since everything is already in a register), but it appears to work.

To contrast, here's equivalent asm from GCC at -O2 (much more concise):

/* uint32_t rc = sd_softdevice_enable(NRF_CLOCK_LFCLKSRC_XTAL_75_PPM, assertion_handler); */
movs r0, #5 // first parameter <= LFCLK configuration
ldr r1, 0x00018938 // second parameter <= &assertion_handler()
bl 0x000187BC <sd_softdevice_enable>

/* ASSERT(rc == NRF_SUCCESS); */
cmp r0, #0 // Check 'rc == NRF_SUCCESS'
bne 0x00018928 // Assert if !=

<sd_softdevice_enable>
svc #16
bx lr

And here's Clang again at -O2. Everything breaks down.

/* uint32_t rc = sd_softdevice_enable( NRF_CLOCK_LFCLKSRC_XTAL_75_PPM, assertion_handler); */
bl 0x0001889C <sd_softdevice_enable> // Where are the parameter loads into r0 and r1?

// Oh no! It unconditionally asserts!
/* ASSERT(rc == NRF_SUCCESS); */
 ldr r4, 0x0001888C
 ldr r5, 0x00018890
 movs r2, #0x72
 mov r0, r4
 mov r1, r5
 bl 0x000254F4 <uart_fatalOut>
 bkpt #0 // R.I.P.

<sd_softdevice_enable> // Same as the previous two
svc #16
bx lr

Summarized, here's what I see at Clang -O2:

  1. No loading of r0, r1 of the input parameters to the SVC
  2. No capturing of the return value from sd_softdevice_enable

It's as if Clang isn't following the calling convention.

Here's my hypothesis of why. The following is from nrf_svc.h (just the relevant part):

#define SVCALL(number, return_type, signature) \
__attribute__((naked)) static return_type signature \
{ \
    __asm( \
        "svc %0\n" \
        "bx r14" : : "I" (number) : "r0" \
    ); \
}

Nordic uses the preprocessor macro SVCALL to define all the sd_* functions implemented as supervisor calls. Here's the definition of sd_softdevice_enable after preprocessing and some improved formatting:

 __attribute__((naked)) static uint32_t sd_softdevice_enable(
    nrf_clock_lfclksrc_t clock_source,
    softdevice_assertion_handler_t assertion_handler)
{
    __asm( 
        "svc %0 \t\n
         bx r14 \t\n"
         : : "I" (SD_SOFTDEVICE_ENABLE) : "r0"
    );
}

Maybe I'm wrong, but I understand that using extended inline assembly is undefined in a naked function. Only basic inline asm, without output:input:clobber lists is allowed. GCC's docs say so here.

Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.

It seems to me that GCC does the "right" thing, and Clang doesn't, however relying on GCC to always do the right thing isn't dependable.

I'm not very experienced at this low-level, but I think that using standard C functions (dropping the static and the naked attribute) might be the solution.

Of course, I could be misinterpreting the problem. I'm interested in hearing any comments to that effect.

Thanks!

  • It's odd that there's a movs, but then not any subsequent conditional instructions. Just to get a little more info, can you try running it through clang (-O2) again without the ASSERT? Also, can you provide all the parameters your passing to clang?

  • Without the ASSERT the assembly is a branch with link to <sd_softdevice_enable> (wrong values in r0, r1):

    /* uint32_t rc = sd_softdevice_enable( NRF_CLOCK_LFCLKSRC_XTAL_75_PPM, assertion_handler); */
    bl 0x00018890 <sd_softdevice_enable>
    

    I'm compiling via Rowley Crossworks. Here are the parameters to Clang:

    -cc1 -S -disable-free -disable-llvm-verifier -mrelocation-model static -fmath-errno -mconstructor-aliases -x c -fno-caret-diagnostics -fno-diagnostics-fixit-info -triple thumbv6m-none--eabi -target-cpu cortex-m0 -target-feature +soft-float -target-feature +soft-float-abi -msoft-float -target-abi aapcs -mfloat-abi soft -nostdsysteminc -nobuiltininc -sys-header-deps -MT -Wall -fms-extensions -Wshadow -Wunreachable-code -g -std=gnu99 -O2 -mdisable-fp-elim -fno-dwarf-directory-asm -fno-builtin -ffunction-sections -fdata-sections -mno-global-merge -fshort-enums -fno-common

  • EDITED --- See additions / updates at bottom.

    Here's what I think is going on:

    Since the SVCALL macro defines the service calls as "static" functions, they are all within the same compilation unit as the calls to them. Consequently the optimizer sees both the call and that the actual function isn't using the passed arguments, so it eliminates the unused assignments to the argument registers.

    I've taken a quick look at both clang's function attributes and the inline assembly syntax. I haven't yet spotted a way to indicate input register dependencies. I don't think "naked" really implies anything about what registers are being used for input, so I'm not sure I'd consider it a flaw in clang.

    You can provoke a different behavior by changing the SVCALL macro process. I did some quick tests with:

    A) I made SVCALL in nrf_svc.h into a typical function declaration, like:

    #define SVCALL(number, return_type, signature) return_type signature
    

    rather than a definition of a static function. This will prevent the functions from actually being defined anywhere. So,:

    B) (Update: The generated code destroys the stack) I exposed the original SVCALL macro without the "static" designation in softdevice_handler.c. This creates a single instance of all the service calls.

    I didn't actually test the resulting code, so there may be other issues. I did do a cursory review of the assembly language though. (Update: As RK mentioned, clang is destroying the stack)

    If anyone can shed any further light on the situation it would certainly be appreciated.

    And, on a related note, is there a significant advantage to having the SVCALL functions be static? (Update: See RK's answer)

    UPDATE:

    After a bit more thought I tried two other approaches. I don't particularly like either, but they each may lead to working code. The first approach is using #defines for each service call, like:

    #define sd_softdevice_enable(a,b)  \
          ({int retVal; \
            register nrf_clock_lfclksrc_t _a asm("r0") = (a); \ 
            register softdevice_assertion_handler_t _b asm("r1") = (b); \
      asm volatile ( \
           "svc %[i]\n\t"      \
           "mov %[result],r0\n\t" \
           : [result] "=r" (retVal)  \
           :[i] "i" (SD_SOFTDEVICE_ENABLE), [first] "r" (_a) , [second] "r" (_b) \
           : "r0"); retVal;})
    

    I only did that one test case, which generated acceptable code. Unfortunately, I don't think there's a convenient way to work this into the SVCALL macro mechanism, so it's difficult to support. I.e., there would have to be separate clang-compatible versions of all the headers that call services or some way to generate the code beyond the normal preprocessor.

    Because of the effort involved, I didn't pursue that approach beyond a proof-of-concept.

    My second approach uses several SVCALL macros (one for calls with no parameters, one for calls with one parameter, etc.), like:

    #define SVCALL2(number, rettype, signature, p1, p2)  \
    __attribute__((always_inline)) static inline rettype signature {  \
        register typeof(p1) _p1 asm("r0") = p1;         \
        register typeof(p2) _p2 asm("r1") = p2;         \
        register rettype retVal;                        \
        asm volatile (                                  \
           "svc %[i]\n\t"                               \
           "mov %[result],r0\n\t"                       \
           : [result] "=r" (retVal)                     \
           :[i] "i" (number), [a] "r" (_p1), [b] "r" (_p2) \
           : "r0");                                     \
        return retVal;                                  \
    

    I also use variadic macros to make the use somewhat transparent. So it's still possible to use slightly modified SVCALL macros and switch between multiple compilers (gcc, clang). An updated SVCALL declaration looks like:

    SVCALL(SD_SOFTDEVICE_ENABLE, uint32_t, sd_softdevice_enable(nrf_clock_lfclksrc_t clock_source, softdevice_assertion_handler_t assertion_handler),clock_source, assertion_handler);
    

    Since sd_softdevice_enable() takes two parameters, the SVCALL has two extra parameters (compared to the original Nordic version), which are just the names of sd_softdevice_enable's parameters (i.e., clock_source, assertion_handler). This allows the generated code to create register variables of identical type.

    I did completely implement this approach (albeit hastily). Of course, I don't have a license for CrossWorks and can't easily test out the resulting code. If someone wants to try it, I can send or post the modified files (nrf_svc.h, nrf_soc.h, nrf_sdm.h, ble_l2cap.h, ble_gatts.h, ble_gattc.h, ble_gap.h, ble.h). nrf_svc.h is the most important and is attached.

    I doubt I'll pursue this much further, but I thought I'd share my progress so far. Here are some of my remaining concerns:

    1. Are the register assignments above rigidly followed by clang?
    2. Did I get the in-line ASM right and/or can it be improved?
    3. Are there better ways to handle the macros?
    4. Does it behave properly at different optimization levels?
    5. Is there any collateral damage?

    Here's the final attempt at nrf_svc.h, which seems to produce code that is comparable to gcc and hand written assembly.

  • I went through this 9 months or so ago and discovered much the same thing. The reply from Bill is exactly right in that clang optimises out the parameters it thinks are unused. I also found no good way to stop it doing that and I tried just about every flag I could find. I did however find some other bugs in clang (one of which had been reported) which put incorrect preamble/postamble code in when either naked or one of the other options was used and stomped on the return value. The whole thing wasn't very clang-friendly. I don't know if any of those bugs have been fixed in the last year, certainly that assembly code for debug mode looks a little better than it did before, specifically there used to be a bx lr in the code before the stack cleanup at -O0 which ruined the stack.

    I did get it working with a load of hand-assembly which wasn't terribly hard to write, however the way SVCCALL is defined made it somewhere between hard and impossible to automate, or I certainly didn't manage. I seem to remember it would have been possible if the signature were split up a little more in the macro .. I don't recall at this point and I tossed the code as being generally unstable and doing different things in different debug modes. I also recall having trouble interpolating #define values (i.e. the numbers of the service calls) into assembly without using the __asm declaration and with it I ran right back into the preamble/postamble/naked issues.

    I think I had one other rather scuzzy version which defined all the possible SVC calls as functions and created a function name as part of the SVCCALL macro but that was horrible too.

    I just found the mail I originally sent to Rowley (Crossworks) about this asking if they had a Clang-compatible piece of code I could use, they never quite managed to get me one and I gave up. Here's what I wrote back then

    I'm having issues compiling some of the nordic code with clang, specifically anything which calls their softdevice for which it uses a SVC call.

    The SVC call functions are macro expanded into a GCC format such that this is what actually gets compiled for a two argument function test() which takes two parameters.

    __attribute__((naked)) uint32_t test( uint32_t A, uint32_t B )

    {

    __asm( "svc %0\n" "bx r14" : : "I" ( TEST_SVC_NUMBER ) : "r0" )

    }

    That suffers from several problems. Clang at -O0 doesn't quite respect 'naked' and puts a preamble/postamble on the function which, because of the bx r14 then breaks the stack because it returns before it clean up. Even if I remove that, r0 is borked because I think clang doesn't know svc leaves an answer in it and uses it for something else.

    Worse, when you compile it with any kind of optimisation clang thinks the arguments aren't used and doesn't actually worry about what's in r0 or r1 when it calls it, so it just sends random garbage.

    Do you have a clang-worthy example of something likes this? SVC calls seem pretty standard in ARM and I thought you may have come across it.

  • Nice analysis Bill, as per my answer (not really an answer) below, I found much the same thing.

    I assume, to answer your final question, that defining the calls as static allows the compiler to inline them right into the code thus saving a branch and stack before a SVC call which is going to branch and stack again. That at any reasonable level of optimisation is what it appears to do. The code for those SVC calls is so short in assembly that inlining it is a good thing if you can do it. That was my guess anyway.

Related