This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Using Clang and the S110 (issues with supervisor calls to the SoftDevice)

Hi everyone,

I've been experimenting compiling my project with Clang and have run into some issues with the supervisor calls and their wrapper in nrf_svc.h.

Without optimizations, the calls seem to work fine. See the following asm generated by Clang at -O0 that loads r0 and r1 with the parameters for the SVC:

/* uint32_t  rc = sd_softdevice_enable(NRF_CLOCK_LFCLKSRC_XTAL_75_PPM, assertion_handler); */
ldr r1, 0x000188A8 // second parameter <= &assertion_handler()
movs r0, #5 // first parameter <= LFCLK configuration
bl 0x000188B8 <sd_softdevice_enable>
ldr r1, [sp, #16]  // Put address of 'rc' into r1
str r0, [r1] // Store return value from SVC into 'rc'

/* ASSERT(rc == NRF_SUCCESS); */
cmp r0, #0 // Check 'rc == NRF_SUCCESS'
beq 0x00018810 // All good, don't assert
b 0x00018802 // Assert, rc != NRF_SUCCESS

<sd_softdevice_enable>
add r2, sp, #8
mov r3, r1
mov r4, r0
str r0, [r2, #4]
str r1, [r2]

svc #16
bx lr

There are some unnecessary moves and stores in <sd_softdevice_enable> (since everything is already in a register), but it appears to work.

To contrast, here's equivalent asm from GCC at -O2 (much more concise):

/* uint32_t rc = sd_softdevice_enable(NRF_CLOCK_LFCLKSRC_XTAL_75_PPM, assertion_handler); */
movs r0, #5 // first parameter <= LFCLK configuration
ldr r1, 0x00018938 // second parameter <= &assertion_handler()
bl 0x000187BC <sd_softdevice_enable>

/* ASSERT(rc == NRF_SUCCESS); */
cmp r0, #0 // Check 'rc == NRF_SUCCESS'
bne 0x00018928 // Assert if !=

<sd_softdevice_enable>
svc #16
bx lr

And here's Clang again at -O2. Everything breaks down.

/* uint32_t rc = sd_softdevice_enable( NRF_CLOCK_LFCLKSRC_XTAL_75_PPM, assertion_handler); */
bl 0x0001889C <sd_softdevice_enable> // Where are the parameter loads into r0 and r1?

// Oh no! It unconditionally asserts!
/* ASSERT(rc == NRF_SUCCESS); */
 ldr r4, 0x0001888C
 ldr r5, 0x00018890
 movs r2, #0x72
 mov r0, r4
 mov r1, r5
 bl 0x000254F4 <uart_fatalOut>
 bkpt #0 // R.I.P.

<sd_softdevice_enable> // Same as the previous two
svc #16
bx lr

Summarized, here's what I see at Clang -O2:

  1. No loading of r0, r1 of the input parameters to the SVC
  2. No capturing of the return value from sd_softdevice_enable

It's as if Clang isn't following the calling convention.

Here's my hypothesis of why. The following is from nrf_svc.h (just the relevant part):

#define SVCALL(number, return_type, signature) \
__attribute__((naked)) static return_type signature \
{ \
    __asm( \
        "svc %0\n" \
        "bx r14" : : "I" (number) : "r0" \
    ); \
}

Nordic uses the preprocessor macro SVCALL to define all the sd_* functions implemented as supervisor calls. Here's the definition of sd_softdevice_enable after preprocessing and some improved formatting:

 __attribute__((naked)) static uint32_t sd_softdevice_enable(
    nrf_clock_lfclksrc_t clock_source,
    softdevice_assertion_handler_t assertion_handler)
{
    __asm( 
        "svc %0 \t\n
         bx r14 \t\n"
         : : "I" (SD_SOFTDEVICE_ENABLE) : "r0"
    );
}

Maybe I'm wrong, but I understand that using extended inline assembly is undefined in a naked function. Only basic inline asm, without output:input:clobber lists is allowed. GCC's docs say so here.

Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.

It seems to me that GCC does the "right" thing, and Clang doesn't, however relying on GCC to always do the right thing isn't dependable.

I'm not very experienced at this low-level, but I think that using standard C functions (dropping the static and the naked attribute) might be the solution.

Of course, I could be misinterpreting the problem. I'm interested in hearing any comments to that effect.

Thanks!

Parents
  • EDITED --- See additions / updates at bottom.

    Here's what I think is going on:

    Since the SVCALL macro defines the service calls as "static" functions, they are all within the same compilation unit as the calls to them. Consequently the optimizer sees both the call and that the actual function isn't using the passed arguments, so it eliminates the unused assignments to the argument registers.

    I've taken a quick look at both clang's function attributes and the inline assembly syntax. I haven't yet spotted a way to indicate input register dependencies. I don't think "naked" really implies anything about what registers are being used for input, so I'm not sure I'd consider it a flaw in clang.

    You can provoke a different behavior by changing the SVCALL macro process. I did some quick tests with:

    A) I made SVCALL in nrf_svc.h into a typical function declaration, like:

    #define SVCALL(number, return_type, signature) return_type signature
    

    rather than a definition of a static function. This will prevent the functions from actually being defined anywhere. So,:

    B) (Update: The generated code destroys the stack) I exposed the original SVCALL macro without the "static" designation in softdevice_handler.c. This creates a single instance of all the service calls.

    I didn't actually test the resulting code, so there may be other issues. I did do a cursory review of the assembly language though. (Update: As RK mentioned, clang is destroying the stack)

    If anyone can shed any further light on the situation it would certainly be appreciated.

    And, on a related note, is there a significant advantage to having the SVCALL functions be static? (Update: See RK's answer)

    UPDATE:

    After a bit more thought I tried two other approaches. I don't particularly like either, but they each may lead to working code. The first approach is using #defines for each service call, like:

    #define sd_softdevice_enable(a,b)  \
          ({int retVal; \
            register nrf_clock_lfclksrc_t _a asm("r0") = (a); \ 
            register softdevice_assertion_handler_t _b asm("r1") = (b); \
      asm volatile ( \
           "svc %[i]\n\t"      \
           "mov %[result],r0\n\t" \
           : [result] "=r" (retVal)  \
           :[i] "i" (SD_SOFTDEVICE_ENABLE), [first] "r" (_a) , [second] "r" (_b) \
           : "r0"); retVal;})
    

    I only did that one test case, which generated acceptable code. Unfortunately, I don't think there's a convenient way to work this into the SVCALL macro mechanism, so it's difficult to support. I.e., there would have to be separate clang-compatible versions of all the headers that call services or some way to generate the code beyond the normal preprocessor.

    Because of the effort involved, I didn't pursue that approach beyond a proof-of-concept.

    My second approach uses several SVCALL macros (one for calls with no parameters, one for calls with one parameter, etc.), like:

    #define SVCALL2(number, rettype, signature, p1, p2)  \
    __attribute__((always_inline)) static inline rettype signature {  \
        register typeof(p1) _p1 asm("r0") = p1;         \
        register typeof(p2) _p2 asm("r1") = p2;         \
        register rettype retVal;                        \
        asm volatile (                                  \
           "svc %[i]\n\t"                               \
           "mov %[result],r0\n\t"                       \
           : [result] "=r" (retVal)                     \
           :[i] "i" (number), [a] "r" (_p1), [b] "r" (_p2) \
           : "r0");                                     \
        return retVal;                                  \
    

    I also use variadic macros to make the use somewhat transparent. So it's still possible to use slightly modified SVCALL macros and switch between multiple compilers (gcc, clang). An updated SVCALL declaration looks like:

    SVCALL(SD_SOFTDEVICE_ENABLE, uint32_t, sd_softdevice_enable(nrf_clock_lfclksrc_t clock_source, softdevice_assertion_handler_t assertion_handler),clock_source, assertion_handler);
    

    Since sd_softdevice_enable() takes two parameters, the SVCALL has two extra parameters (compared to the original Nordic version), which are just the names of sd_softdevice_enable's parameters (i.e., clock_source, assertion_handler). This allows the generated code to create register variables of identical type.

    I did completely implement this approach (albeit hastily). Of course, I don't have a license for CrossWorks and can't easily test out the resulting code. If someone wants to try it, I can send or post the modified files (nrf_svc.h, nrf_soc.h, nrf_sdm.h, ble_l2cap.h, ble_gatts.h, ble_gattc.h, ble_gap.h, ble.h). nrf_svc.h is the most important and is attached.

    I doubt I'll pursue this much further, but I thought I'd share my progress so far. Here are some of my remaining concerns:

    1. Are the register assignments above rigidly followed by clang?
    2. Did I get the in-line ASM right and/or can it be improved?
    3. Are there better ways to handle the macros?
    4. Does it behave properly at different optimization levels?
    5. Is there any collateral damage?

    Here's the final attempt at nrf_svc.h, which seems to produce code that is comparable to gcc and hand written assembly.

  • Nice analysis Bill, as per my answer (not really an answer) below, I found much the same thing.

    I assume, to answer your final question, that defining the calls as static allows the compiler to inline them right into the code thus saving a branch and stack before a SVC call which is going to branch and stack again. That at any reasonable level of optimisation is what it appears to do. The code for those SVC calls is so short in assembly that inlining it is a good thing if you can do it. That was my guess anyway.

Reply
  • Nice analysis Bill, as per my answer (not really an answer) below, I found much the same thing.

    I assume, to answer your final question, that defining the calls as static allows the compiler to inline them right into the code thus saving a branch and stack before a SVC call which is going to branch and stack again. That at any reasonable level of optimisation is what it appears to do. The code for those SVC calls is so short in assembly that inlining it is a good thing if you can do it. That was my guess anyway.

Children
No Data
Related