inaccurate gcc nrf_delay_us

There are a couple of problems with the SDK 7.1.0 implementation of nrf_delay_us for GCC. In my experiments the provided version generates delays that are 40-50% too long, as measured by before/after captures of TIMER0 running from the undivided HFCLK.
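For anyone repeating the measurement, here is a minimal sketch of the arithmetic that turns two timer captures into a percentage error. It assumes a 16 MHz HFCLK with no prescaling (16 ticks per microsecond) and a free-running 32-bit counter; the function names are made up for illustration and are not part of the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: undivided 16 MHz HFCLK, so 16 timer ticks per microsecond. */
#define HFCLK_TICKS_PER_US 16u

/* Ticks between two captures of a free-running 32-bit counter.
 * Unsigned subtraction gives the right answer across one wraparound. */
static uint32_t elapsed_ticks(uint32_t before, uint32_t after)
{
    return after - before;
}

/* Error of the measured delay relative to the requested delay, in percent.
 * Positive means the delay ran long; the result is truncated toward zero. */
static int32_t delay_error_percent(uint32_t ticks, uint32_t requested_us)
{
    assert(requested_us > 0);
    int64_t expected = (int64_t)requested_us * HFCLK_TICKS_PER_US;
    return (int32_t)(((int64_t)ticks - expected) * 100 / expected);
}
```

Under these assumptions, a requested 1 us delay that measures 24 ticks comes out as 50% too long. Any fixed capture overhead should be subtracted from the tick count before calling delay_error_percent.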

First, the "static inline" technique does not guarantee inlining with GCC. Inlining is critical for the intended delay to be exact. You need to force GNU inline semantics and add an attribute that makes GCC inline even when not optimizing. (Below I've done that in a way that works with -std=c99.)

Second, implementing the loop control in C instead of assembly also makes the timing dependent on optimization levels.

Third, there are two too many NOPs in the loop body, compared to the other assembly variants.

The code below generates exact delays for me using gcc-arm-none-eabi-4_9-2014q4 for power-of-two (1..2048) delays. There is a constant 7 clock overhead, which probably includes triggering a capture.

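extern void inline
__attribute__((__gnu_inline__,__always_inline__))
nrf_delay_us(uint32_t volatile number_of_us)
{
    /* Loop body: SUB, 12 NOPs, BNE; one pass per requested microsecond. */
    __ASM volatile (
        "1:\tSUB %0, %0, #1\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "BNE 1b\n\t"
        : "+r"(number_of_us)  /* counter is both read and written */
        :                     /* no separate inputs */
        : "cc"                /* the loop changes the condition flags */
    );
}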
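As a rough check on the NOP count, the cycle budget of one loop pass can be worked out in a few lines. This is a sketch under assumed Cortex-M0 timings (1 cycle each for SUB and NOP, 3 cycles for a taken BNE) and an assumed 16 MHz HFCLK; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed timings: Cortex-M0 core clocked from a 16 MHz HFCLK. */
enum {
    HFCLK_MHZ        = 16,
    SUBS_CYCLES      = 1,
    NOP_CYCLES       = 1,
    BNE_TAKEN_CYCLES = 3
};

/* Cycles for one loop pass that branches back to the label. */
static uint32_t loop_cycles(uint32_t nop_count)
{
    return SUBS_CYCLES + nop_count * NOP_CYCLES + BNE_TAKEN_CYCLES;
}

/* Duration of one loop pass in nanoseconds. */
static uint32_t loop_ns(uint32_t nop_count)
{
    assert(nop_count < 1000u); /* keeps the arithmetic well inside 32 bits */
    return loop_cycles(nop_count) * 1000u / HFCLK_MHZ;
}
```

With these numbers, loop_cycles(12) is 16 cycles, so loop_ns(12) is 1000 ns per pass, while a body with two extra NOPs gives loop_cycles(14) = 18 cycles, or 1125 ns. That would account for only part of the measured excess; the rest would have to come from loop control compiled from C.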
  • Yeah, I had the same problem, this is what I use:

    /* The above version's run time depends on 
     * the mood of the compiler. 
     */
    __ASM volatile(
        ".syntax unified\n"
        "1:\n"
        " SUBS %[delay], %[delay], #1\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"   
        " NOP\n"  
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " BNE 1b\n" // 3 if taken, 1 if not  
        ".syntax divided\n"  
        : [delay] "+r" (number_of_us)
        :
        : "cc");
    
  • Thanks. My implementation may not be optimal. I also tried the methods you mentioned; using the delay function is just a simple way to get the protocol working. In my tests, the most time-consuming part is the "if ..." statement. As for PPI, in my case the same GPIO pin is used for both TX and RX (TDD), so I'm not sure whether I can connect two events (read and write) to the same pin; I'm still working on this. I also need to control the time taken to switch between read and write, so I'm using the "-O3" option to reduce the running time of the code between the read and write switch.
