inaccurate gcc nrf_delay_us

There are a few problems with the SDK 7.1.0 implementation of nrf_delay_us for GCC. My experiments show that the provided version generates delays 40-50% too long, as measured by before/after captures of TIMER0 running with undivided HFCLK.

First, the "static inline" technique does not guarantee inlining with GCC. Inlining is critical for the intended delay to be exact. You need to force GNU inline semantics and add an attribute that makes GCC inline even when not optimizing. (Below I've done that in a way that works with -std=c99.)
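To illustrate the attribute combination in isolation, here is a minimal sketch that builds on a host GCC. The helper `us_to_cycles` and the 16 MHz figure are assumptions made up for this example, not part of the SDK. With `__gnu_inline__`, an `extern inline` definition is used only for inlining and no out-of-line copy is emitted; `__always_inline__` asks GCC to expand it even at -O0.

```c
#include <stdint.h>

/* Hypothetical helper, used only to show the attributes.
 * extern + inline + gnu_inline: definition is for inlining only.
 * always_inline: expand at every call site, even without optimization. */
extern inline __attribute__((__gnu_inline__, __always_inline__))
uint32_t us_to_cycles(uint32_t us)
{
    return us * 16u; /* assumes a 16 MHz core clock */
}

/* Ordinary function; the call below is expanded in place. */
uint32_t cycles_for_10_us(void)
{
    return us_to_cycles(10u);
}
```

Because no out-of-line copy exists, taking the address of such a function would fail at link time, which is usually acceptable for a delay primitive.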

Second, implementing the loop control in C instead of assembly also makes the timing dependent on optimization levels.

Third, there are two too many NOPs in the loop body, compared to the other assembly variants.
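The NOP count can be checked with a quick cycle budget. This is a sketch under assumed Cortex-M0 timings (1 cycle for SUBS and NOP, 3 cycles for a taken BNE, 1 for a not-taken BNE) and an assumed 16 MHz HFCLK; the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed Cortex-M0 instruction timings, in core clock cycles. */
enum {
    CYCLES_SUBS = 1,
    CYCLES_NOP = 1,
    CYCLES_BNE_TAKEN = 3,
    CYCLES_BNE_NOT_TAKEN = 1
};

/* Cycles for one loop pass that branches back to the top. */
static uint32_t cycles_per_iteration(uint32_t nops)
{
    return CYCLES_SUBS + nops * CYCLES_NOP + CYCLES_BNE_TAKEN;
}

/* Cycles spent inside the loop for `iterations` passes (iterations >= 1).
 * The final BNE falls through, so it costs 1 cycle instead of 3. */
static uint32_t loop_cycles(uint32_t nops, uint32_t iterations)
{
    return iterations * cycles_per_iteration(nops)
           - (CYCLES_BNE_TAKEN - CYCLES_BNE_NOT_TAKEN);
}
```

With 12 NOPs each pass is 16 cycles, which is 1 us at 16 MHz; with 14 NOPs it is 18 cycles, so the loop body alone would run 12.5% long under these assumptions. The fixed offset from the final fall-through branch does not scale with the requested delay.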

The code below generates exact delays for me using gcc-arm-none-eabi-4_9-2014q4 for power-of-two delays (1..2048 us). There is a constant 7-clock overhead, which probably includes triggering a capture.

extern void inline
__attribute__((__gnu_inline__,__always_inline__))
nrf_delay_us(uint32_t volatile number_of_us)
{
    __ASM volatile (
        "1:\tSUB %0, %0, #1\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "NOP\n\t"
        "BNE 1b\n\t"
        : "+r"(number_of_us)
    );
}
  • Yeah, I had the same problem, this is what I use:

    /* The above version's run time depends on 
     * the mood of the compiler. 
     */
    __ASM volatile(
        ".syntax unified\n"
        "1:\n"
        " SUBS %[delay], %[delay], #1\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"   
        " NOP\n"  
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " NOP\n"
        " BNE 1b\n" // 3 if taken, 1 if not  
        ".syntax divided\n"  
        : [delay] "+r" (number_of_us));
    
  • Off-topic, but if you bypass the SDK API and manipulate things directly it'd be much faster. The sequence:

    NRF_GPIO->OUTSET = (1U << PIN);
    NRF_GPIO->OUTCLR = (1U << PIN);
    

    takes 7 clock cycles by my measurement (eliminating instrumentation overhead), or less than 0.5 us. If you're really simulating a communication protocol it might be better to use a PPI driven by timer CC events to get precise signal widths.
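For a rough sense of the numbers involved, here is a small sketch that converts cycle counts to time and a pulse width to timer compare ticks. It assumes a 16 MHz HFCLK and a timer that counts at 16 MHz / 2^prescaler; the function names are hypothetical and no SDK calls are used.

```c
#include <stdint.h>

#define HFCLK_MHZ 16u /* assumed core and timer source clock */

/* Convert a core cycle count to nanoseconds (rounded down). */
static uint32_t cycles_to_ns(uint32_t cycles)
{
    return (cycles * 1000u) / HFCLK_MHZ;
}

/* Compare value for a pulse of `width_us` microseconds, assuming the
 * timer counts at HFCLK / 2^prescaler (rounded down). */
static uint32_t timer_ticks_for_us(uint32_t width_us, uint32_t prescaler)
{
    return (width_us * HFCLK_MHZ) >> prescaler;
}
```

Under these assumptions, 7 cycles is about 437 ns, and a 10 us pulse needs a compare value of 160 at prescaler 0 or 10 at prescaler 4.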
