
[Basic Question] The speed of nRF51822 when executing instructions

Hi, I've recently been reading some computer architecture books,

so my question may be easy or silly.

I searched for an answer but couldn't find much.

The nRF51822 is based on Cortex-M0. So it has 56 instructions.

**/**************************************/

  1. Does every instruction of this MCU (let's limit it to Rev. 3) take the same time to complete?

For example,

??main_1:
  ADDS R0, R0, #1
 
??main_2:
  CMP R0, #200
  BLT.N ??main_1

Suppose the times taken to finish ADDS, CMP, and BLT.N are t0, t1, and t2.

Is t0 == t1 == t2? Or are they all different?

  2. If every instruction takes the same amount of time, is that value equal to 1/14 usec?

I ask because the nrf_delay_us function uses 14 instructions to delay 1 usec.

If not, what calculation led the developers to use 14 instructions to delay 1 usec?

  3. Continuing with t0, t1, and t2, how long does each one take?

Is it just 1 / clock frequency?

Since the Cortex-M0 uses a 3-stage pipeline, do I have to take that into account as well?

  4. When using simple_uart_putstring() (located at simple_uart_putstring.c SDK 7.2),

I would like to know about the time gap between characters.

For instance,

#define MSG (const uint8_t *) "Hello\n"

int main(void){
//omit other parts...
   uart_init(); // suppose the UART pin is initialized correctly without using HWID
                // assume the baud rate is 115200, no parity, 8 bit data
 
   while(true) simple_uart_putstring(MSG);
}

After sending 'H', could there be a small gap before sending 'e'?

Since simple_uart_putstring uses a while loop with an increment,

I was wondering how long that takes.

Added: Timing diagram of the L3GD20 (SPI).

-Regards, Mango922

    1. No. Many instructions take 1 cycle and some take more: multiply takes quite a few, and load/store multiple takes time depending on how many registers you're stacking. That's all in the ARMv6-M reference manual.

    2. One cycle is 1/16 microsecond and many instructions take one cycle. They used 14 because there's overhead in the rest of the loop to make an accurate delay (or not so accurate if you read other posts)

    3. No, you don't need to consider the pipeline. With the nRF52 you do have to consider the wait states on the flash, and the pipeline comes into play there because it may stall; the write buffer also introduces cycles of delay here and there. The nRF51 doesn't have those issues.

    4. I have no idea, and I don't really understand the question. UART rates are so much slower than the clock frequency of the chip that I can't see how it would matter.
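To put the "UART is much slower than the clock" point in numbers, here is a minimal sketch of the arithmetic in C. The 115200 baud rate, 8 data bits, no parity, and 16 MHz core clock come from this thread; the single stop bit and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_HZ 16000000u  /* nRF51 core clock, as stated in this thread */

/* Bits on the wire per character: 1 start bit + data + parity + stop bits. */
uint32_t frame_bits(uint32_t data_bits, uint32_t parity_bits, uint32_t stop_bits)
{
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Time to shift out one character, in nanoseconds (rounded down). */
uint32_t char_time_ns(uint32_t baud, uint32_t bits)
{
    assert(baud > 0);
    return (uint32_t)((uint64_t)bits * 1000000000u / baud);
}

/* Whole CPU cycles that elapse while one character is on the wire. */
uint32_t cpu_cycles_per_char(uint32_t baud, uint32_t bits)
{
    assert(baud > 0);
    return (uint32_t)((uint64_t)CPU_HZ * bits / baud);
}
```

With 8 data bits, no parity, and an assumed single stop bit, `frame_bits(8, 0, 1)` is 10, so one character at 115200 baud lasts roughly 86.8 usec, or about 1388 CPU cycles. Under these assumptions, a per-character loop overhead of a few tens of cycles would be only a small fraction of one character time.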

    1. Exact instruction timings are given in the Cortex-M0 Technical Reference Manual. Taken branches execute in three cycles. Loads and stores take one cycle more than the number of registers loaded or stored, and pops that load pc take 3 additional cycles. For loads and stores you also have to add the number of wait states for peripheral accesses. I believe it's 2 wait states for all Nordic peripherals except the RTC, which has 3. All other instructions, including MUL (the nRF51 uses the fast multiplier option), take 1 cycle. You can easily measure the execution time of a sequence of instructions using a timer: start a timer with a prescaler of zero, trigger capture tasks just before and after the instruction sequence, subtract the corresponding captured values, and then subtract 4 for one of the TASKS_CAPTURE accesses.

    2. The nRF51 runs at a fixed frequency of 16 MHz, so instruction execution time is N/16 µs, where N is the duration, in cycles, of that instruction's execute stage.

    3. The nRF51 has no wait states for RAM accesses (except for conflicting accesses) or flash accesses (except during program or erase). The pipeline stages overlap perfectly, so you don't have to bother with the pipeline.
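Applying the cycle counts quoted in this answer to the ADDS / CMP / BLT.N loop from the question gives a concrete figure. The sketch below is only that arithmetic in C: the function names are made up for illustration, the loop is assumed to run from flash or RAM with no wait states, and a not-taken branch is assumed to cost 1 cycle. The second function models a hypothetical 14-instruction delay loop made of 13 single-cycle instructions plus one taken branch; that layout is an assumption and has not been checked against the SDK source.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_HZ 16000000u  /* fixed nRF51 core clock */

/* Cycle costs as quoted in the answer above (Cortex-M0, no wait states). */
enum {
    CYCLES_ALU          = 1,  /* ADDS, CMP and most other instructions */
    CYCLES_BRANCH_TAKEN = 3,  /* conditional branch that is taken */
    CYCLES_BRANCH_SKIP  = 1   /* conditional branch that falls through (assumed) */
};

/* Cycles for the ADDS / CMP / BLT.N loop, assuming R0 starts at `start`
 * and the loop exits once R0 reaches `limit`. */
uint32_t loop_cycles(uint32_t start, uint32_t limit)
{
    assert(start < limit);
    uint32_t iterations = limit - start;
    uint32_t per_pass = 2 * CYCLES_ALU;             /* ADDS + CMP */
    return iterations * per_pass
         + (iterations - 1) * CYCLES_BRANCH_TAKEN   /* every pass but the last */
         + CYCLES_BRANCH_SKIP;                      /* final, not-taken branch */
}

/* Hypothetical delay loop: 13 single-cycle instructions followed by one
 * taken branch back to the top (14 instructions in total). */
uint32_t delay_loop_pass_cycles(void)
{
    return 13 * CYCLES_ALU + CYCLES_BRANCH_TAKEN;
}

/* Convert a cycle count to nanoseconds at CPU_HZ (62.5 ns per cycle). */
uint64_t cycles_to_ns(uint32_t cycles)
{
    return (uint64_t)cycles * 1000000000u / CPU_HZ;
}
```

With R0 starting at 0, `loop_cycles(0, 200)` comes to 998 cycles, which `cycles_to_ns` converts to 62375 ns. The hypothetical delay loop costs 16 cycles per pass, which is exactly 1 µs at 16 MHz, so t0 and t1 would each be 1/16 µs while a taken BLT.N would be 3/16 µs.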

  • I hadn't thought of the ARMv6-M reference manual. Thanks for letting me know.

    About Q4, the UART sure is slow when comparing the baud rate (115200) with the time of one cycle.

    However, let me change the question a little bit.

    Suppose I attached a 3-axis digital gyroscope (L3GD20 for example) to the PCA10001.

    Using SPI with a clock speed of 10 MHz, reading a value will take 1.6 * 10^-6 sec.

    (Since it uses 16 clock cycles.) After reading it, extra calculation is needed before the value can be used.

    However, since the nRF51822 does not have an FPU (Floating Point Unit),

    I expected that the calculation time would be longer than 1.6 * 10^-6 sec.

    As you said, multiply will take much longer, and if the program uses floating point, I thought it would take even more.

    So, in this case, won't there be a slight delay before the value is available?
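For scale, the SPI transfer time in this scenario can be worked out directly. The following is a minimal sketch in C using the figures from this post (16 SPI clocks per read, a 10 MHz SPI clock) and the 16 MHz core clock mentioned in the answers; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_HZ 16000000u  /* nRF51 core clock */

/* Duration of an SPI transfer of `clocks` SCK periods, in nanoseconds. */
uint32_t spi_transfer_ns(uint32_t sck_hz, uint32_t clocks)
{
    assert(sck_hz > 0);
    return (uint32_t)((uint64_t)clocks * 1000000000u / sck_hz);
}

/* Whole CPU cycles that fit into a time span given in nanoseconds. */
uint32_t cpu_cycles_during_ns(uint32_t ns)
{
    return (uint32_t)((uint64_t)ns * CPU_HZ / 1000000000u);
}
```

Here `spi_transfer_ns(10000000u, 16u)` gives 1600 ns, and `cpu_cycles_during_ns(1600u)` gives 25, so 16 clocks at 10 MHz correspond to about 25 CPU cycles. Any processing of the sample that needs more than that takes longer than the transfer itself.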
