NerdKits - electronics education for a digital generation

Microcontroller Programming » Why custom delay functions?

January 05, 2011
by Hexorg

I wanted to ask the NerdKits staff why they decided to use custom delay functions instead of the _delay_ms(), _delay_us(), _delay_loop_1() and _delay_loop_2() functions found in util/delay.h and util/delay_basic.h.

January 05, 2011
by mrobbins
(NerdKits Staff)

All four of those require that the delay length is known at compile time. That's no good if you want to adjust the delay period at run time from a variable. The _delay_ms and _delay_us functions also take floating-point input. That's fine there because the compiler optimizes it away, but it's something we generally try to avoid in embedded systems, since floating-point operations are so expensive at run time.

In contrast, our delay_ms and delay_us functions work with variable delays -- for example, this is used in our early Making Music with a Microcontroller video, where we used different delay lengths to make different frequency square waves. (Of course, Timer/Counters could have been used here, but we tried to keep that particular video tutorial as simple as possible.)

In practice, it doesn't matter too much -- and you should either be using assembly or timer/counters if you want very accurate timing in any case.

Mike
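
To illustrate the variable-delay case Mike describes, here is a minimal sketch of deriving a square-wave half-period from a frequency chosen at run time. The name half_period_us is hypothetical and is not part of the NerdKits library; on the microcontroller the result would presumably be handed to the NerdKits delay_us between pin toggles.

```c
#include <stdint.h>

/* Hypothetical helper (not part of libnerdkits): half the period of a
 * square wave of freq_hz, in whole microseconds. The pin is toggled once
 * per half-period, so this is the value a variable delay would receive.
 * Returns 0 for a frequency of 0 to avoid dividing by zero. */
uint32_t half_period_us(uint32_t freq_hz)
{
    if (freq_hz == 0) {
        return 0;
    }
    return 500000UL / freq_hz; /* 1,000,000 us per second, halved */
}
```

For a 440 Hz tone this gives 1136 us per half-period. Because the value is only known at run time, it is the kind of argument a compile-time-only delay cannot accept.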

January 05, 2011
by Hexorg

ah I see, thank you :)

January 05, 2011
by bretm

The util/delay.h functions have another problem. If I recall, they do something like (F_CPU/4000) internally at some point. If F_CPU is defined as a floating-point numeric literal such as 14745600.0 or 14.7456e6, the math will come out right. But if F_CPU is defined as 14745600, it does integer division and loses precision. It's not a problem for clocks like 16MHz or 20MHz, but it is for the Nerdkits clock. Either way I think it's more accurate than the Nerdkits routines.
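
The precision loss described above can be checked on a desktop compiler. This is only a sketch: the divisor 4000 is taken from bretm's recollection and has not been verified against util/delay.h, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Integer version of the recalled F_CPU/4000 step. The exact expression
 * in util/delay.h is an assumption, not verified here. */
uint32_t cycles_int(uint32_t f_cpu)
{
    return f_cpu / 4000UL; /* fractional part is discarded */
}

/* Same quantity in floating point, for comparison. */
double cycles_float(double f_cpu)
{
    return f_cpu / 4000.0;
}
```

With 14745600 the integer form gives 3686 where the floating-point form gives 3686.4, an error of roughly 0.011%. With 16000000 or 20000000 the division is exact (4000 and 5000), which matches the observation that those clocks are unaffected.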

The Nerdkits clock ticks 9216 times every 625 microseconds. So theoretically the timing loop would look something like this, good for up to 466 milliseconds:

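#include <stdint.h>

void delay_us(uint32_t us)
{
    const uint32_t overhead = 50;     // placeholder: ticks spent on call/ret/math
    const uint32_t ticksPerLoop = 10; // placeholder: assuming 4xSBC, 4xCP, 1xBR
    const uint32_t step = ticksPerLoop * 625; // one loop pass, in 1/625-tick units

    // 9216 ticks per 625 us, so us * 9216 is the delay in 1/625-tick units.
    uint32_t counter = us * 9216;
    counter = (counter > overhead * 625) ? counter - overhead * 625 : 0;

    while (counter >= step) // stop before the unsigned counter can wrap
    {
        counter -= step;
    }
}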

It would be interesting to figure that out for sure.
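
The figures 9216, 625 and 466 milliseconds can be reproduced from the 14.7456MHz clock alone. The sketch below reduces "F_CPU ticks per 1,000,000 us" to lowest terms and then finds the largest delay whose tick product still fits in 32 bits. The function names are hypothetical.

```c
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce "f_cpu ticks per 1,000,000 us" to lowest terms. */
void ticks_per_us_ratio(uint32_t f_cpu, uint32_t *ticks, uint32_t *us)
{
    uint32_t g = gcd_u32(f_cpu, 1000000UL);
    *ticks = f_cpu / g;
    *us = 1000000UL / g;
}

/* Largest delay in us for which us * ticks still fits in 32 bits.
 * ticks must be nonzero. */
uint32_t max_delay_us(uint32_t ticks)
{
    return UINT32_MAX / ticks;
}
```

For 14745600 the common divisor is 1600, giving 9216 ticks per 625 us, and the 32-bit limit works out to 466033 us, or about 466 milliseconds.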

January 09, 2011
by bretm

I investigated this further and the result is unexpected. The delay loop in the Nerdkit's delay_us function looks like this:

  uint16_t i;
  for(i=0; i<us; i++) {
    NOP;    // two is right for 8MHz, tested by mikey and robtruax
    NOP;

    NOP;    // so adding 7 more will yield a slightly slow clock (1.017us)
    NOP;
    NOP;
    NOP;
    NOP;
    NOP;
    NOP;
  }

I found that when a variable is passed in to the function, the loop does indeed take 15 clock cycles, or 1.017us per iteration. But the crazy thing is that if you pass in a constant, which may be the more common case, the compiler "optimizes" it into a loop that takes 16 cycles, or 1.085us per iteration. So when you do delay_us(5000) it actually delays about 5425us. It seems like the only way to get around this would be to write the loop completely in assembly language.

The looping code for the 16-cycle case for delay_us(5000) looks like

9 NOPs                9
ADIW      R24,0x01    2   Increment "i"
LDI       R18,0x13    1   
CPI       R24,0x88    1   Compare with 0x1388 (5000)
CPC       R25,R18     1   
BRNE      PC-0x0D     2   Branch if not equal

The looping code for the 15-cycle case looks like

9 NOPs               9
SUBI      R18,0xFF   1    Increment by subtracting negative 1
SBCI      R19,0xFF   1    
CP        R18,R22    1    Compare with variable
CPC       R19,R23    1    
BRCS      PC-0x0D    2    Branch if carry set
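
The timing figures in this post follow directly from the cycle counts and the 14.7456MHz clock. A minimal sketch of that arithmetic, with a hypothetical function name, assuming the loop runs exactly once per requested microsecond and ignoring call overhead:

```c
#include <stdint.h>

/* Actual delay in whole microseconds when each loop pass costs
 * cycles_per_loop clock cycles and the loop runs once per requested
 * microsecond. 64-bit math avoids overflow in the intermediate product. */
uint32_t actual_delay_us(uint32_t requested_us, uint32_t cycles_per_loop,
                         uint32_t f_cpu)
{
    uint64_t cycles = (uint64_t)requested_us * cycles_per_loop;
    return (uint32_t)(cycles * 1000000ULL / f_cpu);
}
```

At 14745600 Hz, a request of 5000 us comes out as 5425 us with the 16-cycle loop and 5086 us with the 15-cycle loop.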

January 09, 2011
by mrobbins
(NerdKits Staff)

Hi bretm,

Compilers can do strange and stupid things. Isn't it funny that an "optimization" for a constant parameter actually makes the function take more cycles? In any case, I'd think this would only happen when the code was inlined or if you copied the delay_us code into another file you were playing with. If the delay_us code is compiled first, in a separate file and made into its own .o file, then at that point the compiler can't make the assumption about the constant parameter, so I think its machine code representation is fixed. But with "inline", the compiler can try to re-compile that function where it's called. You're correct that defining the functions in assembly would help.

There are various issues like the function call overhead as well. Plus, once you are running interrupts or a scheduler, these delay functions don't take those "missing cycles" into account, so in almost any real project where you'll have one or more interrupt handlers, you wouldn't want to use NOP-based delay functions if precise timing was important.

Mike
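
For the timer/counter route Mike recommends, the first step is usually picking a compare value. The sketch below assumes a timer that counts from 0 up to a compare value and then wraps, clocked at F_CPU divided by a prescaler; the function name and the 8-bit limit are illustrative choices, and the register setup on an actual chip is not shown.

```c
#include <stdint.h>

/* Compare value for a timer that counts 0..compare at f_cpu / prescaler,
 * so that it wraps rate_hz times per second. Returns -1 when the division
 * is not exact or the value does not fit an 8-bit timer. */
int32_t timer_compare_8bit(uint32_t f_cpu, uint32_t prescaler, uint32_t rate_hz)
{
    uint32_t divisor = prescaler * rate_hz;
    if (divisor == 0 || f_cpu % divisor != 0) {
        return -1;
    }
    uint32_t counts = f_cpu / divisor;
    if (counts == 0 || counts > 256) {
        return -1;
    }
    return (int32_t)(counts - 1);
}
```

With the 14.7456MHz clock and a prescaler of 1024, a 100 Hz tick needs exactly 144 counts, so the compare value is 143. A 1 kHz tick with a prescaler of 64 does not divide evenly (230.4 counts), so the function reports -1. Unlike a NOP loop, a hardware timer keeps counting while interrupt handlers run.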

May 23, 2011
by csseal

Hello, can someone please confirm that I've understood this correctly? The code for delay.c states that for a correct(ish) 1us delay we need:

2 NOPs when using an 8MHz Frequency (8 cycles to reach 1us)

9 NOPs for a 14.7MHz Frequency (15 cycles to reach 1us)

Therefore there are overheads in the code that equate to 6 cycles, making delay.c unusable with the 1MHz internal clock?

If I am on the right track, how are the overhead times determined?

Thanks

May 23, 2011
by bretm

libnerdkits/delay is hard-coded for 14.7456MHz, so you'd need to use util/delay for any other frequency.
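
The NOP counts csseal lists can be reproduced with a little arithmetic. The sketch below takes the 6-cycle loop overhead from the 15-cycle disassembly earlier in the thread (SUBI, SBCI, CP, CPC, BRCS = 1+1+1+1+2) and assumes the same overhead applies at other clock speeds, which is not confirmed. The function name is hypothetical.

```c
#include <stdint.h>

/* NOPs needed so one loop pass lasts at least 1 us, given the cycles the
 * loop bookkeeping already costs. Returns -1 when the overhead alone
 * exceeds 1 us, so no NOP count can work. */
int32_t nops_per_us(uint32_t f_cpu, uint32_t overhead_cycles)
{
    /* Cycles in 1 us, rounded up so the loop is never too fast. */
    uint32_t cycles = (f_cpu + 999999UL) / 1000000UL;
    if (cycles < overhead_cycles) {
        return -1;
    }
    return (int32_t)(cycles - overhead_cycles);
}
```

This gives 2 NOPs at 8MHz and 9 NOPs at 14.7456MHz, matching the comments in delay.c. At 1MHz a single microsecond is only one clock cycle, fewer than the loop overhead, so the function returns -1: a one-pass-per-microsecond loop cannot work at that speed.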
