NerdKits - Expanding Programming Languages (Support Forum)


November 07, 2010
by DrNoExpert

I have gone through and learned C, but for business purposes I now need to expand my programming languages. For the most part, the languages I need are all shown in AVR Studio. My question is: are there any changes I will need to make to the original hardware and software configuration to use these, mainly assembler, and still program from AVR Studio via the NerdKits USB-Serial cable?

November 08, 2010
by mrobbins
(NerdKits Staff)

Hi DrNoExpert,

Just to warn you, the fact that a language shows up in Programmer's Notepad doesn't necessarily mean that it's going to work on the AVR. I'm not exactly sure what you're referring to in AVR Studio because I don't use that tool. But yes, you can use assembly on the microcontroller -- take a look at the Inline Assembler Cookbook and the AVR instruction set. Also check out this post which walks through some assembly code.

No hardware or software changes will be needed to use inline assembly. (If you want to use purely assembly, you'll have to change the Makefile a lot to get things to compile again. However, I'd recommend inline assembly because it lets you mix high-performance code when you need it with easier C code when you don't.)

Here's an example of using inline assembly. Let's say we have two uint16_t's that we'd like to add: call these varA and varB. Each can hold a value from 0 through 65535, so the sum can range from 0 through 131070. That maximum possible sum is too big to store in a uint16_t (it needs 17 bits), so we'll bump the result up to a uint32_t, even though the top 15 bits will certainly all be zeros.
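The range arithmetic above can be checked on any machine with a C compiler. This is only a minimal sketch: the names bits_needed, max_sum_of_two_u16 and check_ranges are made up for illustration and are not part of the NerdKits code.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: number of bits needed to represent v (0 needs 0 bits).
uint8_t bits_needed(uint32_t v) {
  uint8_t n = 0;
  while (v != 0) {
    n++;
    v >>= 1;
  }
  return n;
}

// Largest possible sum of two uint16_t values: 65535 + 65535.
uint32_t max_sum_of_two_u16(void) {
  return (uint32_t) UINT16_MAX + (uint32_t) UINT16_MAX;
}

// Sanity checks for the numbers quoted in the post.
void check_ranges(void) {
  assert(max_sum_of_two_u16() == 131070UL);
  // One bit more than a uint16_t can hold.
  assert(bits_needed(max_sum_of_two_u16()) == 17);
  // So 32 - 17 = 15 bits of the uint32_t result are always zero.
  assert(32 - bits_needed(max_sum_of_two_u16()) == 15);
}
```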

Straight up C:

uint32_t my_16plus16_v1(uint16_t varA, uint16_t varB) {
  uint32_t result;

  result = varA + varB;

  return result;
}

and our test code:

uint16_t x = 65530;
uint16_t y = 10000;
printf_P(PSTR("x: %u\r\n"), x);
printf_P(PSTR("y: %u\r\n"), y);
printf_P(PSTR("v1: %lu\r\n"), my_16plus16_v1(x,y));

What do we get? We expect to see 65530+10000=75530, but instead this code actually prints 9994! Even though our result is a uint32_t, the addition itself is done in 16 bits (an int is only 16 bits wide on the AVR, so the operands are never widened), and the carry out of the top bit is simply dropped: 75530 - 65536 = 9994. That's no good! If we look at the assembly generated by the compiler, it clearly only does two 8-bit additions, ignoring the carry out of the higher byte addition. It also takes a total of 6 clock cycles for the function body (not including the final RET instruction to return to the calling function).
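If you try the v1 function on a desktop compiler you will most likely see 75530, because int is usually 32 bits there and the operands get widened before the addition. To reproduce the AVR behaviour on a PC, the 16-bit truncation has to be written out explicitly. Here is a minimal sketch of that; add16_as_on_avr and check_wraparound are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

// Models what the AVR does for `result = varA + varB;`: the sum is
// computed in 16 bits (truncated here explicitly, so this also behaves
// the same on machines where int is 32 bits) and only then widened.
uint32_t add16_as_on_avr(uint16_t varA, uint16_t varB) {
  uint16_t wrapped = (uint16_t) (varA + varB);  // keeps the low 16 bits only
  return wrapped;
}

void check_wraparound(void) {
  // 65530 + 10000 = 75530, and 75530 - 65536 = 9994.
  assert(add16_as_on_avr(65530, 10000) == 9994);
  // Sums that fit in 16 bits are unaffected.
  assert(add16_as_on_avr(100, 200) == 300);
}
```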

Let's fix it in C code:

uint32_t my_16plus16_v2(uint16_t varA, uint16_t varB) {
  uint32_t result;

  result = (uint32_t) varA + (uint32_t) varB;

  return result;
}

Now, by casting both operands of the addition to uint32_t before adding, we get the right result, 75530, but the compiler takes 12 clock cycles to do the function body. (It actually goes through the work of adding two 32-bit values, even though the upper bytes are all zeros.)
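As an aside, there is another way to get the right answer in plain C that is not from the post above: do the addition in 16 bits and recover the lost carry by checking whether the sum wrapped around. The name my_16plus16_carry is made up here, and whether avr-gcc actually turns this into fewer cycles than v2 is an open assumption that would need to be checked in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical alternative to v2: 16-bit add plus explicit carry detection.
uint32_t my_16plus16_carry(uint16_t varA, uint16_t varB) {
  uint16_t low = (uint16_t) (varA + varB);  // 16-bit sum, may wrap around
  // An unsigned sum wrapped if and only if it is smaller than an operand.
  uint16_t carry = (low < varA) ? 1 : 0;
  // The carry becomes bit 16 of the 32-bit result.
  return ((uint32_t) carry << 16) | low;
}

void check_carry_version(void) {
  assert(my_16plus16_carry(65530, 10000) == 75530UL);
  assert(my_16plus16_carry(65535, 65535) == 131070UL);
}
```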

Finally, let's write it in inline assembly:

uint32_t my_16plus16_v3(uint16_t varA, uint16_t varB) {
  uint32_t result;

  asm volatile (
    // copy varA to output register
    "movw %A0, %A1" "\n\t"
    // add lower bytes
    "add %A0, %A2" "\n\t"
    // add upper bytes with carry
    "adc %B0, %B2" "\n\t"
    // clear 3rd byte of result
    "clr %C0" "\n\t"
    // add carry bit of the upper bytes
    "adc %C0, r1" "\n\t" // (r1 is the zero register)
    // clear 4th byte of result
    "clr %D0" "\n\t"
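    // "=&r" marks result as early-clobber: it is written (by movw)
    // before varB has been read, so it must not share varB's registers
    : "=&r" (result)
    : "r" (varA), "r" (varB)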
  );

  return result;
}

In this version we've written only 6 instructions. The compiler adds some final move instructions to get things to the right places, so we end up with a total of 8 clock cycles for the function body. I'm no inline assembly guru, but I think with some tweaking it should be possible to get the compiler to stop moving things around and really get it down to 6 clock cycles. At either 8 or 6, we're well ahead of the compiler's 12.
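To follow what the six instructions do without an AVR at hand, they can be modelled byte by byte in portable C. This is only a sketch for tracing the logic, not something to run on the chip in place of the assembly: model_16plus16_v3 is a hypothetical name, each uint8_t variable stands in for one 8-bit register, and carry stands in for the carry flag.

```c
#include <assert.h>
#include <stdint.h>

// Portable model of the instruction sequence in my_16plus16_v3.
uint32_t model_16plus16_v3(uint16_t varA, uint16_t varB) {
  uint8_t a0, b0, c0, d0;  // result bytes %A0..%D0, lowest first
  uint8_t carry;           // stands in for the carry flag
  uint16_t sum;            // wide enough to see the carry out of 8 bits

  // movw %A0, %A1 : copy both bytes of varA into the result
  a0 = (uint8_t) (varA & 0xFF);
  b0 = (uint8_t) (varA >> 8);

  // add %A0, %A2 : add the low bytes, carry out goes to the flag
  sum = (uint16_t) (a0 + (varB & 0xFF));
  a0 = (uint8_t) sum;
  carry = (uint8_t) (sum >> 8);

  // adc %B0, %B2 : add the high bytes plus the carry from the low bytes
  sum = (uint16_t) (b0 + (varB >> 8) + carry);
  b0 = (uint8_t) sum;
  carry = (uint8_t) (sum >> 8);

  // clr %C0 : zero the third byte (clr leaves the carry flag untouched)
  c0 = 0;
  // adc %C0, r1 : 0 + 0 + carry, so the third byte becomes the carry
  c0 = (uint8_t) (c0 + 0 + carry);

  // clr %D0 : the fourth byte is always zero
  d0 = 0;

  return ((uint32_t) d0 << 24) | ((uint32_t) c0 << 16)
       | ((uint32_t) b0 << 8) | a0;
}
```

Tracing 65530 + 10000 through it: the low bytes give 0xFA + 0x10 = 0x10A (byte 0x0A, carry 1), the high bytes give 0xFF + 0x27 + 1 = 0x127 (byte 0x27, carry 1), and the third byte picks up that last carry, for 0x01270A = 75530.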

Hopefully that's a useful introduction to getting started with inline assembly! In real life, I probably wouldn't use it for something as simple as adding two uint16_t's. However, there are times when we've needed to go to inline assembly for some projects. It is sometimes possible to beat the compiler by significant margins (making code that runs in 30-50% of the compiler's clock cycles), which is great for power consumption, but is especially great for code that needs to run quickly and repeatedly, such as code driven by a timer or ADC interrupt. If your C code can't run fast enough to be finished in time for the next interrupt, then look at assembly as a possible way to squeeze a little bit more performance out of the chip!
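A rough way to judge whether a handler can finish before the next interrupt is to compare its cycle count with the number of clock cycles between interrupts. The sketch below shows that arithmetic. The function names are hypothetical, and the clock frequency and interrupt rate in the usage note are assumed example values; it also ignores interrupt entry and exit overhead and assumes a nonzero interrupt rate.

```c
#include <assert.h>
#include <stdint.h>

// Clock cycles available between two consecutive interrupts.
// Assumes interrupts_per_second is not zero.
uint32_t cycles_per_interrupt(uint32_t cpu_hz, uint32_t interrupts_per_second) {
  return cpu_hz / interrupts_per_second;
}

// Returns 1 if a handler of the given length fits in that budget, else 0.
uint8_t handler_fits(uint32_t handler_cycles, uint32_t cpu_hz,
                     uint32_t interrupts_per_second) {
  return handler_cycles <= cycles_per_interrupt(cpu_hz, interrupts_per_second);
}
```

For example, with an assumed 14745600 Hz clock and an assumed 100000 interrupts per second, the budget is 147 cycles. A handler that needs 200 cycles does not fit, while a version that runs in 40% of that (80 cycles) does.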

Mike

November 08, 2010
by DrNoExpert

Thanks Mike. That was what I needed to start getting into assembler, and I didn't know I could mix assembler and C. That's something else I could look further into.
