0

I'm writing a Hack CPU emulator and was wondering what the best way to compute the ALU output is. Would condensing the calculation into one unreadable line be more efficient than computing the result one step at a time, or does the compiler optimize both to the point where either is fine? Basically, which of these is more efficient:

this:

    word HackALU(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
    {
        x = zx ? 0 : x;
        y = zy ? 0 : y;

        x = nx ? ~x : x;
        y = ny ? ~y : y;

        word result = f ? x + y : x & y;

        return no ? ~result : result;    
    }

or this:

    word HackALU(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
    {
        return no ? ~(f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))) : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y)))) : (f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))) : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))));
    }
Publius
  • Those don't appear to be the same code. For instance, nx is not used in the upper example at all, but appears to affect the outcome in the lower block of code. –  Mar 20 '12 at 19:51
  • I made a typo. It's fixed now, so they should generate identical results. – Publius Mar 21 '12 at 03:16

3 Answers

1

A good modern compiler will most likely generate identical code for both.
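Before comparing generated code, it can help to confirm that the two versions really compute the same thing. Below is a minimal sketch of such a check. It assumes `word` is a 16-bit unsigned type (the question does not show its definition), and the names `alu_steps`, `alu_expr`, and `count_mismatches` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t word; /* assumption: the Hack word is 16 bits, unsigned */

/* Step-by-step version from the question (hypothetical name). */
word alu_steps(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
{
    x = zx ? 0 : x;
    y = zy ? 0 : y;

    x = nx ? ~x : x;
    y = ny ? ~y : y;

    word result = f ? x + y : x & y;

    return no ? ~result : result;
}

/* Single-expression version from the question, only wrapped across lines
   (hypothetical name). */
word alu_expr(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
{
    return no
        ? ~(f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y)))
              : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))))
        :  (f ? ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) + (ny ? ~(zy ? 0 : y) : (zy ? 0 : y)))
              : ((nx ? ~(zx ? 0 : x) : (zx ? 0 : x)) & (ny ? ~(zy ? 0 : y) : (zy ? 0 : y))));
}

/* Compare both versions over a few boundary inputs and all 64 flag
   combinations; returns the number of disagreements found. */
int count_mismatches(void)
{
    static const word samples[] = { 0, 1, 2, 0x1234, 0x7FFF, 0x8000, 0xABCD, 0xFFFF };
    const int count = (int)(sizeof samples / sizeof samples[0]);
    int mismatches = 0;

    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < count; ++j) {
            for (int b = 0; b < 64; ++b) {
                word a = alu_steps(samples[i], samples[j],
                                   b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20);
                word e = alu_expr(samples[i], samples[j],
                                  b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20);
                if (a != e)
                    ++mismatches;
            }
        }
    }
    return mismatches;
}

/* Convenience wrapper: aborts if the two versions ever disagree. */
void check_versions_agree(void)
{
    assert(count_mismatches() == 0);
}
```

Calling `check_versions_agree()` from `main` only checks behavior on the sampled inputs. Whether the compiler emits the same instructions for both is a separate question, which you can look into by compiling with optimization and `-S` and comparing the assembly for the two functions.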

Alexey Frunze
  • That's good to know. I asked because the first version seems to use more memory, in that it saves the word `result` to memory. Would the compiler eliminate that step? – Publius Mar 21 '12 at 03:16
  • 1
    If I compile `HackALU()` with gcc 3.3.4 on x86 with optimization level 1, 2 or 3, there's no memory variable reserved for `result` on the stack. Everything's done in registers. See for yourself: `gcc -c -O -S -o `. – Alexey Frunze Mar 21 '12 at 06:55
  • Awesome, that answers my question. – Publius Mar 21 '12 at 16:34
1

Changes to the logic will affect performance far more than whitespace or the storage of temporaries will.

For example, some machines don't have branch prediction (the PS3 SPUs, for instance), in which case your code will generally be faster if you replace the branches with arithmetic operations:

    word HackALU(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
    {
        x = (zx == 0) * x; // [0 or 1] * x: zero x when zx is set
        y = (zy == 0) * y;

        // Select ~x or x; the ALU's "negate" is a bitwise NOT, not arithmetic negation
        x = (nx != 0) * ~x + (nx == 0) * x;
        y = (ny != 0) * ~y + (ny == 0) * y;

        word result = (f != 0) * (x + y) + (f == 0) * (x & y);

        return (no != 0) * ~result + (no == 0) * result;
    }
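The same idea can also be written with bit masks instead of multiplications. This is only a sketch under assumptions: `word` is taken to be a 16-bit unsigned type, and `mask_of`, `HackALU_masked`, `HackALU_branching`, and `count_masked_mismatches` are illustrative names, not part of the original code. The branching version is included only as a reference to compare against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t word; /* assumption: the Hack word is 16 bits, unsigned */

/* All ones (0xFFFF) when flag is true, all zeros when it is false. */
word mask_of(bool flag)
{
    return (word)(0u - (unsigned)flag);
}

/* Branch-free ALU built from AND/XOR/OR with masks (hypothetical name). */
word HackALU_masked(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
{
    x &= (word)~mask_of(zx);   /* zx set: clear x */
    y &= (word)~mask_of(zy);   /* zy set: clear y */

    x ^= mask_of(nx);          /* XOR with all ones is a bitwise NOT */
    y ^= mask_of(ny);

    word sum  = (word)(x + y);
    word band = (word)(x & y);

    /* Keep the sum when f is set, the bitwise AND otherwise. */
    word result = (word)((sum & mask_of(f)) | (band & ~mask_of(f)));

    return (word)(result ^ mask_of(no));
}

/* Branching reference, same logic as the step-by-step version in the question. */
word HackALU_branching(word x, word y, bool zx, bool nx, bool zy, bool ny, bool f, bool no)
{
    x = zx ? 0 : x;
    y = zy ? 0 : y;
    x = nx ? ~x : x;
    y = ny ? ~y : y;
    word result = f ? x + y : x & y;
    return no ? ~result : result;
}

/* Compare the two over a few boundary inputs and all 64 flag combinations;
   returns the number of disagreements found. */
int count_masked_mismatches(void)
{
    static const word probes[] = { 0, 1, 5, 0x7FFF, 0x8000, 0xFFFF };
    const int count = (int)(sizeof probes / sizeof probes[0]);
    int mismatches = 0;

    for (int i = 0; i < count; ++i)
        for (int j = 0; j < count; ++j)
            for (int b = 0; b < 64; ++b)
                if (HackALU_masked(probes[i], probes[j],
                                   b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20)
                    != HackALU_branching(probes[i], probes[j],
                                         b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20))
                    ++mismatches;
    return mismatches;
}

/* Convenience wrapper: aborts if the masked version ever disagrees. */
void check_masked_alu(void)
{
    assert(count_masked_mismatches() == 0);
}
```

Whether this beats the ternary version depends on the target and the compiler. On many machines the ternaries already compile to conditional moves or selects rather than branches, so it is worth measuring before committing to the less readable form.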
Roger Hanna
0

Timing both versions with this loop, I actually measured the top version to be faster:

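    // needs <stdio.h> and <time.h>
    unsigned n = 0; // optimization-busting counter; unsigned so wraparound is well defined
    clock_t start = clock();
    for (word x = 0; x < 1000; ++x) {
        for (word y = 0; y < 1000; ++y) {
            for (int b = 0; b < 64; ++b) {
                n += HackALU(x, y, b & 0x1, b & 0x2, b & 0x4, b & 0x8, b & 0x10, b & 0x20);
            }
        }
    }
    clock_t end = clock();
    // clock_t has no portable printf format, so cast the difference to long
    printf("finished, elapsed ticks = %ld, n = %u\n", (long)(end - start), n);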

It's pretty obvious the top version would compile to fewer instructions unless the optimizer is really good. I think making it faster would require reducing the number of branches or making sure they are accurately predicted.