
Suppose I have a class Matrix5x5 (with suitably overloaded index operators) and I write a method trace for calculating the sum of its diagonal elements:

double Matrix5x5::trace(void){
    double t(0.0);
    for(int i(0); i <= 4; ++i){
        t += (*this)[i][i];
    }
    return t;
}

Of course, if I instead wrote:

return (*this)[0][0]+(*this)[1][1]+(*this)[2][2]+(*this)[3][3]+(*this)[4][4];

then I would be sure to avoid the overhead of declaring and incrementing my i variable. But it feels quite stupid to write out all those terms!

Since my loop has a constexpr number of terms that happens to be quite small, would a compiler inline it for me?

melpomene
user1892304
  • 2
    Turn up optimization level and look at the assembly code. – R Sahu Mar 30 '19 at 22:32
  • 2
    Maybe, maybe not. The only way to know for sure is to look at what code your compiler generates. There is no rule that requires every C++ compiler in existence, anywhere in the world, to inline or not inline this loop. It all depends on the compiler, and this does not fall into one of those cases where one can reasonably expect one particular result from a typical compiler. – Sam Varshavchik Mar 30 '19 at 22:34
  • 1
    Do you mean 'inline' or 'unroll'? – user207421 Mar 30 '19 at 22:44

2 Answers


A sufficiently clever compiler is allowed to optimize this loop under the as-if rule, and C++ compilers optimize many things that way, but nothing requires it. The only way to be absolutely sure is to check the code your specific compiler generates. That said, this is unlikely to be a bottleneck in your program, so write whichever version is more readable.
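If the expanded sum is wanted without depending on the optimizer and without typing each term, one possible approach (not part of the answer above, and assuming a C++17 compiler) is a fold expression over std::index_sequence. The Matrix5x5 layout and the helper name trace_impl below are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <utility>

// Assumed layout: the question does not show how Matrix5x5 stores its data.
struct Matrix5x5 {
    double values[5][5];
    Matrix5x5() : values() {}  // value-initializes every element to 0.0

    // Return a pointer to row i so that m[i][j] reads values[i][j].
    double* operator[](std::size_t i) { return values[i]; }
    const double* operator[](std::size_t i) const { return values[i]; }
};

// Hypothetical helper: the pack I... expands to 0, 1, 2, 3, 4, so the left
// fold below becomes ((((m[0][0] + m[1][1]) + m[2][2]) + m[3][3]) + m[4][4]),
// the same evaluation order as the hand-written sum.
template <std::size_t... I>
double trace_impl(const Matrix5x5& m, std::index_sequence<I...>) {
    return (... + m[I][I]);
}

double trace(const Matrix5x5& m) {
    return trace_impl(m, std::make_index_sequence<5>{});
}
```

The expansion happens during template instantiation, so the terms are written out at every optimization level. Whether the generated code is actually any faster than the loop still has to be checked in the assembly.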

Aykhan Hagverdili

Yes! GCC unrolls the loop at optimization level -O1 and above, and clang does so at -O2 and above.

I tested it using this code:

struct Matrix5x5 {
    double values[5][5];
    Matrix5x5() : values() {}

    double trace() {
        double sum = 0.0;
        for(int i = 0; i < 5; i++) {
            sum += values[i][i]; 
        }
        return sum; 
    }
};

double trace_of(Matrix5x5& m) {
    return m.trace(); 
}

And this is the assembly produced by both gcc and clang:

trace_of(Matrix5x5&):
    pxor    xmm0, xmm0
    addsd   xmm0, QWORD PTR [rdi]
    addsd   xmm0, QWORD PTR [rdi+48]
    addsd   xmm0, QWORD PTR [rdi+96]
    addsd   xmm0, QWORD PTR [rdi+144]
    addsd   xmm0, QWORD PTR [rdi+192]
    ret

You can play around with the code, and look at the corresponding assembly here: https://godbolt.org/z/p2uF0E.

If you overload operator[], then you have to up the optimization level to -O3, but the compiler will still do it: https://godbolt.org/z/JInIME
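For reference, a version with an overloaded operator[] could look roughly like the sketch below. This is an assumption about the shape of such code, not a copy of what is behind the link. Here operator[] returns a pointer to a row, so (*this)[i][i] reads the same element as values[i][i].

```cpp
#include <cstddef>

struct Matrix5x5 {
    double values[5][5];
    Matrix5x5() : values() {}  // value-initializes every element to 0.0

    // Assumed overloads: return a pointer to row i so that m[i][j] works.
    double* operator[](std::size_t i) { return values[i]; }
    const double* operator[](std::size_t i) const { return values[i]; }

    double trace() const {
        double sum = 0.0;
        for (std::size_t i = 0; i < 5; ++i) {
            sum += (*this)[i][i];  // goes through operator[] instead of values
        }
        return sum;
    }
};

double trace_of(const Matrix5x5& m) {
    return m.trace();
}
```

To get the flat sequence of additions shown earlier, the compiler has to inline the operator[] calls as well as unroll the loop.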

Alecto Irene Perez