0

On my gcc-4.8.1, I've compiled the following program with two commands:

g++ -Wfatal-errors -std=c++11 -Wall -Werror test.cpp -o test -g
g++ -Wfatal-errors -std=c++11 -Wall -Werror test.cpp -o test -O3 -g

The first executable has the expected output, but the second one segfaults. The problem is that it's hard to debug because -O3 messes with the code too much for the -g debug information to retain meaning, so gdb has trouble translating what's going on in source code. So, I started inserting print statements instead. As I expected, print statements change the result. With debug prints, it works just fine!

Here is my expression template source:

//test.cpp
#include <vector>
#include <stdlib.h>
#include <iostream>

using namespace std;

typedef vector<int> Valarray;

template<typename L, typename R>
struct BinOpPlus {
  const L& left;
  const R& right;

  BinOpPlus(const L& l, const R& r)
    : left(l), right(r)
  {}

  int operator[](int i) const { 
    int l = left[i];
    //cerr << "Left: " << l << endl; //uncomment to fix segfault
    int r = right[i];
    //cerr << "Right: " << r << endl; //uncomment to fix segfault
    return l + r;
  }
};

template<typename L, typename R>
BinOpPlus<L, R> operator+(const L& left, const R& right){
  return BinOpPlus<L, R>(left, right);
}

int main() {
  //int size = 10000000;
  int size = 10;
  Valarray v[3];
  for(int n=0; n<3; ++n){
    for(int i=0; i<size; ++i){
      int val = rand() % 100;
      v[n].push_back(val);
    }
  }

  auto out = v[0] + v[1] + v[2];

  int sum = 0;
  for(int i=0; i<size; ++i){
    cerr << "Checkpoint!" << endl;
    sum += out[i]; //segfaults here
    cerr << "Sum: " << sum << endl;
  }

  cout << "Sum: " << sum << endl;
  return 0;
}

It's been a long time since -O3 has given me an incorrect/unreliable binary. I am first assuming that I did something wrong in my code, but not wrong enough for -O0 to show it. Anyone have any ideas what I'm doing wrong?

Suedocode
  • 2,504
  • 3
  • 23
  • 41
  • Why oh why would anyone want to do this? – Andrey Mishchenko Feb 04 '15 at 00:45
  • It's interesting that you call the segmentation fault that you get as a result of running this code "befuddling." What I would be include to call "befuddling" are (1) the motivations behind this kind of code, and (2) what type you think the `auto` in `main` actually has (I don't actually know the answer to this). – Andrey Mishchenko Feb 04 '15 at 00:47
  • @Andrey I don't see the problem. The general idea of ETs arithmetic seems legitimate to me, it is just an implementation problem. – sbabbi Feb 04 '15 at 09:01
  • @Andrew (1) Vector arithmetic without copying vectors into temporary vectors and such. The sum should require no vector allocations with the expression template, whereas the more intuitive way creates one or two extra vector temporaries, and 1 more extra local vector for the accumulate. Without memory allocations, the accumulate is much faster. (2) sbabbi is correct in that it is a `BinOpPlus< BinOpPlus, Valarray>`. The expression template type really shouldn't matter though. – Suedocode Feb 04 '15 at 15:30

3 Answers3

1

In this line

auto out = v[0] + v[1] + v[2];

The type of out is BinOpPlus< BinOpPlus<ValArray, ValArray>, Valarray>. Since your BinOpPlus stores references to its arguments, and the BinOpPlus<ValArray,ValArray> there is a temporary, you have undefined behavior.

Usually expression templates like these use a trait to decide how to store their arguments, so that you can store actual objects by reference (and assume that the user will not mess up) and other ETs by value (they are very small anyway).

Also using auto with arithmetic ETs is considered at least problematic, because it rarely produce the intended type. For this very reason there have been a couple of proposal to introduce a sort of operator auto to customize the type deduced by auto in ETs.

sbabbi
  • 11,070
  • 2
  • 29
  • 57
0

This is a hunch.

In the line

auto out = v[0] + v[1] + v[2];

you have temporary object. That might be getting deleted with the -O3 flag. I would try the following:

auto out1 = v[1] + v[2];
auto out = v[0] + out1;
R Sahu
  • 204,454
  • 14
  • 159
  • 270
0

Change your code to not use reference members in your struct. I believe that the reference members makes the copy operation go haywire when you do the addition here:

auto out = v[0] + v[1] + v[2];

For example:

template<typename L, typename R>
struct BinOpPlus {
  const L left;
  const R right;

Making this change works correctly.

Also, FYI, when compiling your code with Visual Studio 2013 with full warnings (/W4), we get this warning:

warning C4512: 'BinOpPlus' : assignment operator could not be generated.

So right there, this indicates that any copying may produce undesired effects.


Live example of good run without references: http://ideone.com/JKxoDv

Live example of bad run with references: http://ideone.com/7oSoJB

PaulMcKenzie
  • 34,698
  • 4
  • 24
  • 45
  • Visual Studio 2013 is bad with template stuff in my experience. However, The whole purpose of this exercise is to not copy the vectors. However, it is necessary to copy rvalue references, as sbabbi and R Sahu pointed out. I had overlooked this detail. – Suedocode Feb 04 '15 at 15:26