I'm trying to profile an expression template similar to the one in the book "C++ Templates" by David Vandevoorde. Below is my theoretical analysis, which is probably wrong because the test shows unexpected results. Suppose the test is:
R = A + B + C;
where A, B, C, and R are arrays allocated on the heap, each of size 2. So the following will be executed:
R[0] = A[0] + B[0] + C[0]; // 3 loads + 2 additions + 1 store
R[1] = A[1] + B[1] + C[1];
which takes approximately 12 instructions (6 per element).
Now, if the expression template is enabled (shown at the very bottom), then after type deduction is done at compile time, the following will be processed at run time before the same evaluation as above is performed:
A + B --> expression 1 // copy references to A & B
expression 1 + C --> expression 2 // copy the copies of references to A & B
// + copy reference to C
Therefore, there are a total of 2+3=5 instructions before the evaluation, which is 5/(5+12), or roughly 30%, of the total instructions. So I should be able to see this overhead, especially when the vector size is small.
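To make the counting above concrete, here is a minimal hand-written sketch of the two temporary objects as I picture them. The names Array, Expr1, Expr2, and evaluate are hypothetical and only for illustration (they are not my real template), and std::vector<double> stands in for my array type. It assumes expression 1 is stored by value inside expression 2, which is what the 2+3 count implies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the array type in the question.
using Array = std::vector<double>;

// "expression 1": holds two references, to A and B.
struct Expr1 {
    const Array& a;
    const Array& b;
    double operator[](std::size_t i) const { return a[i] + b[i]; }
};

// "expression 2": holds a copy of expression 1 (two references)
// plus one reference to C.
struct Expr2 {
    Expr1 ab;
    const Array& c;
    double operator[](std::size_t i) const { return ab[i] + c[i]; }
};

// Builds both expression objects, then runs the element-wise loop.
Array evaluate(const Array& A, const Array& B, const Array& C) {
    Expr1 e1{A, B};    // copies 2 references
    Expr2 e2{e1, C};   // copies the 2 references again, plus 1 for C
    Array R(A.size());
    for (std::size_t i = 0; i < R.size(); ++i)
        R[i] = e2[i];  // same arithmetic as A[i] + B[i] + C[i]
    return R;
}
```

Calling evaluate(A, B, C) on three arrays of size 2 produces the same values as the hand-written R[i] = A[i] + B[i] + C[i]; the only extra work at the source level is constructing e1 and e2.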
But the results show that the cost of the two is nearly the same. I ran the test for 1E+09 iterations. The assembly code for the two is the same, of course, but I couldn't find any part corresponding to this "construction" step that costs time or instructions.
movsdq (%r9,%rax,8), %xmm0
addsdq (%r8,%rax,8), %xmm0
addsdq (%rdi,%rax,8), %xmm0
movsdq %xmm0, (%rcx,%rax,8)
I don't have a strong CS background, so this question may be a stupid one. But I've been scratching my head over this for days, so any help is appreciated!
--- My expression template ---
template< typename Left, typename Right >
class V_p_W // stands for V+W
{
public:
typedef typename array_type::value_type value_type;
typedef double S_type;
typedef typename traits< Left >::type V_type;
typedef typename traits< Right >::type W_type;
V_p_W ( const Left& _v, const Right& _w ) : V(_v), W(_w)
{}
inline value_type operator [] ( std::size_t i ) { return V[i] + W[i]; }
inline value_type operator [] ( std::size_t i ) const { return V[i] + W[i]; }
inline std::size_t size () const { return V.size(); }
private:
V_type V;
W_type W;
};
where traits does nothing but decide whether the object should be held by value or by reference. For example, an integer is copied by value, but an array is held by reference.
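For reference, a traits class of this kind could look roughly like the sketch below. This is not my exact code: it assumes that "integer-like" means any arithmetic type, and it uses std::conditional from <type_traits> to make the choice.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Sketch of a storage-selection trait (assumed, not the original code):
// arithmetic types (int, double, ...) are stored by value,
// everything else (e.g. arrays) is stored by const reference.
template <typename T>
struct traits {
    using type = typename std::conditional<
        std::is_arithmetic<T>::value,
        T,          // scalar: keep a copy
        const T&    // array-like object: keep a reference
    >::type;
};

// Compile-time checks of the two cases described above.
static_assert(std::is_same<traits<int>::type, int>::value,
              "scalars are held by value");
static_assert(std::is_same<traits<std::vector<double>>::type,
                           const std::vector<double>&>::value,
              "arrays are held by const reference");
```

With a trait like this, the members V and W of V_p_W become either plain copies or const references, depending on the operand type.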