C++ Vector Template Per-Component Operations

Question

I'm rewriting the vector math portion of my project, and I'd like to generalize vectors by their type and number of dimensions. A vector<T, N> represents an N dimensional vector of type T.

template<typename T, int N>
struct vector {
    T data[N];
};

I'll need to rewrite many math functions, most of which will operate on a per-component basis. The straightforward implementation of the addition operator is shown below.

template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
    vector<T, N> result;
    for (int i = 0; i < N; i++) {
        result[i] = lhs[i] + rhs[i];
    }
    return result;
}

My question: Is there a way (via template-trickery?) to implement this without the use of a for loop and a temporary variable? I understand that the compiler would most likely unroll the loop and optimize it away. I just don't like the idea of having all of my performance-critical math functions implemented this way. They will all be inlined and in the header, so having many of these functions would also make for a big ugly header file.

I'm wondering if there is a way to do this which would produce more optimal source code. Possibly a way that works like variadic templates do. Something along the lines of this.

template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
    return vector<T, N>(lhs[0] + rhs[0], lhs[1] + rhs[1]...);
}

Michael Anderson · Answer 1 · 2016-03-16T04:53:35.133

One way to do this is via lower level "map" functions:

Here's a complete working example

#include <iostream>
#include <math.h>

template<typename T, int N>
struct vector {
    T data[N];
};

First declare your worker "map" functions - I've got 3 here map, map2, foreach.

template<typename T, int N, typename FN>
static void foreach(const vector<T,N> & vec, FN f) {
   for(int i=0; i<N ;++i) {
      f(vec.data[i]);
   }
}

template<typename T, int N, typename FN>
static auto map(const vector<T,N> & vec, FN f) -> vector<decltype(f(T(0))), N> {
   vector<decltype(f(T(0))), N> result;
   for(int i=0; i<N ;++i) {
      result.data[i] = f(vec.data[i]);
   }
   return result;
}

template<typename T1, typename T2, int N, typename FN>
static auto map2(const vector<T1,N> & vecA, 
                 const vector<T2,N> & vecB, 
                 FN f)
 -> vector<decltype(f(T1(0), T2(0))), N> {
   vector<decltype(f(T1(0), T2(0))), N> result;
   for(int i=0; i<N ;++i) {
      result.data[i] = f(vecA.data[i], vecB.data[i]);
   }
   return result;
}

Now use the helpers to define your higher level functions via lambdas. I'll define binary +, binary -, unary - and e^x. Oh and operator<< so we can see what is going on.

I'm pretty sure there's a better alternative to the lambdas used in operator+ and operator-, but I can't remember them

template<typename T, int N>
vector<T,N> operator+(const vector<T,N> &lhs, const vector<T,N> &rhs) {
  return map2(lhs, rhs, [](T a,T b) { return a+b;} );
}

template<typename T, int N>
vector<T,N> operator-(const vector<T,N> &lhs, const vector<T,N> &rhs) {
  return map2(lhs, rhs, [](T a,T b) { return a-b;} );
}

template<typename T, int N>
vector<T,N> operator-(const vector<T,N> &vec) {
  return map(vec, [](T a) { return -a;} );
}

template<typename T, int N>
auto exp(const vector<T,N> &vec) -> vector<decltype(exp(T(0))), N> {
  return map(vec, [](T a) { return exp(a); } );
}

template<typename T, int N>
std::ostream & operator<<(std::ostream& os, const vector<T,N> &vec) {
  os<<"{";
  foreach(vec, [&os](T v) { os<<v<<", "; } );
  os<<"}";
  return os;
}

Now look how they work just fine...

int main() {
  vector<int, 5> v1 = {1,2,3,4,5};
  vector<int, 5> v2 = {2,4,6,8,10};

  std::cout<<v1 << " + " << v2 << " = " << v1+v2<<std::endl;
  std::cout<<v1 << " - " << v2 << " = " << v1-v2<<std::endl;
  std::cout<<" exp( - " << v2 << " )= " << exp(-v1)<<std::endl;
}

The lambda map functions are an interesting idea. They would definitely shorten the length of the code, but still rely on for loops. — Janz, Mar 17 '16 at 02:14
The limits on the for loops are compile time constants - the compiler will unroll them. However _if_ you find the functions are bottlenecks, then you'll probably want to go with a more robust and tested numerical vector class - or drop down to specialising the botlenecking cases by hand. — Michael Anderson, Mar 17 '16 at 02:21

score 0 · Answer 2 · answered Mar 16 '16 at 05:44

You can do this and I'll point you towards a solution (which compiles and runs). You are looking to get rid of the loop, preferably by inlining it in hopes the compiler will optimize things for you.

In practice I have found it sufficient to specify the dimensions needed, i.e. N = 3, 4, 5 because this allows finer grained control over what the compiler does than doing what you asked for. However you can use recursion and partial template specialization to implement your operators. I have illustrated addition.

So instead of this:

template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
    vector<T, N> result;
    for (int i = 0; i < N; i++) {
        result[i] = lhs[i] + rhs[i];
    }
    return result;
}

You want code that effectively does this:

   template<typename T, int N>
    vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
        vector<T, N> result;
        result[0] = lhs[0] + rhs[0];
        result[1] = lhs[1] + rhs[1];
        ...
        result[N-1] = lhs[N-1] + rhs[N-1];
        return result;
    }

if N is 1, it is pretty easy you just want this... template vector operator+(vector lhs, vector rhs) { vector result; result[0] = lhs[0] + rhs[0]; return result; }

and if N is 2, it is pretty easy you just want this... template vector operator+(vector lhs, vector rhs) { vector result; result[0] = lhs[0] + rhs[0]; result[1] = lhs[1] + rhs[1]; return result; }

The easiest way is to simply define this up to as many N as you expect to use and not the answer you are looking for because you probably don't need more than N=5 or N=6 in practice right?

However, you can also use partial template specialization and recursion to get there. Consider this struct, which recursively calls itself and then assigns the index:

template<typename T, int N, int IDX>
struct Plus
{
    void operator()(vector<T,N>& lhs, vector<T,N>& rhs, vector<T,N>& result)
    {
        Plus<T,N,IDX-1>()(lhs,rhs,result);
        result.data[IDX] = lhs.data[IDX] + rhs.data[IDX];
    }
};

and this partial specialization which appears to do nothing, but handles the case when the index is 0 and ends the recursion:

template<typename T, int N>
struct Plus<T,N,-1>
{
    void operator()(vector<T,N>& lhs, vector<T,N>& rhs, vector<T,N>& result)
    {
        //noop
    }
};

and finally this implementation of operator+ which instantiates Plus and calls it:

template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
    vector<T, N> result;
    Plus<T,N,N-1>()(lhs,rhs,result);
    return result;
}

You'll need to turn this into an operator to make it more general purpose but you get the idea. However this is mean to the compiler and it may take awhile in big projects even if it is super cool. In practice I have found that hand typing the overloads you want or writing script code to generate the C++ results in a more debuggable experience and code that in the end is simpler to read and easier for the compiler to optimize. More specifically if you write a script to generate the C++ you can include the SIMD intrinsics in the first place and not leave things to chance.

I've considered "template loops", for lack of a better term. I agree that they might be a little rough on the compiler. My original plan was to specialize my functions for the most common vector sizes, and leave my general for loop functions for any other sizes. — Janz, Mar 17 '16 at 02:21

score 0 · Answer 3 · answered Mar 16 '16 at 09:22

Firstly, compiler would probably unroll the loop.
Secondly, for better performance, pass your argument by const reference instead of by value to avoid extra copies.

And to answer your question, you may use std::index_sequence to construct in place, something like:

namespace detail
{

template<typename T, int N, std::size_t...Is>
vector<T, N> add(std::index_sequence<Is...>,
                 const vector<T, N>& lhs,
                 const vector<T, N>& rhs)
{
    return {{ (lhs[Is] + rhs[Is])... }};
}

}

template<typename T, int N>
vector<T, N> operator+(const vector<T, N>& lhs, const vector<T, N>& rhs) {
    return detail::add(std::make_index_sequence<N>{}, lhs, rhs);
}

Demo

I currently pass vectors by const reference, but I'm planning on eventually doing a bit of profiling to see whether it's actually better to pass them by reference or value. Thanks for bringing index_sequence to my attention. It looks like it could be quite useful. — Janz, Mar 17 '16 at 02:03

C++ Vector Template Per-Component Operations

3 Answers3