I want to build a datatype that represents multiple (say N) values of an arithmetic type and provides the same interface as an arithmetic type using operator overloading, so that I get a datatype like Agner Fog's vectorclass.
Please look at this example (Godbolt):
    #include <array>
    using std::size_t;

    template<class T, size_t S>
    class LoopSIMD : std::array<T,S>
    {
    public:
        friend LoopSIMD operator*(const T a, const LoopSIMD& x){
            LoopSIMD result;
            for(size_t i=0;i<S;++i)
                result[i] = a*x[i];
            return result;
        }

        LoopSIMD& operator +=(const LoopSIMD& x){
            for(size_t i=0;i<S;++i){
                (*this)[i] += x[i];
            }
            return *this;
        }
    };

    constexpr size_t N = 7;
    typedef LoopSIMD<double,N> SIMD;

    SIMD foo(double a, SIMD x, SIMD y){
        x += a*y;
        return x;
    }
That seems to work pretty well up to a certain number of elements, which is 6 for gcc-10 and 27 for clang-11. For a larger number of elements, the compilers no longer use FMA instructions (e.g. vfmadd213pd). Instead they perform the multiplications (e.g. vmulpd) and additions (e.g. vaddpd) separately.
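To make the two code paths concrete, here is a minimal sketch of the same update written as free functions on std::array. The names fused_axpy and unfused_axpy are hypothetical and not part of the class above. The first variant requests the fused operation explicitly through std::fma, so it does not rely on the compiler contracting a*y[i] + x[i] on its own; the second mirrors what operator* followed by operator+= does. Whether std::fma becomes a hardware instruction such as vfmadd213pd still depends on the target having FMA support enabled; otherwise it may fall back to a slower library routine.

```cpp
#include <array>
#include <cmath>    // std::fma
#include <cstddef>  // std::size_t

// Hypothetical helper (not from the original class): computes
// x[i] = a * y[i] + x[i] with one explicitly fused operation per element.
template <class T, std::size_t S>
void fused_axpy(T a, const std::array<T, S>& y, std::array<T, S>& x) {
    for (std::size_t i = 0; i < S; ++i)
        x[i] = std::fma(a, y[i], x[i]);  // product and sum with a single rounding
}

// Hypothetical helper mirroring `x += a*y` from the question:
// multiply into a temporary first, then add in a second loop.
template <class T, std::size_t S>
void unfused_axpy(T a, const std::array<T, S>& y, std::array<T, S>& x) {
    std::array<T, S> tmp{};
    for (std::size_t i = 0; i < S; ++i)
        tmp[i] = a * y[i];
    for (std::size_t i = 0; i < S; ++i)
        x[i] += tmp[i];
}
```

Comparing the generated assembly of these two functions for the same S might help to tell whether the threshold comes from the temporary in operator* or from the loops themselves.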
Questions:
- Is there a good reason for this behavior?
- Is there any compiler flag that lets me increase the above-mentioned values of 6 for gcc and 27 for clang?
Thank you!