I am interested in using the Clang vector extensions, such as:
typedef float vec3 __attribute__((ext_vector_type(3)));
I have 2 questions:
As you can see in the examples above and below, I am primarily interested in using them to manipulate vec3 vectors (xyz). My understanding is that a SIMD register is typically 128 bits wide, whereas a vec3 of floats only needs 96 bits. Is there a penalty for not using exactly 128 bits? And if I were to use vec2, would the compiler be able to pack two vec2 values into one register? Should I use vec4 instead, even though I won't be using the fourth element in most cases? Is that better from an alignment/performance standpoint?
I would eventually like to measure how much more efficient these extensions are compared with standard structs. The only approach I know is to run the code in a loop a great number of times and time it, but that seems very naïve. It is not even informative for the small example below, because when I compile it with -O3 the code runs extremely fast either way. Could I also tell whether the code is optimized by looking at the generated assembly? I tried, but even though the source is short, the generated assembly is already quite long, and beyond the basics I find it a bit overwhelming. Suggestions would be greatly appreciated; my goal is essentially to prove to myself that using these extensions produces an executable that runs faster.
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* EXIT_SUCCESS */

typedef float vec3 __attribute__((ext_vector_type(3)));
/* typedef so that plain "vec3f" is valid in C as well as C++ */
typedef struct vec3f { float x, y, z; } vec3f;

int main(int argc, char **argv)
{
    for (unsigned long i = 0; i < 1e12; ++i) {
        for (unsigned long j = 0; j < 1e12; ++j) {
#if 1
            /* Clang vector extension: cross product via swizzles */
            vec3 a = {1, 0, 0};
            vec3 b = {0, 1, 0};
            vec3 lhs = a.yzx * b.zxy;
            vec3 rhs = a.zxy * b.yzx;
            vec3 c = lhs - rhs;
#else
            /* Plain struct: cross product component by component */
            vec3f a = {1, 0, 0};
            vec3f b = {0, 1, 0};
            vec3f c;
            c.x = a.y * b.z - a.z * b.y;
            c.y = a.z * b.x - a.x * b.z;
            c.z = a.x * b.y - a.y * b.x;
#endif
            //printf("%f %f %f\n", c.x, c.y, c.z);
        }
    }
    return EXIT_SUCCESS;
}