As the title reads, I am trying to use an STL vector with a SIMD intrinsic data type. I know it is not good practice because of the potential load/store overhead, but I ran into a rather strange fault. Here is the code:
    #include "immintrin.h"
    #include <vector>
    #include <stdio.h>
    #define VL 8

    int main () {
        std::vector<__m256> vec_1(10);
        std::vector<__m256> vec_2(10);

        float * tmp_1 = new float[VL];
        printf("vec_1[0]:\n");
        _mm256_storeu_ps(tmp_1, vec_1[0]); // seems to go as expected
        for (int i = 0; i < VL; ++i)
            printf("%f ", tmp_1[i]);
        printf("\n");
        delete[] tmp_1; // array form, to match new[]

        float * tmp_2 = new float[VL];
        printf("vec_2[0]:\n");
        _mm256_storeu_ps(tmp_2, vec_2[0]); // segmentation fault
        for (int i = 0; i < VL; ++i)
            printf("%f ", tmp_2[i]);
        printf("\n");
        delete[] tmp_2; // array form, to match new[]

        return 0;
    }
I compiled it using g++ -O3 -g -std=c++11 -mavx2 test.cpp -o test. vec_1[0] is printed as expected (all zeros), but a segmentation fault happens when it comes to vec_2[0]. I thought it was an alignment issue, but I used _mm256_storeu_ps instead of _mm256_store_ps, and the unaligned variant does not require an aligned destination.
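To test the alignment hypothesis more directly, a minimal sketch like the one below could print where each vector's buffer actually lives. The names Vec8, is_aligned_to and report_alignment are hypothetical and not part of my program; Vec8 is only a stand-in with the same size and 32-byte alignment requirement as __m256, so the check compiles without AVX flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for __m256: 32 bytes in size and 32-byte aligned,
// but usable without any AVX compiler flags.
struct alignas(32) Vec8 {
    float lane[8];
};

// Returns true if the address p is a multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Prints the address of the vector's heap buffer and reports whether it
// satisfies the 32-byte alignment that the element type asks for.
inline bool report_alignment(const char* name, const std::vector<Vec8>& v) {
    const void* p = static_cast<const void*>(v.data());
    const bool ok = is_aligned_to(p, alignof(Vec8));
    std::printf("%s.data() = %p, 32-byte aligned: %s\n", name, p, ok ? "yes" : "no");
    return ok;
}
```

Calling report_alignment("vec_1", vec_1) and report_alignment("vec_2", vec_2) from main (with the vectors declared as std::vector<Vec8>, or with the helper adapted to __m256) would show whether both buffers are really 32-byte aligned. If the second one turned out to be only 16-byte aligned, that would suggest the fault comes from reading the vector element itself rather than from the unaligned store.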
The machine is an Intel Haswell CPU with the AVX2 extension. The GCC version is 4.8.5.
Any possible clue is welcome.