I cannot load or store with AVX2 intrinsics the way I did with AVX before. The compiler reports no errors, only warnings, but the load/store does not seem to be performed at run time. Other AVX2 instructions work properly; only loading from and storing to memory fails.
The code is as follows.
AVX:
float t[MAX][MAX];
row0 = _mm256_load_ps(&t[i][j]);
_mm256_store_ps(&t[j][i], row0);
AVX2:
const int32_t a[MAX][MAX]; // I tried int, long, global and local and many other things...
a0_i = _mm256_stream_load_si256(&a[0][0]);
_mm256_store_si256(&a[0][0], a0_i);
So what is the problem, and how does this differ from the AVX case? Any ideas or solutions?
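For comparison, here is a minimal sketch of what I would expect to work. It assumes the difference is the pointer type: the integer load/store intrinsics take a `__m256i` pointer rather than a pointer to the element type, so it casts explicitly. It also assumes the array must be non-const (it is written to) and 32-byte aligned. `MAX = 8`, the function name `add_one_to_row`, and the add-one step are hypothetical and only there to make the round trip observable.

```c
#include <immintrin.h>
#include <stdalign.h>
#include <stdint.h>

#define MAX 8 /* hypothetical size; a multiple of 8 keeps every row 32-byte aligned */

/* Not const, because we store into it; 32-byte aligned for the aligned load/store forms. */
static alignas(32) int32_t a[MAX][MAX];

/* GCC/Clang-specific attribute: enables AVX2 for this function only.
   Assumption: it can be dropped if the file is compiled with -mavx2. */
__attribute__((target("avx2")))
static void add_one_to_row(int32_t *row)
{
    /* The integer intrinsics expect a __m256i pointer, hence the casts. */
    __m256i v = _mm256_stream_load_si256((const __m256i *)row);

    /* Add 1 to each of the 8 lanes so the store has a visible effect. */
    v = _mm256_add_epi32(v, _mm256_set1_epi32(1));

    _mm256_store_si256((__m256i *)row, v);
}
```

Calling `add_one_to_row(&a[0][0])` should increment the first eight elements of row 0. If the data cannot be guaranteed to be 32-byte aligned, `_mm256_loadu_si256` and `_mm256_storeu_si256` take the same pointer types without the alignment requirement.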