I cannot load or store with AVX2 intrinsics the way I did with AVX. There are no errors, only warnings, but the load/store is not performed at run time. Other AVX2 instructions work properly; I just cannot load from memory.

The code is as follows.

AVX:

float t[MAX][MAX];
row0 = _mm256_load_ps(&t[i][j]);
_mm256_store_ps(&t[j][i], row0);

AVX2:

const int32_t a[MAX][MAX]; // I tried int, long, global and local and many other things... 
a0_i =_mm256_stream_load_si256 (&a[0][0]);
mm256_store_si256(&a[0][0], a0_i);

So, what is the problem or difference? Is there a solution?

Paul R
ADMS

1 Answer

If you look at the prototype for _mm256_stream_load_si256:

__m256i _mm256_stream_load_si256 (__m256i const* mem_addr);

you can see that you need to cast the pointer to the correct type, i.e.:

a0_i =_mm256_stream_load_si256 ((__m256i *)&a[0][0]);
                                 ^^^^^^^^^ ^

You also forgot to take the address of the first element of the array, and the subsequent store has a couple of further mistakes (the missing leading underscore in the intrinsic name and the missing cast):

_mm256_store_si256((__m256i *)&a[0][0], a0_i);
^                   ^^^^^^^^^ 

Note that once this compiles cleanly, your next problem may be memory alignment at run time: both _mm256_stream_load_si256 and _mm256_store_si256 require a 32-byte-aligned address.
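
To illustrate the alignment point, here is a minimal sketch rather than a definitive fix. It assumes GCC or Clang on x86 (for the `target` attribute and `__builtin_cpu_supports`), and the names `copy_row_avx2`, `copy_and_check`, the second array `b` and the value of `MAX` are hypothetical, chosen only for this example. The arrays are declared with `alignas(32)` and without `const`, since the code stores into them.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdalign.h>
#include <stdint.h>

#define MAX 8 /* assumed size: one row is 8 x int32_t = 32 bytes */

/* alignas(32) gives the 32-byte alignment that the aligned intrinsics need.
   The arrays are not const, because we store into them. */
static alignas(32) int32_t a[MAX][MAX];
static alignas(32) int32_t b[MAX][MAX];

/* GCC/Clang-specific: enable AVX2 code generation for this one function. */
__attribute__((target("avx2")))
static void copy_row_avx2(int32_t *dst, const int32_t *src)
{
    /* Catch misaligned pointers in debug builds instead of faulting. */
    assert((uintptr_t)src % 32 == 0 && (uintptr_t)dst % 32 == 0);
    __m256i v = _mm256_stream_load_si256((const __m256i *)src);
    _mm256_store_si256((__m256i *)dst, v);
}

/* Fills a, copies it into b row by row, and returns 1 if b matches a.
   Falls back to a scalar copy when the CPU lacks AVX2. */
static int copy_and_check(void)
{
    int have_avx2 = __builtin_cpu_supports("avx2");

    for (int i = 0; i < MAX; i++)
        for (int j = 0; j < MAX; j++)
            a[i][j] = i * MAX + j;

    for (int i = 0; i < MAX; i++) {
        if (have_avx2) {
            copy_row_avx2(&b[i][0], &a[i][0]);
        } else {
            for (int j = 0; j < MAX; j++)
                b[i][j] = a[i][j];
        }
    }

    for (int i = 0; i < MAX; i++)
        for (int j = 0; j < MAX; j++)
            if (b[i][j] != a[i][j])
                return 0;
    return 1;
}
```

Calling `copy_and_check()` from `main` should return 1. With `MAX` set to 8 every row starts on a 32-byte boundary; for other sizes, only offsets that are multiples of 8 elements from the start of the array are safe for the aligned intrinsics, and `_mm256_loadu_si256` / `_mm256_storeu_si256` are the alternatives when alignment cannot be guaranteed.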

Paul R