Here is a simplified piece of code that causes me a problem (Vec4d comes from Agner Fog's Vector Class Library, VCL):

#define AVX256_ALIGNED_MALLOC(type,size) (type *)_aligned_malloc(size * sizeof(type),32)
#define AVX256_FREE(ptr) _aligned_free(ptr)

int N = 1024;
std::vector<double> A(N);
double* Aaligned = AVX256_ALIGNED_MALLOC(double, N);
memcpy(Aaligned, &A[0], N * sizeof(double));

int N4=N>>2;

for(size_t i = 4; i <N4-4; ++i)
{
    //....
    ... = ((Vec4d*)(Aaligned - 1))[i] + ((Vec4d*)(Aaligned))[i] + ((Vec4d*)(Aaligned + 1))[i];
}

AVX256_FREE(Aaligned);

It is clear to me that I am allowed to use

((Vec4d*)(Aaligned))[i]

Can you confirm that I cannot use

((Vec4d*)(Aaligned-1))[i]

or

((Vec4d*)(Aaligned+1))[i]

Any hints? Many thanks.

Luc

  • The misaligned access may fail, depending on how the compiler implements it. Use Vec4d().load, rather than Vec4d().load_a if you want to use misaligned data. – A Fog Oct 28 '22 at 06:48
  • You can find aligned container classes for the Vector Class Library at https://github.com/vectorclass/add-on/tree/master/containers – A Fog Oct 28 '22 at 06:50
  • Is it even safe to point a `Vec4d*` at some `double[]` data? `__m256d` is defined as `__attribute__((may_alias))`, but `Vec4d` isn't, so that's a strict-aliasing violation. Use load intrinsics like `_mm256_loadu_pd`, or the `.load` member function. Also, use an aligned allocator on your `std::vector` if you really want alignment, or just use unaligned loads on your std::vector directly. Definitely don't allocate + memcpy! – Peter Cordes Oct 28 '22 at 14:53
  • Thanks for these hints. Could "load" incur a prohibitive run-time cost compared to the cast (Vec4d*)? Thank you. – Luc Nov 18 '22 at 15:15
  • Thanks for these hints. Could "load" incur a prohibitive run-time cost compared to the cast (Vec4d*)? Put another way, does "load" perform a copy, or does it simply wrap a memory buffer? Thank you. – Luc Dec 02 '22 at 14:56
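
To make the suggestions in the comments concrete, here is a minimal sketch of the three-point sum written with unaligned loads directly on the std::vector, with no pointer cast and no allocate + memcpy. It is an illustration under stated assumptions, not VCL code: `Vec4dSketch` and `stencil3` are hypothetical names, and `Vec4dSketch` is a memcpy-based stand-in so the example compiles without the library. With VCL itself, the corresponding calls would be `Vec4d().load(p)` and `.store(p)`. On the copy question: a load copies 32 bytes from memory into the vector value, but dereferencing a `Vec4d*` has to bring the same 32 bytes into a register before any arithmetic, so the load should not be expected to add work compared to the cast.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for VCL's Vec4d (assumption: only load/store/+ are needed here).
struct Vec4dSketch {
    double lane[4];

    // Unaligned load: copies four doubles starting at p; p needs no 32-byte alignment.
    Vec4dSketch& load(const double* p) {
        std::memcpy(lane, p, sizeof lane);
        return *this;
    }

    // Unaligned store: copies the four lanes to p.
    void store(double* p) const {
        std::memcpy(p, lane, sizeof lane);
    }
};

// Lane-wise addition, mirroring what operator+ does on a 4-double vector.
inline Vec4dSketch operator+(const Vec4dSketch& a, const Vec4dSketch& b) {
    Vec4dSketch r;
    for (int k = 0; k < 4; ++k) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

// Computes out[j] = in[j-1] + in[j] + in[j+1] for 1 <= j <= n-2.
// Boundary elements out[0] and out[n-1] are left untouched.
void stencil3(const std::vector<double>& in, std::vector<double>& out) {
    const std::size_t n = in.size();
    if (n < 3 || out.size() < n) return;

    std::size_t j = 1;
    // Vector part: the right-hand load reads in[j+1..j+4], so require j+4 <= n-1.
    for (; j + 4 <= n - 1; j += 4) {
        Vec4dSketch left, mid, right;
        left.load(&in[j - 1]);   // shifted by -1: not 32-byte aligned in general
        mid.load(&in[j]);
        right.load(&in[j + 1]);  // shifted by +1: not 32-byte aligned in general
        (left + mid + right).store(&out[j]);
    }
    // Scalar tail for the remaining interior elements.
    for (; j + 1 < n; ++j) out[j] = in[j - 1] + in[j] + in[j + 1];
}
```

Calling `stencil3(A, B)` on two vectors of the same length fills the interior of `B`. Because every access goes through `load`/`store` on `double` pointers, the sketch avoids both the alignment assumption and the strict-aliasing concern raised above.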