Introduction:
I'm writing a function to process 4 packed long long int values in x86_64 assembly using AVX2 instructions. Here is what my header file looks like:
avx2.h:
#define AVX2_ALIGNMENT 32
// Processes 4 packed long long int and
// returns a pointer to a result
long long * process(long long *);
The assembly implementation of the process function looks as follows:
avx2.S:
global process
process:
vmovaps ymm0, [rdi]
;other instructions omitted
The vmovaps ymm0, [rdi] instruction requires the address in rdi to be 32-byte aligned. In assembly, alignment is controlled by the align 32 directive.
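To make the alignment requirement concrete on the C side, here is a minimal sketch of a runtime check that could be run before calling process. The helper name is_aligned is hypothetical (it is not part of avx2.h), and the check assumes that converting a pointer to uintptr_t yields the numeric address, which holds for GCC on x86_64 but is implementation-defined in general.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */

#define AVX2_ALIGNMENT 32 /* same value as in avx2.h */

/* Hypothetical helper, not part of avx2.h.
 * Returns nonzero if the address held in p is a multiple of alignment.
 * alignment must be a nonzero power of two.
 * Assumption: (uintptr_t)p is the numeric address, as with GCC on x86_64. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A caller could write assert(is_aligned(ptr, AVX2_ALIGNMENT)) right before process(ptr), so that a misaligned pointer is caught in C rather than as a fault inside vmovaps.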
The problem:
GCC defines the __BIGGEST_ALIGNMENT__ macro, which on my implementation is 16. The C18 Standard states at 6.2.8/3:
An extended alignment is represented by an alignment greater than _Alignof (max_align_t). It is implementation-defined whether any extended alignments are supported and the storage durations for which they are supported.
So the implementation-defined extended alignment on GCC also appears to be 16, and I'm not sure whether the following code causes UB:
#include "avx2.h"
//AVX2_ALIGNMENT = 32, __BIGGEST_ALIGNMENT__ = 16
_Alignas(AVX2_ALIGNMENT) long long longs[] = {1, 32, 432, 433};
long long *result = process(longs);
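As a way to probe what the implementation actually does with the declaration above, the following is a minimal sketch, assuming a C11 compiler with _Alignas and max_align_t. The helper names is_extended_alignment and checked_longs are hypothetical. It only reports whether 32 counts as an extended alignment and whether the array really ended up 32-byte aligned; it does not by itself settle the UB question.

```c
#include <assert.h>
#include <stddef.h>   /* max_align_t, size_t */
#include <stdint.h>   /* uintptr_t */

#define AVX2_ALIGNMENT 32 /* same value as in avx2.h */

/* Same over-aligned array as in the question, with static storage duration. */
static _Alignas(AVX2_ALIGNMENT) long long longs[] = {1, 32, 432, 433};

/* Hypothetical helper: nonzero if alignment is an extended alignment
 * in the sense of C18 6.2.8/3, i.e. greater than _Alignof(max_align_t). */
static int is_extended_alignment(size_t alignment)
{
    return alignment > _Alignof(max_align_t);
}

/* Hypothetical helper: returns longs after asserting that its address is
 * a multiple of AVX2_ALIGNMENT, so the pointer is safe for vmovaps.
 * Assumption: (uintptr_t)longs is the numeric address (true for GCC on x86_64). */
static long long *checked_longs(void)
{
    assert((uintptr_t)longs % AVX2_ALIGNMENT == 0);
    return longs;
}
```

With these in place, process(checked_longs()) would abort in a debug build if the array were not aligned as requested, and is_extended_alignment(AVX2_ALIGNMENT) tells whether the request falls under the implementation-defined clause quoted above.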
Is there a way to rewrite the code without UB? (I'm aware of the intrinsics in immintrin.h; they are not the topic of this question.)