I am working with two-dimensional arrays of 16-bit integers defined as
int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE];
int16_t C[MAX_SIZE][MAX_SIZE];
where MAX_SIZE and MAX_NODE are constants. I'm not a professional programmer, but with the help of people on Stack Overflow I managed to write a piece of code that uses SSE instructions on my data and achieved a significant speed-up. Currently, I am using the intrinsics that do not require data alignment (mainly _mm_loadu_si128 and _mm_storeu_si128).
__m128i v1, v2, v3;
int b;
for (b = 0; b < n; b += 8) {                        // 8 int16_t values per 128-bit register
    v1 = _mm_loadu_si128((__m128i*)&C[level][b]);   // level defined elsewhere.
    v2 = _mm_loadu_si128((__m128i*)&e[node][b]);    // node defined elsewhere.
    v3 = _mm_and_si128(v1, v2);                     // bitwise AND of the two rows
    _mm_storeu_si128((__m128i*)&C[level+1][b], v3);
}
When I change the intrinsics to their aligned counterparts (i.e. _mm_load_si128 and _mm_store_si128), I get run-time errors, which leads me to believe that my data is not properly aligned.
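In case it helps, this is the kind of check I was planning to add just before the loop to confirm whether the row start addresses are actually 16-byte aligned (I'm not sure whether this is the proper way to test it):

#include <stdio.h>    /* printf */
#include <stdint.h>   /* uintptr_t */

/* Prints 0 if a row start is 16-byte aligned, i.e. suitable for
   _mm_load_si128 / _mm_store_si128; level and node as in the loop above. */
printf("C[level]   %% 16 = %zu\n", (size_t)((uintptr_t)&C[level][0]   % 16));
printf("e[node]    %% 16 = %zu\n", (size_t)((uintptr_t)&e[node][0]    % 16));
printf("C[level+1] %% 16 = %zu\n", (size_t)((uintptr_t)&C[level+1][0] % 16));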
My question now is: if my data is not aligned properly, how can I align it so that I can use the corresponding intrinsics? I'd have thought that since the integers are 16 bits, they would be automatically aligned, but I seem to be wrong!
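From what I've gathered so far, the fix might be to declare the arrays with an explicit 16-byte alignment, something along these lines (using C11 alignas, which I haven't tried yet; __attribute__((aligned(16))) would presumably be the GCC/Clang equivalent):

#include <stdalign.h>   /* alignas (C11) */
#include <stdint.h>

/* Tentative aligned declarations: the intent is that each array starts on a
   16-byte boundary, as _mm_load_si128 / _mm_store_si128 require. */
alignas(16) int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE];
alignas(16) int16_t C[MAX_SIZE][MAX_SIZE];

But I don't know whether aligning the start of each array is enough, or whether every row (i.e. every multiple of MAX_SIZE*sizeof(int16_t) bytes) also has to land on a 16-byte boundary for the loop above to work.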
Any insight on this will be highly appreciated.
Thanks!