I am working with two-dimensional arrays of 16-bit integers defined as
int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE];
int16_t C[MAX_SIZE][MAX_SIZE];
where MAX_SIZE and MAX_NODE are constants. I'm not a professional programmer, but with the help of people on Stack Overflow I managed to write a piece of code that uses SSE instructions on my data and achieved a significant speed-up. Currently, I am using the intrinsics that do not require data alignment (mainly _mm_loadu_si128 and _mm_storeu_si128).
__m128i v1, v2, v3;
int b;
for (b = 0; b < n; b += 8) {                        // 8 int16_t values per 128-bit register
    v1 = _mm_loadu_si128((__m128i*)&C[level][b]);   // level defined elsewhere.
    v2 = _mm_loadu_si128((__m128i*)&e[node][b]);    // node defined elsewhere.
    v3 = _mm_and_si128(v1, v2);                     // bitwise AND of the two rows
    _mm_storeu_si128((__m128i*)&C[level+1][b], v3);
}
When I change the intrinsics to their aligned counterparts (i.e. _mm_load_si128 and _mm_store_si128), I get run-time errors, which leads me to believe that my data is not properly aligned.
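In case it helps, this is the kind of check I was planning to add just before the loop to confirm whether the row start addresses are actually 16-byte aligned (I'm not sure whether this is the proper way to test it):

#include <stdio.h>    /* printf */
#include <stdint.h>   /* uintptr_t */

/* Prints 0 if a row start is 16-byte aligned, i.e. suitable for
   _mm_load_si128 / _mm_store_si128; level and node as in the loop above. */
printf("C[level]   %% 16 = %zu\n", (size_t)((uintptr_t)&C[level][0]   % 16));
printf("e[node]    %% 16 = %zu\n", (size_t)((uintptr_t)&e[node][0]    % 16));
printf("C[level+1] %% 16 = %zu\n", (size_t)((uintptr_t)&C[level+1][0] % 16));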
My question now is: if my data is not aligned properly, how can I align it so that I can use the corresponding intrinsics? I'd have thought that since the integers are 16 bits, they would be automatically aligned, but I seem to be wrong!
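From what I've gathered so far, the fix might be to declare the arrays with an explicit 16-byte alignment, something along these lines (using C11 alignas, which I haven't tried yet; __attribute__((aligned(16))) would presumably be the GCC/Clang equivalent):

#include <stdalign.h>   /* alignas (C11) */
#include <stdint.h>

/* Tentative aligned declarations: the intent is that each array starts on a
   16-byte boundary, as _mm_load_si128 / _mm_store_si128 require. */
alignas(16) int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE];
alignas(16) int16_t C[MAX_SIZE][MAX_SIZE];

But I don't know whether aligning the start of each array is enough, or whether every row (i.e. every multiple of MAX_SIZE*sizeof(int16_t) bytes) also has to land on a 16-byte boundary for the loop above to work.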
Any insight on this will be highly appreciated.
Thanks!