Sorry for the previous non-descriptive question. Please allow me to rephrase it:
The setup:
I need to add, and apply some bitwise operations to, four 32-bit values taken from 4 arrays at the same time using SSE. All elements of these 4 arrays are 32-bit integers. The result goes into a 5th array.
So my questions are:
- Which header files do I need to include, and which compiler flags do I need to set, so that I can use SSE from C?
- Does the example code provided by Paul still work?
Another question: if I need to read the last bit of integer A and the first bit of integer B, and replace the last bit and first bit of integer C with the values I just read, can I use SSE here? Or is there some other fast way to do it, instead of the 3 separate accesses needed in the normal case?
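To make the bit replacement concrete, here is a scalar sketch of what I mean. It assumes that "last bit" means bit 0 (the least significant bit) and "first bit" means bit 31 (the most significant bit); the function name `replace_end_bits` is just a placeholder I made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns c with its bit 0 taken from a
   and its bit 31 taken from b. All other bits of c are kept. */
static inline uint32_t replace_end_bits(uint32_t a, uint32_t b, uint32_t c)
{
    const uint32_t low_mask  = UINT32_C(0x00000001); /* bit 0  */
    const uint32_t high_mask = UINT32_C(0x80000000); /* bit 31 */

    /* Clear the two target bits in c, then OR in the source bits. */
    uint32_t kept = c & ~(low_mask | high_mask);
    return kept | (a & low_mask) | (b & high_mask);
}
```

This still reads A, B and C once each, but it needs no branches and no per-bit shifting, so what I am really asking is whether SSE can do better than this.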
Code provided by Paul:
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> // SSE2 intrinsics
#define N 4096 // size of input/output arrays (a macro, so it is a valid array size in C)
int32_t array0[N]; // 4 x input arrays
int32_t array1[N];
int32_t array2[N];
int32_t array3[N];
int32_t array_sum[N]; // output array
for (size_t i = 0; i < N; i += 4)
{
    // load 4 x vectors of 4 x int (_mm_load_si128 requires 16-byte aligned addresses)
    __m128i v0 = _mm_load_si128((const __m128i *)&array0[i]);
    __m128i v1 = _mm_load_si128((const __m128i *)&array1[i]);
    __m128i v2 = _mm_load_si128((const __m128i *)&array2[i]);
    __m128i v3 = _mm_load_si128((const __m128i *)&array3[i]);
    __m128i vsum = _mm_add_epi32(v0, v1); // sum vectors
    vsum = _mm_add_epi32(vsum, v2);
    vsum = _mm_add_epi32(vsum, v3);
    _mm_store_si128((__m128i *)&array_sum[i], vsum); // store sum
}
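For the bitwise part of the setup, I am assuming something along these lines would work. This is only a sketch: the function name `combine4_sse2` and the particular mix of operations (add, then XOR, then AND) are placeholders I made up, not my real computation. It uses the unaligned load/store intrinsics so that the arrays need no special alignment, and a scalar loop for any leftover elements.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 integer intrinsics */

/* Hypothetical helper: out[i] = ((a[i] + b[i]) ^ c[i]) & d[i].
   The chosen operations are placeholders for illustration only. */
static void combine4_sse2(const int32_t *a, const int32_t *b,
                          const int32_t *c, const int32_t *d,
                          int32_t *out, size_t n)
{
    size_t i = 0;

    /* Process 4 x 32-bit elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        __m128i vd = _mm_loadu_si128((const __m128i *)(d + i));

        __m128i v = _mm_add_epi32(va, vb); /* lane-wise 32-bit add */
        v = _mm_xor_si128(v, vc);          /* bitwise XOR          */
        v = _mm_and_si128(v, vd);          /* bitwise AND          */

        _mm_storeu_si128((__m128i *)(out + i), v);
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i) {
        uint32_t s = (uint32_t)a[i] + (uint32_t)b[i];
        out[i] = (int32_t)((s ^ (uint32_t)c[i]) & (uint32_t)d[i]);
    }
}
```

My understanding is that with GCC or Clang on 32-bit x86 this would need `-msse2`, while on x86-64 SSE2 is enabled by default, but please correct me if that is wrong.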