I would like to use OpenMP to parallelize my task.
I need to subtract the same quantity from every element of an array and write the results into another array. Both arrays are dynamically allocated with `malloc`, and the first one is filled with values read from a file. Each element is of type `uint64_t`.
```cpp
#pragma omp parallel for
for (uint64_t i = 0; i < size; ++i) {
    new_vec[i] = vec[i] - shift;
}
```
Here `shift` is the fixed value I want to subtract from every element of `vec`, and `size` is the length of both `vec` and `new_vec`, which is approximately 200k.
I compile the code with `g++ -fopenmp` on Arch Linux. I'm on an Intel Core i7-6700HQ and I use 8 threads. The running time is 5 to 6 times higher with the OpenMP version, even though I can see that all the cores are working while it runs.
I think this might be caused by false sharing, but I can't find where it happens.