I am running a parallel OpenMP loop over a large array (the working part). If I initialize the array in parallel beforehand, the working part takes 18 ms. If I initialize the array serially, without OpenMP, the same working part takes 58 ms. What causes the slowdown?
The system:
- Intel(R) Xeon(R) CPU E5-2697 v3 (28 cores / 56 threads, 2 Sockets)
Example code:
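const unsigned long array_length = 160000000;
unsigned long sum = 0;
long* array = (long*)malloc(sizeof(long) * array_length);
// Initialisation (parallel variant; the serial variant drops this pragma)
#pragma omp parallel for num_threads(56) schedule(static)
for (unsigned long i = 0; i < array_length; i++) {
    array[i] = i % 10;
}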
// Time start
// Work
#pragma omp parallel for num_threads(56) shared(array) reduction(+: sum)
for (unsigned long i = 0; i < array_length; i++)
{
    // Only elements smaller than 4 contribute to the sum
    if (array[i] < 4)
    {
        sum += array[i];
    }
}
// Time End
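To make the comparison reproducible, here is a minimal, self-contained sketch of the two variants side by side. The helper names (`now_seconds`, `init_serial`, `init_parallel`, `filtered_sum`, `time_work_ms`) are illustrative and not taken from my original program. The timer uses C11 `timespec_get`, which is only an assumption about how the measurement could be done, and the thread count is left to the OpenMP runtime instead of being fixed at 56.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds, via C11 timespec_get (assumed timer). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Variant A: a single thread writes every element. */
static void init_serial(long *array, size_t n) {
    for (size_t i = 0; i < n; i++) {
        array[i] = (long)(i % 10);
    }
}

/* Variant B: the writes are split statically across the OpenMP threads. */
static void init_parallel(long *array, size_t n) {
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) {
        array[i] = (long)(i % 10);
    }
}

/* The working part: sum of all elements smaller than 4. */
static unsigned long filtered_sum(const long *array, size_t n) {
    unsigned long sum = 0;
    #pragma omp parallel for schedule(static) reduction(+: sum)
    for (size_t i = 0; i < n; i++) {
        if (array[i] < 4) {
            sum += (unsigned long)array[i];
        }
    }
    return sum;
}

/* Allocates, initialises with the given function, and times only the
   working part. Returns elapsed milliseconds, or -1.0 if malloc fails. */
static double time_work_ms(void (*init)(long *, size_t), size_t n,
                           unsigned long *sum_out) {
    long *array = malloc(sizeof *array * n);
    if (array == NULL) {
        return -1.0;
    }
    init(array, n);
    double start = now_seconds();
    *sum_out = filtered_sum(array, n);
    double elapsed = now_seconds() - start;
    free(array);
    return elapsed * 1000.0;
}
```

Built with something like `gcc -O2 -fopenmp`, calling `time_work_ms(init_serial, 160000000, &sum)` and `time_work_ms(init_parallel, 160000000, &sum)` from `main` should correspond to the two measurements described above. Both calls compute the same sum, so only the initialisation differs.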