
I have prepared a simple piece of code for testing. This is the most important part of it:

#pragma omp parallel sections
{
    #pragma omp section
    {
        for (int j = 0; j < 100000; j++)
            for (int i = 0; i < 1000; i++) a1[i] = 1;
    }
    #pragma omp section
    {
        for (int j = 0; j < 100000; j++)
            for (int i = 0; i < 1000; i++) a2[i] = 1;
    }
}

I compiled the program with the MinGW compiler and the results were as I expected. Since I am going to use a Linux-only computer, I also compiled the code on Linux (on the same machine), using the gcc 4.7.2 and Intel 12.1.0 compilers. The efficiency of the program decreased significantly: it is slower than the sequential program (omp_set_num_threads(1)).

I have also tried using private arrays inside the threads, but the effect is similar.
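Roughly what I mean by private arrays in the threads (a sketch for illustration only; the names b1 and b2 and the array size are not from the real code):

#pragma omp parallel sections
{
    #pragma omp section
    {
        int b1[1000];                        // private to the thread running this section
        for (int j = 0; j < 100000; j++)
            for (int i = 0; i < 1000; i++) b1[i] = 1;
    }
    #pragma omp section
    {
        int b2[1000];                        // private to the thread running this section
        for (int j = 0; j < 100000; j++)
            for (int i = 0; i < 1000; i++) b2[i] = 1;
    }
}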

Can someone suggest any explanation?

  • Strange code; it's just for evaluation, is that it? – alexbuisson Jul 11 '13 at 11:24
  • What is your Windows and Linux hardware configuration? Are you sure you have -fopenmp on the gcc command line? – alexbuisson Jul 11 '13 at 11:27
  • Thank you for the reply. I compile the code with the command g++ -fopenmp name.cpp. I will check the hardware configuration. – user2572031 Jul 11 '13 at 12:20
  • I will check the hardware configuration. This code is strange, that's true; it's only for evaluation. However, it should work properly, right? – user2572031 Jul 11 '13 at 12:27
  • I'd be skeptical of any timing results from 'funny' code like this; compilers could easily optimize away the `j` loop leaving you with something so trivial that the timing results are meaningless. Is it possible you compiled w/ MinGW with optimizations on? – Jonathan Dursi Jul 11 '13 at 13:07
  • Jonathan Dursi, thank you for the reply. I compiled with MinGW without optimization. In the original code the parallel part looks like this: #pragma omp section { diff(array1); } #pragma omp section { diff(array2); } I can't understand why it works properly on Windows. – user2572031 Jul 11 '13 at 13:21
  • The compiler versions are the same (gcc 4.7.2) and the compilation command is also the same. It looks to me, as alexbuisson said, like a hardware configuration issue. I will try to run a different program, following your suggestion that this code is too 'funny'. – user2572031 Jul 11 '13 at 13:27
  • Try comparing with optimization on ( `-O3` ). The whole point of using OpenMP is performance, so it's silly to compare debug-mode performance. – Z boson Jul 12 '13 at 08:47
  • Let me guess - you are measuring time using `clock()`, aren't you? `clock()` ticks with the real time on Windows and with the total CPU time of all process threads on Linux, hence it would look like OpenMP programs run slower than their serial counterparts on Linux. Use `omp_get_wtime()` instead for portable timing (see the sketch after the comments). – Hristo Iliev Aug 22 '13 at 07:21
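A minimal sketch of the difference described in the last comment (the measured work is omitted; only the two timing calls matter here):

#include <cstdio>
#include <ctime>
#include <omp.h>

int main()
{
    std::clock_t c0 = std::clock();   // CPU time of all threads on Linux, real time on Windows
    double w0 = omp_get_wtime();      // wall-clock time on every platform

    #pragma omp parallel
    {
        // ... the work being measured goes here ...
    }

    std::printf("clock():         %f s\n", double(std::clock() - c0) / CLOCKS_PER_SEC);
    std::printf("omp_get_wtime(): %f s\n", omp_get_wtime() - w0);
    return 0;
}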

1 Answer


I don't know exactly what you mean to achieve with your code, but the difference in efficiency could be due to the compiler you are using not handling the sections construct well.

First off, try a different compiler. In my experience gcc-4.8.0 works better with OpenMP, so maybe you could try that to start off.

Secondly, use optimisation flags! If you are measuring performance then it is only fair to use -O1, -O2 or -O3. The latter will give you the best performance. (Short-cuts with mathematical functions that make floating point operations slightly less accurate are enabled by additional flags such as -ffast-math, not by -O3 itself.)

g++ -fopenmp name.cpp -O3

You can read up more on compiler flags on this page if it interests you.

As an end note, I don't know how experienced you are with OpenMP, but when dealing with loops in OpenMP you would usually use the following:

#pragma omp parallel for
for(int i=0; i<N; ++i)
   doSomething();
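Applied to the loops from the question, that could look something like this sketch (only the inner initialisation loop is shared among the threads; with just 1000 iterations per pass the fork/join overhead may well dominate, so don't expect a speed-up from exactly this form):

for (int j = 0; j < 100000; j++)
{
    // share the 1000 element writes of one pass among the threads
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
        a1[i] = 1;
}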

Additionally, if you are using nested loops, you can use the collapse clause to tell the compiler to turn your nested loops into a single one (which can lead to better performance):

#pragma omp parallel for collapse(2)
for(int i=0; i<N; ++i)
   for(int j=0; j<N; ++j)
       doSomething();

There are some things you should be aware of when using collapse, which you can read about here. I personally prefer manually converting nested loops into a single loop, as in my experience this proves even more efficient.
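For example, the manual conversion of the nested loop above could look like this (a sketch; it assumes N * N does not overflow int):

#pragma omp parallel for
for (int idx = 0; idx < N * N; ++idx)
{
    int i = idx / N;   // index of the original outer loop
    int j = idx % N;   // index of the original inner loop
    doSomething();     // use i and j exactly as in the nested version
}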

Michael Aquilina