
I have a program structured similarly to this:

ssize_t remain = nsamp;
while (!nsamp || remain > 0) {
    #pragma omp parallel for num_threads(nthread)
    for (ssize_t ii = 0; ii < nthread; ii++) {
        /* generate noise */
    }

    // write noise
    out.write(data, nthread*PERITER);
    remain -= nthread*PERITER;
}

The problem is that when I benchmark this, running with e.g. two threads sometimes takes about the same time as a single thread, and sometimes gives a 2x speedup. It feels like there's some sort of synchronization race condition that I'm running into: sometimes I hit it and things go smoothly, and sometimes (often) not.

Does anyone know what might be causing this, and what's the right way to parallelize a section inside an outer while loop?

Edit: Using strace, I see a lot of calls to sched_yield(). This is probably making it look like I'm doing a lot on the CPU, when really I'm fighting the scheduler for a good scheduling pattern.

gct

2 Answers


You are creating a new bunch of threads each time the while loop is entered; after the parallel loop, the threads are destroyed. Because of the nature of a while loop, this may happen irregularly (depending on the condition). So if your loop only executes a few times, the thread-creation cost may outweigh the actual workload, and you get at best sequential performance, if not worse. That said, the parallel runtime (OpenMP) may detect that the region is entered many times and keep the threads alive.

Nothing guaranteed though.

Thomas Lang
  • Is there any way to tell openMP that I'm going to be executing a parallel region repeatedly? – gct Jan 07 '19 at 18:33
  • I am afraid not. Note that there might be other reasons that cause this inside your for loop, my answer was more like a wild guess. It might be helpful if you can rewrite your outer loop into a for loop. This might enable an optimizing compiler to interchange loops or do other magic to make it faster most times. – Thomas Lang Jan 07 '19 at 18:35
  • Unfortunately it might loop indefinitely so I can't really replace the outer loop, I've updated the code example to better show that. – gct Jan 07 '19 at 18:46
  • I see. Welp, this doesn't look too good; I'm afraid you cannot do much here. Suggestion 1: measure your noise-creation time sequentially and in parallel for a few while-loop conditions and draw a diagram. If this hints that the noise creation has little influence in your application, then skip parallelism here. If Suggestion 1 fails, you can try searching for a compiler implementing Loopo, which is able to automatically parallelize *some* while loops. – Thomas Lang Jan 07 '19 at 18:51
  • @ThomasLang What's wrong with having the while loop within a `omp parallel` region and an `omp for` inside this loop? – Walter Jan 07 '19 at 19:34
  • That each thread executes the entire loop. This is not what OP wants I think. – Thomas Lang Jan 07 '19 at 20:11

I'd suggest something like this. For nsamp == 0 you'll need some more reasonable handling. For proper signal handling with OpenMP, please refer to this answer.

ssize_t remain = nsamp;
#pragma omp parallel num_threads(nthread) shared(out, remain, data)
while (remain > 0) { 
    #pragma omp for
    for (ssize_t ii=0; ii < nthread; ii++) {
        /* generate noise */
    }
    #pragma omp single
    {
        // write noise
        out.write(data, nthread*PERITER);
        remain -= nthread*PERITER;
    }
}
  • @SeanMcAllister sorry for the confusion with comments. There must be 2 barriers in this code (one after `for`, another after `single`). Probably your code is too fast, or you might not have enabled OpenMP properly (saying that just in case). You also may try to enable some sort of scheduling for the `omp for` pragma instead of utilizing all `nthread`s in `omp parallel` (try removing `num_threads(nthread)` first). – Vyacheslav Napadovsky Jan 07 '19 at 20:02