
I have a problem similar to the question asked here: OpenMP parallel thread. I would like to parallelize a for loop that involves a std::set iterator, so I am trying to study the answer provided by @Hristo Iliev. There is something I am not quite sure about.

If a thread does not get to pick up a task, then its "t_worst_q" should still hold the original value when it enters the critical section. But if a thread picks up a task, "t_worst_q" is changed inside that task, so when the thread later enters the critical section, "t_worst_q" should keep the value it had within the task construct. Am I correct? However, when I tried an example, it did not behave this way, so maybe I am still missing something.
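
For context, this is roughly the pattern I am asking about. It is only my reconstruction of the scheme described in the answer I linked (a per-thread "t_worst_q", tasks, a taskwait, and a final reduction in a critical section); the exact code there may differ:

double worst_q = 1.0;   // some initial quality value (placeholder)

#pragma omp parallel
{
   // per-thread partial minimum, initialized from the shared value
   double t_worst_q = worst_q;

   #pragma omp single nowait
   {
      for(std::set<size_t>::const_iterator it=mesh->NEList[vid].begin();
          it!=mesh->NEList[vid].end(); ++it) {
         size_t elem = *it;
         #pragma omp task shared(t_worst_q)
         t_worst_q = std::min(t_worst_q, mesh->element_quality(elem));
      }
   }

   // intended to make every thread wait until all tasks have finished
   #pragma omp taskwait

   // final reduction of the per-thread minima into the shared result
   #pragma omp critical
   worst_q = std::min(worst_q, t_worst_q);
}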

1 Answer


Yes, the way you explain it is how Hristo Iliev wanted it to work. Unfortunately, he got several things about tasks wrong:

His so-called "barrier" using taskwait only works for the thread that created the tasks. All other threads run straight through it, so his whole reduction at the end is useless, because most threads execute it before they have actually done any work. taskwait was designed to account for data dependencies (very useful in recursive functions!), which is why it works the way it does.
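
To make that concrete, here is a minimal, self-contained sketch (do_work is just a placeholder) showing that a taskwait only suspends the task that generated the child tasks, i.e. the thread that executed the single block, while all other threads of the team pass straight by it:

#include <cstdio>
#include <omp.h>

// placeholder for the real per-element work
static void do_work(int i) {
   std::printf("task %d executed by thread %d\n", i, omp_get_thread_num());
}

int main() {
   #pragma omp parallel
   {
      #pragma omp single nowait
      {
         for (int i = 0; i < 8; ++i) {
            #pragma omp task firstprivate(i)
            do_work(i);
         }
      }

      // taskwait only waits for child tasks of the *encountering* task;
      // only the thread that executed the single block has created any,
      // so every other thread falls through here immediately and could
      // run a subsequent "reduction" before a single task has finished
      #pragma omp taskwait

      std::printf("thread %d is past the supposed barrier\n",
                  omp_get_thread_num());
   }
   return 0;
}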

The only reason the program sometimes works, even without a functional reduction at the end, is that his first mistake is partially cancelled out by a second one: the variable t_worst_q within the task references the one from the thread that generated the task and therefore effectively acts as a global variable. But then, of course, we have a race condition in the std::min, which still makes the code fail sometimes.

So in effect his code is equivalent to the following:

#pragma omp parallel
{
   // Create tasks
   #pragma omp single nowait
   {
      for(std::set<size_t>::const_iterator it=mesh->NEList[vid].begin();
          it!=mesh->NEList[vid].end(); ++it) {
         size_t elem = *it;
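         // race condition: the shared worst_q is read and updated below
         // by concurrent tasks without any synchronization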
         #pragma omp task shared(worst_q)
         worst_q = std::min(worst_q, mesh->element_quality(elem));
      }
   }
}

This, as stated before, has a race condition in the access to worst_q when executing the std::min. If the function mesh->element_quality is computationally expensive compared to std::min, then a simple solution is a critical section around the std::min part, after executing mesh->element_quality outside the critical section, as sketched below. If, on the other hand, mesh->element_quality isn't more expensive than std::min, then the critical section will kill the parallelism. But in that case mesh->element_quality is so cheap that the overhead of managing the tasks will eat up all the potential speed-up anyway.
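
A sketch of that first variant (assuming element_quality returns a double, so worst_q is a double as well): the potentially expensive call runs in parallel outside the critical section, and only the cheap update of the shared minimum is serialized:

#pragma omp parallel
{
   #pragma omp single nowait
   {
      for(std::set<size_t>::const_iterator it=mesh->NEList[vid].begin();
          it!=mesh->NEList[vid].end(); ++it) {
         size_t elem = *it;
         #pragma omp task shared(worst_q) firstprivate(elem)
         {
            // expensive part: computed in parallel, outside any critical section
            double q = mesh->element_quality(elem);
            // cheap part: serialize only the update of the shared minimum
            #pragma omp critical
            worst_q = std::min(worst_q, q);
         }
      }
   }
}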
