-3

I am using openMp on a nested loop which works like this

#pragma omp parallel shared(vector1) private(i,j)
{
#pragma omp for schedule(dynamic)
    for (i = 0; i < vector1.size(); ++i){

       //some code here 

       for (j = 0; j < vector1.size(); ++j){

           //some other code goes here
       #pragma omp critical
       A+=B;
       }
     C +=A;
    }
}

the Problem here is that my code is doing a lot of the computation in the A+=B part of the code. Therefore by making it critical, I am not achieving the speedup I would like. (In fact there appears to be some overhead since my program is taking longer to execute then it being sequentially written).

I tried using

#pragma omp reduction private(B) reduction(+:A)
    A+=B

this speeds up the execution time however is seems that it does not take care of race conditions like the critical clause since I am not getting the same results of A.

Is there an alternative to this i can try?

j.doe
  • 13
  • 6
  • 4
    There is one thing you can do: post a [minimal, complete and verifiable example](http://stackoverflow.com/help/mcve), because with only your pseudo code, there's little we can do to help you. – Gilles Dec 12 '16 at 19:47
  • maybe you are right, but in the meantime reduction() does not take care of race conditions right? – j.doe Dec 12 '16 at 19:54
  • 1
    Of course it behaves like this, do you know what `critical` and `reduction` do? You seem to randomly apply ideas which completely change the meaning of the program. `A` and `B` are undefined so it's quite impossible to guess what do you mean this snippet to do. There is no such thing as "pragma taking care of race conditions", this is handwaving, you are responsible to structure the code so they don't appear. I.e. nothing takes care of it, they are guaranteed not to appear. – luk32 Dec 12 '16 at 19:54
  • 1
    i never said pragma takes care of race conditions I meant that the critical clause only allows one thread at a time to execute the block inside. I should have specified that A and B are of type Vector3 which is a class I created. – j.doe Dec 12 '16 at 20:06
  • Reduction results in a very specific internal translation that is suited for integral types but leads to unexpected results for your vector. See [here](https://msdn.microsoft.com/en-us/library/2etkydkz.aspx) to get an idea what's happening. – The Vee Dec 12 '16 at 20:20

1 Answers1

1

Unless you want to go through the trouble of making your Vector3 class thread-safe or rewriting your operations for use with an std::atomic<Vector3>, both of which would still suffer from performance drawbacks (although not as serious as using a critical section), you can actually mimic the behaviour of OpenMP reduction:

#pragma omp parallel // no need to declare variables declared outside/inside as shared/private
{

    Vector3 A{}, LocalC{}; // both thread-private

    #pragma omp for schedule(dynamic)
    for (i = 0; i < vector1.size(); ++i){

       //some code here 

       for (j = 0; j < vector1.size(); ++j){

           //some other code goes here

           A += B; // does not need a barrier
       }
       LocalC += A; // does not need a barrier
    }

    #pragma omp critical
    C += LocalC;
}

NB that this assumes that you don't access A for reading within your "some code" comments, but you shouldn't anyway if you ever thought of using a reduction clause.

The Vee
  • 11,420
  • 5
  • 27
  • 60
  • Yes A is being read in the outer loop ( I edited the code just now).Iget it now that reduction clause cannot be used my question still remains is there an alternative? – j.doe Dec 12 '16 at 20:42
  • @j.doe I don't see that in your question. Are you sure you want to read the updated value? Reading `A` in a multithreaded situation while other threads write to it sounds like a recipe for trouble. If it's really so you may want to make it `volatile` or something, but I'd like to hear more details on why you'd want to do that. – The Vee Dec 12 '16 at 20:49
  • i just updated the question `c+=A` is where A is being read. before the `C+=A` I also have a ` #pragma omp critical. but only the critical clause on A seems to slow down the execution time since C is not doing computationally intensive instructions. – j.doe Dec 12 '16 at 20:57
  • My comment remains. Basically in the new version of your code `A` is my `LocalSum` and `C` is the final variable that I call `A`. Do you need to access one thread's `LocalSum` from other threads? If so, please show a bit more of the code. If not, you don't need `critical` there. – The Vee Dec 12 '16 at 21:00
  • no but in your code `A +=localsum` is outside of the nested loop, my `C+=A` is inside the outer loop – j.doe Dec 12 '16 at 21:30
  • And it's OK that for example `i == 0` and `i == 4` could both see the original value of `C` while `i == 1` the result of the additions of the results of `i == 4` and `i == 5`? – The Vee Dec 12 '16 at 22:17
  • @j.doe If the answer is yes, you don't need a `critical` on `A += B` but do need it on `C += A` (still trading the major reason for the slowdown from quadratic in `vector1.size()` to linear). If the answer is no, see whether the update works for you, I renamed the variables but kept the idea. – The Vee Dec 12 '16 at 22:26
  • Maybe I do not understand your question. You did not specify whether `A` is shared or private, and there's no declaration, but from the code it looks like it is private so I assumed as much. – The Vee Dec 12 '16 at 22:28