I am using OpenMP on a nested loop that works like this:
#pragma omp parallel shared(vector1) private(i,j)
{
    #pragma omp for schedule(dynamic)
    for (i = 0; i < vector1.size(); ++i){
        //some code here
        for (j = 0; j < vector1.size(); ++j){
            //some other code goes here
            #pragma omp critical
            A+=B;
        }
        C +=A;
    }
}
The problem is that my code does most of its computation in the A+=B part. Because that statement is inside a critical section, I am not getting the speedup I would like. (In fact, there appears to be some overhead, since the program takes longer to run than the sequential version.)
I tried using
#pragma omp reduction private(B) reduction(+:A)
A+=B
This speeds up execution; however, it seems that it does not protect against race conditions the way the critical directive does, since I am not getting the same results for A.
Is there an alternative I can try?
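
For reference, one alternative that is often suggested for this pattern is to put the reduction clause on the loop directive itself instead of on a separate pragma. The following is only a minimal sketch under assumptions that the snippet above does not confirm: that A is a per-row partial sum reset for each i, that C is the only value needed after the loop, and that B comes from some per-pair computation. The helper pair_value and the function sum_all_pairs are hypothetical names introduced for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for the per-pair work that produces B in the question.
static double pair_value(double x, double y) {
    return x * y;
}

// Sums pair_value over all (i, j) pairs.
// Assumption: A is a per-row partial sum and C is the only result needed.
double sum_all_pairs(const std::vector<double>& vector1) {
    double C = 0.0;
    const long n = static_cast<long>(vector1.size());

    // reduction(+:C) gives every thread a private copy of C, initialised to 0,
    // and adds those copies into the original C at the end of the loop.
    #pragma omp parallel for schedule(dynamic) reduction(+:C)
    for (long i = 0; i < n; ++i) {
        double A = 0.0;  // declared inside the loop body, so never shared
        for (long j = 0; j < n; ++j) {
            const double B = pair_value(vector1[i], vector1[j]);
            A += B;      // no critical section needed: A is local
        }
        C += A;          // updates this thread's private copy of C
    }
    return C;
}
```

With this shape there is no critical section in the inner loop, so threads do not wait on each other there. Note that a reduction changes the order in which floating-point values are added, so the result may differ from the sequential version in the last few digits even when there is no race.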