I am studying OpenMP and have some questions that I hope will clear up my thinking.
I have a small example of a matrix multiplication A*B, where A, B and C are global variables. I know how to parallelize the for loops one at a time, or both together with collapse, but my questions are:
- For which loop, if I put #pragma omp for on it, can I leave out the critical section at check1? I added it there because C is a global variable.
- On which loop should I use the nowait keyword to avoid the barrier at the end of the loop? I know that #pragma omp for adds one automatically.
This is the nested loop I am starting from:
int i, j, k, sum;
for (i = 0; i < N; i++)                 // loop1
    for (j = 0; j < N; j++) {           // loop2
        for (k = sum = 0; k < N; k++)   // loop3
            sum += A[i][k] * B[k][j];   // row i of A times column j of B
        C[i][j] = sum;                  // check1
    }
My approach:
#pragma omp parallel num_threads(4)
{
    #pragma omp for schedule(static) nowait   // **one**
    for (int i = 0; i < N; i++)               // loop1
        for (int j = 0; j < N; j++) {         // loop2
            int sum = 0;                      // declared here so each thread has its own copy
            for (int k = 0; k < N; k++)       // loop3
                sum += A[i][k] * B[k][j];
            #pragma omp critical              // **two**
            C[i][j] = sum;                    // check1
        }
}
- one: I put nowait there because the code runs faster with it, but I don't know why, or whether it is the right decision.
- two: I used a critical section because that is how I would have built it with plain threads.
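One way to test point two is to drop the critical section and compare the result against the serial loop. When loop1 is the one split across threads, each element C[i][j] is written by exactly one iteration of i, so no two threads store to the same element. The sketch below is only a minimal check under assumptions: N = 64, int matrices, and the fill pattern and function names are made up for illustration. It uses the combined parallel for form, so the only barrier is the one at the end of the parallel region.

```c
#include <assert.h>   /* for assert() in the caller */
#include <string.h>

#define N 64          /* assumed size, for illustration only */

static int A[N][N], B[N][N], C[N][N], C_ref[N][N];

/* Fill A and B with arbitrary small values (hypothetical test data). */
static void fill_inputs(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (i + 2 * j) % 7;
            B[i][j] = (3 * i + j) % 5;
        }
}

/* Serial reference result, stored in C_ref. */
static void matmul_serial(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C_ref[i][j] = sum;
        }
}

/* loop1 is split across threads; no critical section is used. */
static void matmul_parallel_outer(void) {
    #pragma omp parallel for schedule(static) num_threads(4)
    for (int i = 0; i < N; i++)             /* loop1: rows are divided among threads */
        for (int j = 0; j < N; j++) {       /* loop2 */
            int sum = 0;                    /* private: declared inside the region */
            for (int k = 0; k < N; k++)     /* loop3 */
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;                  /* check1: only this thread owns row i */
        }
}

/* Returns 1 if the parallel result equals the serial one, 0 otherwise. */
int outer_matches_serial(void) {
    fill_inputs();
    matmul_serial();
    matmul_parallel_outer();
    return memcmp(C, C_ref, sizeof C) == 0;
}
```

Calling assert(outer_matches_serial()) from main, in a program built with the compiler's OpenMP flag (for example -fopenmp with GCC), would show whether the version without critical still gives the same C.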
So let's say this is right. What about parallelizing the second or the third for loop: do I need these things there or not? If someone could explain when I need to add a critical section or nowait when I parallelize these nested for loops one at a time, I would appreciate it!
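For the third loop the situation looks different, because every thread would add into the same sum for one C[i][j]. A possible sketch of that case, assuming the standard reduction(+:sum) clause is used instead of a critical section (N = 64, the fill pattern and the function names are again assumptions for illustration):

```c
#include <assert.h>   /* for assert() in the caller */

#define N 64          /* assumed size, for illustration only */

static int A[N][N], B[N][N];

/* Fill A and B with arbitrary small values (hypothetical test data). */
static void fill_inputs(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (i + 2 * j) % 7;
            B[i][j] = (3 * i + j) % 5;
        }
}

/* One element of the product, computed serially. */
int dot_serial(int i, int j) {
    int sum = 0;
    for (int k = 0; k < N; k++)
        sum += A[i][k] * B[k][j];
    return sum;
}

/* Same element with loop3 split across threads.
 * reduction(+:sum) gives each thread a private partial sum and
 * adds the partial sums together when the loop ends. */
int dot_parallel_inner(int i, int j) {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum) num_threads(4)
    for (int k = 0; k < N; k++)             /* loop3 */
        sum += A[i][k] * B[k][j];
    return sum;                             /* complete only after the loop has finished */
}

/* Returns 1 if every element agrees with the serial version, 0 otherwise. */
int inner_matches_serial(void) {
    fill_inputs();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (dot_parallel_inner(i, j) != dot_serial(i, j))
                return 0;
    return 1;
}
```

Here sum is read right after the loop, so the threads have to finish before the value is used. This version also opens a parallel region for every single element of C, so it is meant as a correctness sketch and not as a fast way to do the multiplication.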