Assume I have these loops:
    #pragma omp parallel for
    for (int i = 0; i < 100; ++i)
    {
        // some big code here
        #pragma omp parallel for
        for (int j = 0; j < 200; j++)
        {
            // some small code here
        }
    }
Which loop actually runs in parallel? Which one is better to parallelize?
My reasoning is:
1- If the i-loop runs in parallel, each iteration executes the big code, which may not fit in the CPU cache, so there is a good chance of cache misses on every iteration of the loop.
2- If the j-loop runs in parallel, there is not much code in its body, so it probably fits in the CPU cache, but then the big code no longer runs in parallel.
I don't know how OpenMP runs nested loops like these in parallel, so how can I optimize them?
My code must run on Windows (Visual Studio) and ARM Linux.