Given: the machine has 28 processors.
Code 1:

    void fun1()
    {
        printf("Hello, world\n");
    }

    #pragma omp parallel
    {
        fun1();
    }
Code 2:

    void fun2()
    {
        #pragma omp for
        for (int i = 0; i < 10; i++)
        {
            printf("Hello, world\n");
        }
    }

    #pragma omp parallel
    {
        fun2();
    }
Code 3:

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 10; i++)
        {
            printf("Hello, world\n");
        }
    }
Results:
Code 1: printf is executed 28*1=28 times.
Code 2 is equivalent to Code 3: printf is executed 10 times. Why? Why is printf not executed 28*10=280 times, with each of the 28 threads running the whole for-loop?
ORIGINAL POST:
Question:
Why does

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++) {}
    }

execute every iteration of the loop exactly once? Why is

    #pragma omp for
    for (int i = 0; i < N; i++) {}

(i.e. the code within the { } above) not executed once per thread, as the specification of "#pragma omp parallel" would suggest, so that every iteration of the loop is executed M times, once by each of the M threads?
Or can this kind of construct, a "for" nested inside a parallel region, not be explained by the specification of "#pragma omp parallel" alone, because it depends on the implementation?