Why does "#pragma omp parallel { #pragma omp for }" differ from "#pragma omp parallel" in the number of executions?

Question

Known: number of processors: 28

Code 1:

#include <stdio.h>

void fun1(void)
{
    printf("Hello, world\n");
}

int main(void)
{
    #pragma omp parallel   /* every thread of the team runs this block */
    {
        fun1();
    }
    return 0;
}

Code 2:

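#include <stdio.h>

void fun2(void)
{
    #pragma omp for
    for (int i = 0; i < 10; i++)
    {
        printf("Hello, world\n");
    }
}

int main(void)
{
    #pragma omp parallel   /* every thread of the team calls fun2() */
    {
        fun2();
    }
    return 0;
}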

Code 3:

#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 10; i++)
        {
            printf("Hello, world\n");
        }
    }
    return 0;
}

Results:

Code 1: printf is executed 28*1 = 28 times.

Code 2 is equivalent to Code 3: printf is executed 10 times. Why? Why is printf not executed 28*10 = 280 times, with each of the 28 threads running the whole for-loop?

ORIGINAL POST:

Question:

Why does
#pragma omp parallel
{
    #pragma omp for
    for(int i=0;i<N;i++){}
}
execute every iteration of the loop exactly once? Why is
#pragma omp for
for(int i=0;i<N;i++){}
(i.e. the code within the { } above) not executed once per thread, as the specification of "#pragma omp parallel" would suggest, so that every iteration is executed M times in total, where M is the number of threads?

Or is it that this kind of nested construct with "for" cannot be explained by the specification of "#pragma omp parallel" alone, because it depends on the implementation?

Your question is not clear for me. There is no parallel region (`#pragma omp parallel`) in your second code. Without a parallel region `#pragma omp for` has no effect at all. — Laci, Jun 07 '22 at 05:27
I have made some changes in my description. The second code snippet is exactly the lines enclosed by outer {} in the first snippet. @Laci — Victor Li, Jun 07 '22 at 06:36

Michael Klemm · Answer 1 · 2022-06-07T13:45:17.943

This code:

#pragma omp for
for(int i=0;i<N;i++){}

is effectively sequential code. According to the Worksharing-Loop Construct section of the OpenMP specification, the for construct binds to an enclosing parallel construct; without one, the only thread available is the one that encounters the loop. The parallel construct creates the threads among which the for construct distributes the iterations. So you do have to write

#pragma omp parallel  // creates the threads
{
    #pragma omp for   // execute in parallel
    for(int i=0;i<N;i++){}
}

You can use the shorter form, too:

#pragma omp parallel for   // create threads & execute in parallel
for(int i=0;i<N;i++){}

UPDATE (to reflect the update to the original post):

Code 1 in the original post runs 28 threads in the parallel region; each thread calls the function and prints "Hello, world" once.

Code 2 and Code 3 each spawn 28 threads. In Code 2, every thread calls the function, and the for construct inside it binds to the parallel region of the caller and distributes the 10 loop iterations across the 28 threads. Since there are only 10 iterations, printf is invoked only 10 times, and at most 10 threads actually print; the remaining threads (at least 18) do nothing. The same holds for Code 3.

The link I have provided explains what the for construct does.

I have adjusted my question description to make it clear. The code "#pragma omp for for(int i=0;i — Victor Li, Jun 07 '22 at 13:11

score 0 · Answer 2 · answered Jun 07 '22 at 13:43

The two basic concepts in OpenMP are: 1. The parallel region: when a thread encounters omp parallel, a team of threads is created, and each thread starts executing the region. 2. Worksharing constructs, of which omp for is the most obvious one: if you have a team of threads, the work is distributed over those threads. So in both your Code 2 and Code 3 you create a team, and then the team encounters the loop and distributes the iterations among its threads.

You are wondering why each thread does not execute the whole loop? That is what happens if you omit the omp for. In that case the loop is a statement like any other, and each thread executes it in its entirety.
