OpenMP parallel for with ordered and critical directives

Question

I've been trying to understand how an OpenMP parallel for loop works when combined with critical sections and ordered directives. There are a couple of code samples that I find confusing:

1. An OpenMP parallel for loop is used to initialize the array s with the loop index i and the thread ID. No ordered directives or critical sections are used.


    #include <stdio.h>
    #include <omp.h>
    
    #define N       10
    #define CHUNKSIZE   1
    
    int main(int argc, char* argv[])
    {    
        int i, chunk = CHUNKSIZE;    
        char s[N][22];
    
    #pragma omp parallel for shared(s,chunk) private(i)  schedule(static, chunk) 
        for (i = 0; i < N; ++i)
        {
            int tid = omp_get_thread_num();
            sprintf(s[i], "%d:%d", i, tid);
            printf("i: %d tid: %d\n", i, tid);
        }
    
        puts("\nArray initialization order:");
        for (i = 0; i < N; ++i)
            puts(s[i]);
    
    }

It prints the following:

i: 7 tid: 7
i: 4 tid: 4
i: 5 tid: 5
i: 6 tid: 6
i: 0 tid: 0
i: 8 tid: 0
i: 3 tid: 3
i: 1 tid: 1
i: 2 tid: 2
i: 9 tid: 1

Array initialization order:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:0
9:1

I am failing to figure out why s contains the i indices (first number) in a strict sequence despite the absence of the ordered directives and why printf("i: %d tid: %d\n", i, tid) shows them in a different order?

2. Adding ordered to the omp parallel for clause doesn't seem to change anything unless omp ordered is put inside the loop body.

    #pragma omp parallel for shared(s,chunk) private(i) schedule(static, chunk) ordered
        for (i = 0; i < N; ++i)
        {
            int tid = omp_get_thread_num();
            sprintf(s[i], "%d:%d", i, tid);
            printf("i: %d tid: %d\n", i, tid);
        }

Produces the same result as before: sprintf(s[i], "%d:%d", i, tid) initializes the array with a strict sequence of i, whereas printf("i: %d tid: %d\n", i, tid) prints i in an arbitrary order.

    #pragma omp parallel for shared(s,chunk) private(i) schedule(static, chunk) ordered
        for (i = 0; i < N; ++i)
        {
            int tid = omp_get_thread_num();
            sprintf(s[i], "%d:%d", i, tid);
    #pragma omp ordered
            printf("i: %d tid: %d\n", i, tid);
        }

Now everything happens in the sequence of i:

i: 0 tid: 0
i: 1 tid: 1
i: 2 tid: 2
i: 3 tid: 3
i: 4 tid: 4
i: 5 tid: 5
i: 6 tid: 6
i: 7 tid: 7
i: 8 tid: 0
i: 9 tid: 1

Array initialization order:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:0
9:1

Again, I don't understand why we need to place omp ordered inside the loop body to enforce the order of the prints, whereas the array initialization doesn't need that.

3. Use critical section to ensure that only one thread at a time executes the loop body:

    #pragma omp parallel for shared(s,chunk) private(i) schedule(static, chunk) ordered
        for (i = 0; i < N; ++i)
    #pragma omp critical
        {
            int tid = omp_get_thread_num();
            sprintf(s[i], "%d:%d", i, tid);
            printf("i: %d tid: %d\n", i, tid);
        }

Again, prints i in an arbitrary order, and initializes s in a strict order of i:

i: 1 tid: 1
i: 4 tid: 4
i: 3 tid: 3
i: 2 tid: 2
i: 5 tid: 5
i: 0 tid: 0
i: 7 tid: 7
i: 6 tid: 6
i: 8 tid: 0
i: 9 tid: 1

Array initialization order:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:0
9:1

This is totally bewildering since in my understanding the critical section must guarantee that sprintf and printf statements are executed by the same thread without any interruptions.

Any help to clear this up will be highly appreciated.

score 2 · Accepted Answer · answered Jun 27 '20 at 09:16

I am failing to figure out why s contains the i indices (first number) in a strict sequence despite the absence of the ordered directives and why printf("i: %d tid: %d\n", i, tid) shows them in a different order?

With static scheduling there is a fixed mapping between each loop iteration and the thread that executes it, which is why, no matter how many times you run the program, s[i] will always be set to "i:same_thread_id" as long as the number of threads stays the same. Printing s[] takes place in a sequential loop outside the parallel region, hence the output is ordered. I would be more surprised if that loop were to print things out of order. As for the printf() calls within the parallel region, you have schedule(static,1), which deals the iterations out to the threads round-robin, one at a time: with the 8 threads in your output, iterations 0 to 7 go to threads 0 to 7, and iterations 8 and 9 go back to threads 0 and 1. The threads run concurrently and are not synchronised with each other, so their printf() calls appear in arbitrary order.

Adding ordered to the omp parallel for clause doesn't seem to change anything unless omp ordered is put inside the loop body.

That is exactly how ordered works. There are the ordered clause and the ordered region. The clause modifies the behaviour of the for worksharing construct and enables ordered execution of the denoted region inside. There are additional synchronisation requirements for ordered execution to work properly that aren't needed otherwise, which is why the clause exists. Also, the region exists so that only a part of the loop can run in order. Having the entire loop body run in order is meaningless as it is no different from sequential (non-parallel) loop execution. See this answer of mine for more details.

This is totally bewildering since in my understanding the critical section must guarantee that sprintf and printf statements are executed by the same thread without any interruptions.

Critical sections guarantee that no two threads execute the same code region simultaneously. They in no way enforce the order of the encountering threads. Since no two threads access the same element of s[], having a critical section in 3. changes nothing. It serialises the loop execution since no two threads can execute the body at the same time, but it doesn't make the loop run in sequential order.
