
All the OpenMP tutorial examples I have seen create threads for for loops. But I need to create threads for ordinary groups of statements, which may be grouped into functions, e.g. something like the following:

#include <stdio.h>
#include <omp.h>
int A() { printf("in A:%d\n", omp_get_thread_num()); }
int B() { printf("in B:%d\n", omp_get_thread_num()); }
int D() { printf("in D:%d\n", omp_get_thread_num()); }
int E() { printf("in E:%d\n", omp_get_thread_num()); }
int F() { printf("in F:%d\n", omp_get_thread_num()); }
int G() { printf("in G:%d\n", omp_get_thread_num()); }
int H() { printf("in H:%d\n", omp_get_thread_num()); }
int C() {
    printf("in C:%d\n", omp_get_thread_num());
    #pragma omp parallel num_threads(2)
    {
        D(); // want to execute D,E in separate threads
        E();
    }
    F();
}
main() {
    omp_set_nested(1);
    printf("in main:%d\n", omp_get_thread_num());
    G();
    #pragma omp parallel num_threads(3)
    {
        A(); // want to execute A,B,C in separate threads
        B();
        C();
    }
    H();
}

In the above code, I want each function to execute exactly once, but in different threads. (My use of the directives above may be wrong; please correct it as needed.)

How do I code this kind of nested parallelism of functions with OpenMP? Will these functions share all the global variables that are available, or is there a way to specify which variables will be shared by which functions?

EDITS: After reading Jorge Bellon's answer below, I coded the following; its output is shown after the code. It looks like thread 0 is being used for many of the functions, which is not what I intended: I want the functions to be executed in parallel. Also, I want only one execution of G, so it looks like I have to delete the "num_threads(3)" line. What is the fix for this problem?

// compile this with: g++ -fopenmp
int A() { printf("in A:%d\n", omp_get_thread_num()); sleep(1); }
// similarly for B, D, E, F, G, H
int C() {
    printf("in C:%d\n", omp_get_thread_num()); sleep(1);
    #pragma omp task
    D();
    #pragma omp task
    E();
    #pragma omp taskwait
    F(); sleep(1);
}
main() {
    omp_set_nested(1);
    printf("in main:%d\n", omp_get_thread_num());
    #pragma omp parallel num_threads(3)
    G();
    #pragma omp task
    A();
    #pragma omp task
    B();
    #pragma omp task
    C();
    #pragma omp taskwait
    H();
}
// outputs:
in main:0
in G:1
in G:0
in G:2
in A:0
in B:0
in C:0
in D:0
in E:0
in F:0
in H:0
R71

2 Answers

The best way to parallelize this kind of code is to use OpenMP task constructs. Your parallel region creates a pool of threads, a master thread creates the outer tasks, and the rest of the threads process those tasks as soon as they become available.

// [...]

void C() {
  // You can create tasks within tasks
  // In this example it is better to place {D,E} and {E} in tasks
  // and omit the task construct of C function call
  #pragma omp task
  {
    D();
    E();
  }
  // if F() needs D and E to finish, a taskwait is necessary
  F();
}

int main() {
  // omp_set_nested no longer necessary
  printf("in main:%d\n", omp_get_thread_num());
  G();
  #pragma omp parallel num_threads(3)
  #pragma omp single
  {
    // a single thread creates the tasks
    // other threads in the team will be able to execute them
    // want to execute A,B,C in separate threads
    #pragma omp task
    A();
    #pragma omp task
    B();
    #pragma omp task
    C();
    // wait until all the tasks have been finished
    #pragma omp taskwait
  }
  H();
}

Whether each function is executed in a different thread depends completely on the state of the program at run time. This means that some tasks may be executed in the same thread if all other threads are busy, which is not particularly a problem.

You can use task dependences (available since OpenMP 4) to control whether a task may start executing as soon as it is created or must wait for earlier tasks.

Jorge Bellon
  • Thanks. I tried your solution, but it looks like I am getting only one thread. Please see the details added in the question above. – R71 Sep 06 '17 at 13:18
  • I don't need the tasks to go on beyond a local join, so I don't need the details of task dependencies. – R71 Sep 06 '17 at 13:19
  • You need to specify the scope in which the threads will run in parallel. In the task-based code you show, the parallel scope is only `G();`. You need to use curly braces `{ }` (see my example) to define that the parallel part includes all the following lines until the end of the main function. To know whether tasks are running in parallel, add the `omp_get_thread_num()` return value to the `printf`, so that you know which thread is running that task. – Jorge Bellon Sep 06 '17 at 13:33
  • Yes, I am printing the thread with omp_get_thread_num (my question on printing the thread number was for the pthread version). I tried your scopes also, but the threads are not being printed correctly. E.g., I tried playing with different delays in the functions, and I sometimes see D and E, or A, B and C, with the same thread number, and I don't understand how that can happen if these functions are being executed concurrently. – R71 Sep 06 '17 at 14:07
  • It may happen if you have a large number of functions and few idle threads. Keep in mind that one of your threads is creating the tasks. When the tasks are as short as yours, it might take more time to create the tasks than to execute them. Try to introduce some `usleep` in the tasks to increase their execution time. – Jorge Bellon Sep 06 '17 at 14:11
  • Actually, I am using sleep, which sleeps for seconds, so what you are suggesting is to use usleep to sleep for microseconds, i.e. to reduce the sleep time? I need to test it exhaustively to make sure that there are really no random bugs! For now, the C++11 thread version is working for me (I have updated the code above), so I will drop the OpenMP version for now. – R71 Sep 06 '17 at 14:45

The following solution is implemented with C++11 threads. A detailed OpenMP version is still to be worked out.

// compile this with: g++ -pthread -std=gnu++0x
#include <stdio.h>
#include <unistd.h> // for sleep
#include <thread>
#include <iostream>
#include <sstream>
using namespace std;
void A() { stringstream ss; ss << this_thread::get_id();
          printf("in A:%s\n", ss.str().c_str()); sleep(1); }
// similarly for B, D, E, F, G, H
void C() {
    stringstream ss; ss << this_thread::get_id();
    printf("in C:%s\n", ss.str().c_str()); sleep(1);
    std::thread thread_1(D);
    std::thread thread_2(E);
    thread_1.join();
    thread_2.join();
    F(); sleep(1);
}
int main() {
    printf("in main\n");
    G();
    std::thread thread_1(A);
    std::thread thread_2(B);
    std::thread thread_3(C);
    thread_1.join();
    thread_2.join();
    thread_3.join();
    H();
}
// outputs:
in main
in G:0x600000060
in A:0x60005aa10
in B:0x60005ab10
in C:0x60005ae40
in D:0x60005af40
in E:0x60005b040
in F:0x60005ae40
in H:0x600000060
R71