
I want to use OpenMP to attain this effect: fix the number of threads; if there is an idle thread, dispatch the task to it; otherwise, wait for one to become idle. The following is my test code:

#include <omp.h>
#include <stdio.h>
#include <unistd.h>

void func(void) {
    #pragma omp parallel for
    for (int i = 0; i < 3; i++)
    {
        sleep(30);
        printf("%d\n", omp_get_thread_num());
    }
}

int main(void) {
    omp_set_nested(1);
    omp_set_num_threads(omp_get_num_procs());
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
    {
        printf("%d\n", omp_get_thread_num());
        func();
    }   

    return 0;
}

Actually, my machine contains 24 cores, so

omp_set_num_threads(omp_get_num_procs())

will launch 24 threads at the beginning. Then main's for-loop occupies 8 threads, and each of those threads calls func, which should use 2 additional threads per call. By my calculation, 24 threads should be enough. But in the actual run, 208 threads are generated in total.

So my questions are as follows:
(1) Why are so many threads created, when 24 seems enough?
(2) Is it possible to fix the number of threads (e.g., equal to the number of cores) and dispatch a task only when there is an idle one?

Nan Xiao

3 Answers


1) That's just the way parallel for is defined: a parallel directive immediately followed by a loop directive. So thread creation is not limited by the granularity of the worksharing.

Edit: To clarify OpenMP will:

  1. Create an implementation-defined amount of threads - unless you specify otherwise
  2. Schedule the share of loop iterations among this team of threads. You now end up with threads in the team that have no work.
  3. If you have nested parallelism, this will repeat: A single thread encounters the new nested parallel construct and will create a whole new team.

So in your case, 8 threads of the outer team encounter the inner parallel construct and each creates a whole new team of 24 threads (the encountering thread becomes that team's master), while the remaining 16 threads of the outer team get no iterations. That gives you 8 * 24 + 16 = 208 threads in total.

2) Yes, incidentally, this concept is called task in OpenMP. Here is a good introduction.

Zulan
  • "That's just the way `parallel for` is defined as a parallel directive immediately followed by a `loop` directive. So there is no limitation of thread creation based on worksharing granularity." . If possible, could you elaborate it? Thx! – Nan Xiao May 27 '17 at 03:08
  • @NanXiao I extended my answer. – Zulan May 27 '17 at 10:15

In OpenMP, once you have asked for a particular number of threads, the runtime system will give them to your parallel region if it is able to, and those threads cannot be used for other work while that region is active. The runtime system cannot guess that you are not going to use the threads you requested.

So what you can do is either ask for fewer threads where you need fewer, or use some other parallelization technique that can dynamically manage the number of active threads. For example, with OpenMP, if you ask for 8 threads for the outer parallel region and 3 threads for the inner regions, you may end up with 24 threads (or fewer, if threads can be re-used, e.g. when the parallel regions are not running simultaneously).

-- Andrey


You should try

#pragma omp task

Besides, in my opinion, avoid using nested OpenMP parallel regions.

Clark Lee