Number of threads of Intel MKL functions inside OMP parallel regions

Question

I have a multithreaded code in C, using OpenMP and Intel MKL functions. I have the following code:

    omp_set_num_threads(nth);
    #pragma omp parallel for private(l,s) schedule(static)
    for(l=0;l<lines;l++)
    {
        for(s=0;s<samples;s++)
        {
            out[l*samples+s]=mkl_ddot(&bands, &hi[s*bands+l], &inc_one, &hi_[s*bands+l], &inc_one);
        }
    }// end for l

I want to use all the cores of the multicore processor (the value of nth) in this pragma, but I want each core to compute a single mkl_ddot call independently (1 thread per mkl_ddot call).

I want to know how many threads the mkl_ddot function uses in this case. I read in some forums that, by default, MKL functions called inside a parallel region run on only 1 core (that's what I want). But I am not sure about this behaviour, and I cannot find the section of the manual that explains this situation.

Thanks in advance.

score 6 · Accepted Answer · edited May 23 '17 at 12:20

That's correct - by default MKL runs with a single thread if it detects that it is being called from inside a parallel region. I have explained the way to change this behaviour in this answer. You can simply invert the boolean parameters there to make sure that MKL would only use a single thread.

Yet, if you only ever want MKL functions to run single-threaded, e.g. you only call them from inside parallel regions, you'd better link with the sequential MKL driver instead. With Intel's compiler this is easy - just specify -mkl=sequential. For other compilers you should look into the library's manual for how to link your program against the sequential driver.
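To make the "one single-threaded dot product per OpenMP thread" pattern concrete, here is a minimal sketch. It is not the asker's exact code: the names pixel_dots and dot, the HAVE_MKL build macro, and the data layout (each pixel's bands values stored contiguously) are assumptions made for illustration. When HAVE_MKL is defined it calls cblas_ddot and pins MKL to one thread with mkl_set_num_threads(1); otherwise it falls back to a plain loop so the sketch still builds without MKL.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef HAVE_MKL              /* hypothetical build macro, not defined by MKL itself */
#include <mkl.h>
#endif

/* Dot product of two contiguous vectors of length n (stride 1). */
static double dot(int n, const double *x, const double *y)
{
#ifdef HAVE_MKL
    return cblas_ddot(n, x, 1, y, 1);
#else
    double acc = 0.0;        /* plain fallback when MKL is not available */
    for (int i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
#endif
}

/* One dot product per pixel; the outer loop is split across OpenMP threads.
   Assumed layout: pixel p = l*samples + s owns a[p*bands .. p*bands+bands-1]. */
void pixel_dots(int lines, int samples, int bands,
                const double *a, const double *b, double *out)
{
    assert(lines >= 0 && samples >= 0 && bands >= 0);
#ifdef HAVE_MKL
    mkl_set_num_threads(1);  /* explicit request: MKL stays single-threaded */
#endif
    #pragma omp parallel for schedule(static)
    for (int l = 0; l < lines; l++) {
        for (int s = 0; s < samples; s++) {
            size_t p = (size_t)l * samples + s;
            out[p] = dot(bands, a + p * bands, b + p * bands);
        }
    }
}
```

If MKL already drops to one thread inside parallel regions by default, as described above, the mkl_set_num_threads(1) call is redundant; it only makes the intent explicit. Linking against the sequential driver achieves the same result without any call.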

answered Feb 05 '14 at 07:50

Hristo Iliev

Do you know if this automatic single-thread behaviour for calls inside an OMP parallel for region also applies to the Cray BLAS library? – velenos14 Apr 05 '23 at 11:43
@velenos14 it's MKL-specific. No idea about CRAY's library. Check the docs perhaps? – Hristo Iliev Apr 05 '23 at 19:00

score 2 · Answer 2 · answered Feb 04 '14 at 22:32

The Intel MKL library uses OpenMP for multithreading. The number of threads created is based on the environment variable "OMP_NUM_THREADS". The default value of OMP_NUM_THREADS depends on the Intel MKL version and the OpenMP libraries.

In your case, however, you are using nested parallelism, and nested parallelism is turned off by default. Hence the number of threads used by the mkl_ddot function will be ONE (which means no parallelism at the mkl_ddot level).

You can enable nested parallelism by calling omp_set_nested(1). In your case this enables nested parallelism, so that more than one thread can be used by the mkl_ddot function.
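The OpenMP side of this claim can be checked without MKL. The sketch below is an illustration only: inner_team_size is a hypothetical helper, and it uses omp_set_max_active_levels, which OpenMP 5.0 recommends over the deprecated omp_set_nested. It reports how many threads a parallel region gets when it is opened inside another parallel region. It says nothing about MKL's own thread settings, which the accepted answer describes as a separate control.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the team size observed in a parallel region nested inside
   another parallel region. allow_nested: 0 = one active level only,
   1 = allow two active levels. Without OpenMP the result is always 1. */
int inner_team_size(int allow_nested)
{
    assert(allow_nested == 0 || allow_nested == 1);
    int inner = 1;
#ifdef _OPENMP
    /* Note: this changes a global OpenMP setting for the calling thread. */
    omp_set_max_active_levels(allow_nested ? 2 : 1);
    #pragma omp parallel num_threads(2) shared(inner)
    {
        int outer_id = omp_get_thread_num();
        #pragma omp parallel num_threads(2) shared(inner)
        {
            /* Exactly one thread records the inner team size. */
            if (outer_id == 0 && omp_get_thread_num() == 0)
                inner = omp_get_num_threads();
        }
    }
#endif
    return inner;
}
```

With allow_nested set to 0 the inner region is serialized and the function returns 1, which matches the "no parallelism at the inner level" behaviour described above. With allow_nested set to 1 the runtime may grant the inner region more than one thread, but it is not obliged to.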
