MKL's behavior is different: as a matter of fact, you can have more threads than there are cores.
The reason @Kristoffer doesn't see this in his answer is that dynamic adjustment is enabled by default:
> By default, Intel® MKL can adjust the specified number of threads dynamically. [...] If dynamic adjustment of the number of threads is disabled, Intel® MKL attempts to use the specified number of threads in internal parallel regions (for more information, see the Intel® MKL Developer Guide). Use the `mkl_set_dynamic` function to control dynamic adjustment of the number of threads.
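Besides calling `mkl_set_dynamic`, the same two settings can also be made through the environment variables `MKL_DYNAMIC` and `MKL_NUM_THREADS`, which MKL reads when it is first loaded. A minimal sketch, assuming the variables are set before NumPy (or anything else linked against MKL) is imported; the helper name `configure_mkl_env` is made up for illustration:

```python
import os


def configure_mkl_env(num_threads, dynamic):
    """Hypothetical helper: set MKL's threading environment variables.

    Assumption: this runs before MKL is loaded (e.g. before `import numpy`
    in an MKL-linked NumPy build); changes made afterwards are generally
    not picked up by an already initialized MKL.
    """
    os.environ["MKL_NUM_THREADS"] = str(num_threads)
    # MKL_DYNAMIC takes TRUE or FALSE; FALSE corresponds to mkl_set_dynamic(0).
    os.environ["MKL_DYNAMIC"] = "TRUE" if dynamic else "FALSE"


configure_mkl_env(num_threads=44, dynamic=False)
# import numpy as np  # load MKL only after the environment is prepared
```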
So if we use `mkl_set_dynamic(0)` to switch the dynamic adjustment off, we will see the following:
```python
>>> set_max_threads(44)
>>> get_max_threads()
6
>>> mkl_set_dynamic(0)
>>> get_max_threads()
44
```
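The definitions of the helpers used in this session aren't shown. One way they could be written with `ctypes` is sketched below, assuming MKL's single dynamic library `mkl_rt` is installed and loadable under one of the listed file names (the file names and the three wrapper functions are assumptions, not a fixed API). It calls the mixed-case C entry points declared in `mkl_service.h` (`MKL_Set_Num_Threads`, `MKL_Get_Max_Threads`, `MKL_Set_Dynamic`), which take plain `int` arguments:

```python
import ctypes

# Assumption: candidate file names of MKL's runtime library
# (Linux, macOS, Windows; older and newer releases).
MKL_RT_NAMES = (
    "libmkl_rt.so", "libmkl_rt.so.2", "libmkl_rt.so.1",
    "libmkl_rt.dylib", "libmkl_rt.2.dylib",
    "mkl_rt.dll", "mkl_rt.2.dll",
)


def load_mkl_rt(names=MKL_RT_NAMES):
    """Return a handle to the first loadable mkl_rt library, or None."""
    for name in names:
        try:
            lib = ctypes.CDLL(name)
            # C signatures as declared in mkl_service.h (arguments by value).
            lib.MKL_Set_Num_Threads.argtypes = [ctypes.c_int]
            lib.MKL_Set_Num_Threads.restype = None
            lib.MKL_Get_Max_Threads.argtypes = []
            lib.MKL_Get_Max_Threads.restype = ctypes.c_int
            lib.MKL_Set_Dynamic.argtypes = [ctypes.c_int]
            lib.MKL_Set_Dynamic.restype = None
        except (OSError, AttributeError):
            # Library not found, or it lacks the expected entry points.
            continue
        return lib
    return None


mkl = load_mkl_rt()


def _require_mkl():
    if mkl is None:
        raise RuntimeError("MKL runtime library (mkl_rt) not found")
    return mkl


def set_max_threads(n):
    """Hypothetical wrapper: request n threads for MKL's parallel regions."""
    _require_mkl().MKL_Set_Num_Threads(n)


def get_max_threads():
    """Hypothetical wrapper: number of threads MKL would currently use."""
    return _require_mkl().MKL_Get_Max_Threads()


def mkl_set_dynamic(flag):
    """Hypothetical wrapper: 0 disables, nonzero enables dynamic adjustment."""
    _require_mkl().MKL_Set_Dynamic(flag)
```

The wrappers only forward the calls; the 6 vs. 44 in the session is what MKL itself reports.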
So we see that without dynamic adjustment MKL could use 44 threads. Whether it really does is another question. The documentation of `mkl_get_dynamic` explains (even if the information seems a little outdated to me, as the dynamic-adjustment setting is already taken into account by `get_max_threads`):
> Suppose that the `mkl_get_max_threads` function returns the number of threads equal to N. [...] If dynamic adjustment is disabled, Intel® MKL requests exactly N threads for internal parallel regions ([...]). However, the OpenMP* run-time library may be configured to supply fewer threads than Intel® MKL requests, depending on the OpenMP* setting of dynamic adjustment.
OpenMP's rules are given in Algorithm 2.1 of the OpenMP 5.0 specification (which I don't pretend to understand).
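For the simplest case, a single parallel region that is not nested and has no `if` clause, the branches of Algorithm 2.1 that matter here boil down to roughly the following. This is a simplified Python paraphrase, not the specification's text: `thread_limit` and `dyn_var` stand for the `thread-limit-var` and `dyn-var` internal control variables, and `threads_busy` counts the OpenMP threads already running in the contention group (at least the encountering thread). The function name is made up for illustration:

```python
def omp_team_size(threads_requested, thread_limit, dyn_var, threads_busy=1):
    """Return the (min, max) team size allowed, or None if implementation defined.

    Simplified paraphrase of OpenMP 5.0 Algorithm 2.1 for a non-nested
    parallel region without an `if` clause.
    """
    threads_available = thread_limit - threads_busy + 1
    if dyn_var:
        # Dynamic adjustment on: the runtime may pick any size
        # from 1 up to the smaller of the two bounds.
        return (1, min(threads_requested, threads_available))
    if threads_requested <= threads_available:
        # Dynamic adjustment off: the request is honoured exactly.
        return (threads_requested, threads_requested)
    # Dynamic adjustment off and more threads requested than available.
    return None


# Huge thread limit, dynamic adjustment off: the request is granted as is.
print(omp_team_size(44, 2**31 - 1, dyn_var=False))  # (44, 44)
```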
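On my machine the relevant values are `omp_get_thread_limit()=2147483647` and `omp_get_dynamic()=0`, so according to the algorithm the OpenMP runtime should supply as many threads as MKL requests. Thus, with `MKL_DYNAMIC` disabled and the maximal number of threads set higher than the number of cores, I really do see a decrease in performance due to the extra overhead.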
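To look up these two values from Python, one can call `omp_get_thread_limit` and `omp_get_dynamic` from the OpenMP runtime through `ctypes`. A sketch, assuming the runtime is Intel's `libiomp5`, GNU's `libgomp` or LLVM's `libomp` and can be located by `ctypes.util.find_library`; which runtime your MKL actually uses depends on how it was linked, and if the library loaded here is not that one, its values say nothing about MKL. The function name `query_openmp_icvs` is made up for illustration:

```python
import ctypes
import ctypes.util


def query_openmp_icvs(runtimes=("iomp5", "gomp", "omp")):
    """Return thread limit and dynamic setting of the first OpenMP runtime
    found, or None if none can be loaded (hypothetical helper)."""
    for name in runtimes:
        path = ctypes.util.find_library(name)
        if path is None:
            continue
        try:
            lib = ctypes.CDLL(path)
            get_limit = lib.omp_get_thread_limit
            get_dynamic = lib.omp_get_dynamic
        except (OSError, AttributeError):
            continue
        # Both are declared as `int f(void)` in omp.h.
        get_limit.argtypes = []
        get_limit.restype = ctypes.c_int
        get_dynamic.argtypes = []
        get_dynamic.restype = ctypes.c_int
        return {"runtime": name,
                "thread_limit": get_limit(),
                "dynamic": get_dynamic()}
    return None


print(query_openmp_icvs())
```

On the machine described above this would correspond to `thread_limit=2147483647` and `dynamic=0`. Both values can also be set through the standard `OMP_THREAD_LIMIT` and `OMP_DYNAMIC` environment variables before the runtime starts.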