First of all, the most general way to express your code is:
```cpp
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < jobs; ++i)
{
    // per-job work goes here
}
```
Assume that the implementation has a good default.
Before you go any further, measure. Sure, sometimes it can be necessary to help the implementation out, but don't do so blindly. Most of what follows is implementation dependent, so looking at the standard won't help you much.
If you still specify the number of threads manually, you might as well give it `std::min(N, jobs)` so that you never create more threads than there are jobs.
Here are some things to look out for that could influence the performance in your case:
- Don't worry too much about the overhead of spawning unnecessary threads. Implementations mitigate it with thread pools. That doesn't mean it's always free, so measure.
- Do not oversubscribe unless you know what you are doing. Use at most as many threads as there are cores. This is general advice.
- The `OMP_WAIT_POLICY` environment variable matters in your case, as it defines how waiting threads behave. Here, the excess threads will wait at the implicit barrier at the end of the parallel region. Implementations are free to do what they want with this setting, but you may assume that with `active`, threads use some form of busy waiting, and with `passive`, threads sleep. A busy-waiting thread can consume resources of the computing threads, e.g. power budget that could otherwise be used to increase the turbo frequency of the computing threads; it also wastes energy. In case of oversubscription, the impact of actively waiting threads is much worse.