To execute a section of code in parallel with a known number of threads, we usually do this:
#pragma omp parallel num_threads(8)
{}
However, how can we set the number of cores instead of the number of threads? Are these different?
TL;DR: you cannot directly specify a number of cores in OpenMP directives, but you can control how OpenMP threads are mapped onto the available cores.
How it works:
Software threads can be dynamically created and destroyed at runtime by applications. They are mapped on hardware resources like hardware threads and cores that are fixed at runtime for a given platform. You cannot control cores directly (in user-space), only threads.
In OpenMP you can control the number of threads at runtime using several approaches:
- the num_threads clause in directives
- the OMP_NUM_THREADS environment variable
- the omp_set_num_threads runtime function
OpenMP abstracts the hardware hierarchy using places. It defines a place as "an unordered set of implementation-defined hardware unit of a device on which one or more OpenMP threads can execute". In practice, a place is usually a set of hardware threads on a CPU. Examples of valid places include a given hardware socket, three specific cores, or one specific hardware thread (multiple places can share the same hardware execution units). Places can be set manually using the OMP_PLACES environment variable.
The mapping/binding of OpenMP threads to places can be controlled using the OMP_PROC_BIND environment variable or, more recently, the proc_bind clause on parallel directives. For example, you can force OpenMP threads to be bound to places, or to be uniformly spread among them.
Example:
If you want to use 4 cores, you can use the following environment:
OMP_PLACES="cores(4)"
OMP_PROC_BIND=close
The OpenMP runtime will select 4 cores of your hardware (which ones is up to the implementation) and execute the threads on them, so that the first thread runs on the first core, the second thread on the second core, and so on. If there are 8 threads, then each of the 4 cores will execute two OpenMP threads (even if you have a processor with 8 cores).