From the OpenMP standard one can read:
When a thread encounters a parallel construct, a team of threads is
created to execute the parallel region. The thread that encountered
the parallel construct becomes the master thread of the new team, with
a thread number of zero for the duration of the new parallel region.
All threads in the new team, including the master thread, execute the
region. Once the team is created, the number of threads in the team
remains constant for the duration of that parallel region.
Consequently, with the directive `#pragma omp parallel for num_threads(...)` all threads in the team, including the master thread, will be performing the parallel work (i.e., computing the iterations of the loop), which is something that you do not want. This happens because the `for` part of that directive makes the compiler automatically divide the iterations of the loop among the threads in the team. To get around this, you can use a plain `#pragma omp parallel num_threads(...)` and implement the distribution of the loop iterations yourself. The code would look like the following:
    #pragma omp parallel num_threads(thread_count) shared(a, n, number)
    {
        int thread_id = omp_get_thread_num();
        int total_threads = omp_get_num_threads();
        if(thread_id != 0) // all threads but the master thread
        {
            thread_id--; // shift all the ids
            total_threads = total_threads - 1;
            for(long i = thread_id; i < n; i += total_threads) {
                // do a task, such as:
                a[i] = a[i] * number;
            }
        }
    }
First, we ensure that all threads except the master (i.e., `if(thread_id != 0)`) execute the loop to be parallelized. Then we divide the iterations of the loop among the remaining threads (i.e., `for(long i = thread_id; i < n; i += total_threads)`), after shifting the ids down by one and decrementing `total_threads` so that the workers are numbered from 0 to `total_threads - 1`. I have chosen a static distribution with chunk=1 (a round-robin assignment); you can choose a different one, but you will have to adapt the loop accordingly.
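As an illustration of adapting the loop to a different distribution, here is a minimal sketch that gives each worker one contiguous block of iterations instead of a round-robin assignment. The function name `scale_in_blocks`, the `double` element type, and the fallback for a team of a single thread are assumptions made for this example; they are not part of the original question.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs so the sketch still compiles without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: multiplies every element of a by number.
   Thread 0 does no loop work unless it is the only thread in the team. */
void scale_in_blocks(double *a, long n, double number, int thread_count)
{
    assert(thread_count >= 1);

    #pragma omp parallel num_threads(thread_count) shared(a, n, number)
    {
        int thread_id = omp_get_thread_num();
        int workers = omp_get_num_threads() - 1; /* everyone but thread 0 */

        if (workers == 0) {
            /* Assumption: with a one-thread team, thread 0 does the work. */
            for (long i = 0; i < n; i++)
                a[i] = a[i] * number;
        } else if (thread_id != 0) {
            long w = thread_id - 1;              /* worker index, 0-based */
            long start = w * n / workers;        /* first iteration of the block */
            long end = (w + 1) * n / workers;    /* one past the last iteration */
            for (long i = start; i < end; i++)
                a[i] = a[i] * number;
        }
    }
}
```

Compared with the chunk=1 version, only the loop bounds change: each worker computes its own `start` and `end` and walks through them with a stride of 1.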
Now you just need to add the logic for the other part of your question:
Now, what would be a way to check from the master thread whether the
other threads are still running, and do some incrementing if they are?
So that I do not give away too much, I will add the pseudocode that you will have to convert to real code to make it work:
    // declare a shared variable to count the number of threads
    // that have finished working: count_thread_finished
    #pragma omp parallel num_threads(thread_count) shared(a, n, number, count_thread_finished)
    {
        int thread_id = omp_get_thread_num();
        int total_threads = omp_get_num_threads();
        if(thread_id != 0) // all threads but the master thread
        {
            thread_id--; // shift all the ids
            total_threads = total_threads - 1;
            for(long i = thread_id; i < n; i += total_threads) {
                // do a task, such as:
                a[i] = a[i] * number;
            }
            // count_thread_finished++
        }
        else { // the master thread
            while(count_thread_finished != total_threads - 1) {
                // wait for a while....
            }
        }
    }
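Bear in mind, however, that since the variable `count_thread_finished` is shared among threads, you will need to ensure mutual exclusion (e.g., using `omp atomic`) on its updates, otherwise you will have a race condition. The read in the master's `while` loop should likewise be an atomic read, so that the master actually observes the updates made by the other threads. This should give you enough to keep going.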
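For reference, one possible way to turn the pseudocode into real code is sketched below. It is a sketch under assumptions rather than the definitive solution: the function name `scale_and_monitor`, the `master_polls` counter (standing in for "do some incrementing"), the `double` element type, and the single-thread fallback are all made up for this example.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs so the sketch still compiles without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: workers scale the array while thread 0 polls a
   shared counter. Returns how many times thread 0 polled. */
long scale_and_monitor(double *a, long n, double number, int thread_count)
{
    assert(thread_count >= 1);
    int count_thread_finished = 0; /* shared: workers that are done */
    long master_polls = 0;         /* written by thread 0 only */

    #pragma omp parallel num_threads(thread_count) \
        shared(a, n, number, count_thread_finished, master_polls)
    {
        int thread_id = omp_get_thread_num();
        int workers = omp_get_num_threads() - 1;

        if (thread_id != 0) {
            /* Worker: round-robin distribution with shifted ids. */
            for (long i = thread_id - 1; i < n; i += workers)
                a[i] = a[i] * number;
            #pragma omp atomic update
            count_thread_finished++;
        } else if (workers == 0) {
            /* Assumption: with a one-thread team, thread 0 does the work. */
            for (long i = 0; i < n; i++)
                a[i] = a[i] * number;
        } else {
            /* Master: keep incrementing until every worker has reported. */
            int finished = 0;
            while (finished != workers) {
                master_polls++;
                #pragma omp atomic read
                finished = count_thread_finished;
            }
        }
    }
    return master_polls;
}
```

The busy-wait loop keeps thread 0 spinning on a core; if that matters, the body of the loop is the place to do useful work or to back off for a short while between polls.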
Btw: `schedule(static, n/thread_count)` is mostly not needed, since by default most OpenMP implementations already divide the iterations of the loop (among the threads) into contiguous chunks.