What I'm trying to do is spawn N tasks at once by recursively dividing the iteration space with the help of tasks, in order to spawn the 'real' tasks quicker.
I can do this linearly with a loop, like so:
for (int i = 0; i < N; i += bx)
    #pragma omp task firstprivate(i)
    task_work(i);
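For reference, here is a minimal self-contained sketch of the linear version. The enclosing `parallel`/`single` region, the function name `linear_spawn`, the counter `tasks_run`, and the body of `task_work` are assumptions added for illustration; the real `task_work` does actual work.

```c
/* Hypothetical stand-in for the real work: it only counts executed tasks. */
static int tasks_run = 0;

static void task_work(int i)
{
    (void)i;                 /* the block index is unused in this stand-in */
    #pragma omp atomic
    tasks_run++;
}

/* Spawn one task per block of size bx over [0, n) and return how many ran.
 * linear_spawn is a hypothetical wrapper, not part of the original code. */
int linear_spawn(int n, int bx)
{
    tasks_run = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* A single thread creates every task; the other threads execute them. */
        for (int i = 0; i < n; i += bx) {
            #pragma omp task firstprivate(i)
            task_work(i);
        }
    }   /* implicit barrier: all tasks have completed once we get here */
    return tasks_run;
}
```

With these assumptions, `linear_spawn(16, 4)` creates four tasks (for i = 0, 4, 8, 12). Because one thread creates all of them, task creation is serialized, which is what the recursive version is meant to avoid.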
Below is what I've got so far in my recursive version.
void rec_spawn(int start, int end, int cz)
{
    // If the iteration space is no larger than the chunk size 'cz', spawn the
    // tasks linearly. Otherwise divide the iteration space in two, and call
    // this function recursively.
    if (end - start <= cz)
    {
        for (int ii = start; ii < end; ii += bx)
            #pragma omp task firstprivate(ii)
            task_work(ii);
    }
    else
    {
        // first half
        #pragma omp task firstprivate(start, end, cz)
        rec_spawn(start, start + ((end - start) / 2), cz);
        // second half
        #pragma omp task firstprivate(start, end, cz)
        rec_spawn(start + ((end - start) / 2), end, cz);
    }
    #pragma omp taskwait
}
This version is much slower, which it shouldn't be, and I suspect it is due to the #pragma omp taskwait. I want to be able to do something similar but without the taskwait; however, when I try to remove it, the code segfaults.
When I try to debug the program, all the information I can collect is:
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004057de in gomp_barrier_handle_tasks (state=<optimized out>) at ../../../gcc-4.9.0/libgomp/task.c:715
715 ../../../gcc-4.9.0/libgomp/task.c: No such file or directory.
(gdb) bt
#0 0x00000000004057de in gomp_barrier_handle_tasks (state=<optimized out>) at ../../../gcc-4.9.0/libgomp/task.c:715
#1 0x0000000000409518 in gomp_team_barrier_wait_end (bar=0x11a2874, state=0) at ../../../gcc-4.9.0/libgomp/config/linux/bar.c:94
#2 0x0000000000401d24 in main._omp_fn.0 () at src/heat-omp-rec.c:86
#3 0x000000000040705e in gomp_barrier_init (count=<optimized out>, bar=<optimized out>) at ../../../gcc-4.9.0/libgomp/config/linux/bar.h:59
#4 gomp_new_team (nthreads=4201738) at ../../../gcc-4.9.0/libgomp/team.c:166
#5 0x00007fff7d208d50 in ?? ()
#6 0x00007f7080cb69c0 in ?? ()
#7 0x0000000000000000 in ?? ()
So my question is: why is the taskwait required here (it is not required by the actual work inside task_work()), and how can the recursive spawning be rewritten so that it does not use it?
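One possible restructuring, shown only as a sketch: drop the per-level `taskwait` and wait once at the top with OpenMP 4.0's `taskgroup`, which waits for the child tasks created inside it and all of their descendants. The wrapper `spawn_all`, the counter `tasks_run`, the stand-in `task_work`, and passing `bx` as a parameter are assumptions for illustration. Whether this avoids the segfault above with gcc 4.9.0 is not established here.

```c
/* Hypothetical stand-in for the real work: it only counts executed tasks. */
static int tasks_run = 0;

static void task_work(int ii)
{
    (void)ii;
    #pragma omp atomic
    tasks_run++;
}

/* Recursive spawner without a taskwait. Every argument is firstprivate, so
 * each child task gets its own copy at creation time and does not depend on
 * the parent's stack frame still being alive. Assumes cz >= 1 and bx >= 1. */
static void rec_spawn(int start, int end, int cz, int bx)
{
    if (end - start <= cz) {
        /* Small enough: spawn the real tasks linearly. */
        for (int ii = start; ii < end; ii += bx) {
            #pragma omp task firstprivate(ii)
            task_work(ii);
        }
    } else {
        /* Split in two and let two tasks spawn each half. */
        int mid = start + (end - start) / 2;
        #pragma omp task firstprivate(start, mid, cz, bx)
        rec_spawn(start, mid, cz, bx);
        #pragma omp task firstprivate(mid, end, cz, bx)
        rec_spawn(mid, end, cz, bx);
    }
}

/* Hypothetical driver: returns the number of leaf tasks that ran. */
int spawn_all(int n, int cz, int bx)
{
    tasks_run = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* Wait once for the whole task tree instead of at every level. */
        #pragma omp taskgroup
        rec_spawn(0, n, cz, bx);
        /* Work that needs all tasks finished could follow here. */
    }
    return tasks_run;
}
```

If nothing else happens inside the `single` region after the spawn, the `taskgroup` is redundant, since the implicit barrier at the end of the region already waits for all outstanding tasks. Note also that, as in the original, the split point is not rounded to a multiple of `bx`, so the task count matches the linear loop only when the chunk boundaries are aligned (for example, n = 16, cz = 4, bx = 4 gives four tasks).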