Reading some OpenMP 4 tutorials, I found that `target` regions can participate in the same dependency graph as CPU tasks by using the `depend` clause.
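Here is a minimal sketch of that idea. It assumes an OpenMP 4.5 compiler, since `depend` and `nowait` on the `target` construct itself first appeared in 4.5. The function name `pipeline` and the array size `N` are hypothetical. A host task produces `a`, a deferred target task doubles it on the default device, and a second host task sums it. The three steps are ordered only by their `depend` clauses on `a`. If no offload device is available, the target region falls back to running on the host.

```c
#include <stdio.h>

#define N 1000 /* hypothetical problem size */

/* Hypothetical helper: host task -> target task -> host task,
   ordered purely through depend clauses on the array a. */
long pipeline(void) {
    static int a[N]; /* static storage, so shared by all tasks */
    long sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Host task: producer of a. */
        #pragma omp task depend(out: a)
        for (int i = 0; i < N; i++)
            a[i] = i;

        /* Deferred target task (OpenMP 4.5): starts only after the
           producer finishes, then runs on the default device. */
        #pragma omp target map(tofrom: a) depend(inout: a) nowait
        for (int i = 0; i < N; i++)
            a[i] *= 2;

        /* Host task: consumer, starts only after the target task. */
        #pragma omp task depend(in: a) shared(sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        /* Wait for all three child tasks before leaving single. */
        #pragma omp taskwait
    }
    return sum;
}
```

Calling `pipeline()` from `main` should return 999000 (the sum of `2*i` for `i` from 0 to 999) whether the target region runs on a GPU or on the host. Without `-fopenmp` the pragmas are ignored and the steps run sequentially with the same result.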
When programming with OpenMP tasks, we know they can run concurrently. But is this possible on GPUs? Can a GPU run multiple `target` regions simultaneously?
I tried the following code:
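```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int i;
    #pragma omp parallel
    #pragma omp single
    {
        /* Each task encounters one target region; the tasks themselves
           may be picked up by different host threads in any order. */
        #pragma omp task private(i)
        #pragma omp target
        {
            for (i = 0; i < 100; i++)
                printf("1 %d\n", i);
        }

        #pragma omp task private(i)
        #pragma omp target
        {
            for (i = 0; i < 100; i++)
                printf("2 %d\n", i);
        }

        #pragma omp task private(i)
        #pragma omp target
        {
            for (i = 0; i < 100; i++)
                printf("3 %d\n", i);
        }

        /* Wait for the three child tasks (and their target regions)
           inside the region that created them. */
        #pragma omp taskwait
    }
    return 0;
}
```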
Although the `task`s are executed in arbitrary order, the `target` regions are executed atomically, one region at a time: the output of one region is never interleaved with that of another.
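For comparison, OpenMP 4.5 added `nowait` to `target`, which turns each region into a deferred target task so the encountering host thread does not block on it. The sketch below assumes a 4.5 compiler; the helper `launch_three` and the constant `LEN` are hypothetical. Each region writes only its own slice of `out`, so the result does not depend on how the regions are scheduled. Note that `nowait` only lets the regions be in flight at the same time; whether the device actually executes them concurrently is up to the implementation and its runtime.

```c
#include <stdio.h>

#define LEN 100 /* hypothetical elements per region */

/* Hypothetical helper: launches three independent target regions as
   deferred target tasks, each filling out[r*LEN .. r*LEN+LEN-1]. */
void launch_three(int *out) {
    #pragma omp parallel
    #pragma omp single
    {
        for (int r = 0; r < 3; r++) {
            /* nowait: the encountering thread does not wait here;
               r is copied into the target task when it is created. */
            #pragma omp target map(from: out[r * LEN : LEN]) firstprivate(r) nowait
            for (int i = 0; i < LEN; i++)
                out[r * LEN + i] = (r + 1) * 1000 + i;
        }
        /* Wait for all three target tasks before leaving single. */
        #pragma omp taskwait
    }
}
```

A caller would pass a buffer of `3 * LEN` ints; afterwards `out[r * LEN + i]` holds `(r + 1) * 1000 + i`, regardless of the order in which the regions ran.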