I have a question about the structure of the kernels construct. Is it possible that not every line inside a kernels region runs on the GPU?
For example, I have this code:
#pragma acc kernels copy(a[0:n], b[0:n])
{
    #pragma acc loop
    for (i = 0; i < n; i++)
        a[i] = i + 10;

    a[1] = 10;
    a[3] = 5;

    #pragma acc loop
    for (i = 0; i < n; i++)
        b[i] = i + 20;
}
Also, is the situation the same for the acc parallel construct?
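
In case it helps to reproduce this, here is a minimal sketch of the same region wrapped in a function so it can be compiled on its own. The function name fill_arrays and the requirement that n is at least 4 are my own assumptions for illustration; they are not part of my real code. If the compiler has no OpenACC support, the pragmas are simply ignored and everything runs on the host.

```c
/* Hypothetical wrapper around the region from the question.
   Assumes a and b each point to at least n ints, and n >= 4
   (because a[1] and a[3] are written directly). */
void fill_arrays(int *a, int *b, int n)
{
    int i;

    #pragma acc kernels copy(a[0:n], b[0:n])
    {
        /* First loop: the compiler may turn this into a device kernel */
        #pragma acc loop
        for (i = 0; i < n; i++)
            a[i] = i + 10;

        /* The two plain statements between the loops that I am asking about */
        a[1] = 10;
        a[3] = 5;

        /* Second loop: another candidate kernel */
        #pragma acc loop
        for (i = 0; i < n; i++)
            b[i] = i + 20;
    }
}
```

Calling fill_arrays(a, b, 8) from a small main and printing a[1] and a[3] should be enough to check the results. Where each statement actually executes would have to be read from the compiler's feedback output.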