According to the OpenACC documentation:
copyin
- Create space for the listed variables on the device, initialize the variable by copying data to the device at the beginning of the region, and release the space on the device when done without copying the data back to the host.
I've created a test program:
#include <stdio.h>

int main(int argc, char** argv)
{
    int teste[] = { -15 };
    #pragma acc data copyin(teste[0:1])
    {
        #pragma acc parallel loop
        for (int p = 0; p < 5000; p++) {
            teste[0] = p;
        }
    }
    printf("%d", teste[0]);
    return 0;
}
According to the docs, the program should print -15, since the data is modified on the device and the result is not copied back to the host. But when I compile and run this code, the output is 4999.
My compiler is gcc (tdm64-1) 10.3.0, and I'm running the program on a computer with separate device and host memory.
I'd like to know why this is not working and what I could do to prevent the copy from the device back to the host.
Here's the program running in Git Bash on Windows:
$ cat test.c && echo "" &&gcc -fopenacc test.c && ./a.exe
#include <stdio.h>
int main(int argc, char** argv)
{
    int teste[] = { -15 };
    #pragma acc data copyin(teste[0:1])
    {
        #pragma acc parallel loop
        for (int p = 0; p < 5000; p++) {
            teste[0] = p;
        }
    }
    printf("%d\n", teste[0]);
    return 0;
}
4999
I also have access to a Linux machine, and even with nvc I could not get the expected result:
cat test.c && echo "" && /opt/nvidia/hpc_sdk/Linux_x86_64/2021/compilers/bin/nvc -acc -Minfo=accel test.c && ./a.out
#include <stdio.h>
int main(int argc, char** argv)
{
    int teste[] = { -15 };
    #pragma acc data copyin(teste[0:1])
    {
        #pragma acc parallel loop
        for (int p = 0; p < 5000; p++) {
            teste[0] = p;
        }
    }
    printf("%d\n", teste[0]);
    return 0;
}
main:
9, Generating copyin(teste[:]) [if not already present]
Generating NVIDIA GPU code
12, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
4999