Take the following code, which illustrates the calling of a simple routine on the accelerator, compiled on the device using OpenACC 2.0's routine
directive:
#include <iostream>
#pragma acc routine
int function(int *ARRAY,int multiplier){
int sum=0;
#pragma acc loop reduction(+:sum)
for(int i=0; i<10; ++i){
sum+=multiplier*ARRAY[i];
}
return sum;
}
int main(){
int *ARRAY = new int[10];
int multiplier = 5;
int out;
for(int i=0; i<10; i++){
ARRAY[i] = 1;
}
#pragma acc enter data create(out) copyin(ARRAY[0:10],multiplier)
#pragma acc parallel present(out,ARRAY[0:10],multiplier)
if (function(ARRAY,multiplier) == 50){
out = 1;
}else{
out = 0;
}
#pragma acc exit data copyout(out) delete(ARRAY[0:10],multiplier)
std::cout << out << std::endl;
}
How does function
know to use the device copies of ARRAY[0:10]
and multiplier
when it is called from within a parallel region? How can we enforce the use of the device copies?