I am trying to write object-oriented C++ code that is parallelized with OpenACC. I was able to find some StackOverflow questions and GTC talks on OpenACC, but I could not find any real-world examples of object-oriented code.
In this question, an OpenACCArray example was shown that does some memory management in the background (code available at http://www.pgroup.com/lit/samples/gtc15_S5233.tar).
However, I am wondering if it is possible to create a class that manages the arrays at a higher level, e.g.:
#include <iostream>
using namespace std;

// Requires the OpenACCArray and Vector3 class templates (headers not shown).
struct Data
{
    // OpenACCArray<float> a;
    OpenACCArray<Vector3<float>> a3;

    Data(size_t len) {
        // Map the Data object itself to the device.
        #pragma acc enter data copyin(this)
        // a.resize(len);
        a3.resize(len);
    }
    ~Data() {
        #pragma acc exit data delete(this)
    }
    void update_device() {
        // a.update_device();
        a3.update_device();
    }
    void update_host() {
        // a.update_host();
        a3.update_host();
    }
};

int main(int argc, char *argv[])
{
    const size_t len = 32*128;
    Data d(len);
    d.update_device();

    #pragma acc kernels loop independent present(d)
    for (int i=0; i < len; ++i) {
        float val = (float)i/(float)len;
        d.a3[i].x = val;
        d.a3[i].y = i;
        d.a3[i].z = d.a3[i].x / d.a3[i].y;
    }

    d.update_host();
    // Print every 128th element.
    for (int i=0; i < len/128; ++i) {
        cout << i << ": " << d.a3[i].x << "," << d.a3[i].y << "," << d.a3[i].z << endl;
    }
    cout << endl;
    return 0;
}
Interestingly, this program works, but as soon as I uncomment OpenACCArray<float> a;, i.e. add another member to the Data struct, I get memory errors:
FATAL ERROR: variable in data clause is partially present on the device
Since the OpenACCArray struct is a flat structure that handles the pointer indirections on its own, shouldn't it be possible to copy it as a member?
Or does it need to be a pointer to the struct, with the pointers wired up by hand with directives?
In that case, I fear I would run into the problem of having to use alias pointers, as suggested by Jeff Larkin in the question mentioned above.
I don't mind doing the work to get this running, but I cannot find any reference on how to do it.
Using the compiler options keepgpu,keepptx helps a bit to understand what the compiler is doing, but I would prefer an alternative to reverse-engineering the generated PTX code.
Any pointers to helpful reference projects or documents are highly appreciated.