
I am trying to write object-oriented C++ code that is parallelized with OpenACC. I found some Stack Overflow questions and GTC talks on OpenACC, but I could not find any real-world examples of object-oriented code.

In this question, an example of an OpenACCArray was shown that does some memory management in the background (code available at http://www.pgroup.com/lit/samples/gtc15_S5233.tar). However, I am wondering whether it is possible to create a class that manages the arrays at a higher level. For example:

struct Data
{

//    OpenACCArray<float> a;

    OpenACCArray<Vector3<float>> a3;

    Data(size_t len) {
#pragma acc enter data copyin(this)
//        a.resize(len);
        a3.resize(len);
    }
    ~Data() {
#pragma acc exit data delete(this)
    }
    void update_device() {
//        a.update_device();
        a3.update_device();
    }
    void update_host() {
//        a.update_host();
        a3.update_host();
    }
};

int main(int argc, char *argv[])
{
    const size_t len = 32*128;
    Data d(len);

    d.update_device();
#pragma acc kernels loop independent present(d)
    for (int i=0; i < len; ++i) {
        float val = (float)i/(float)len;

        d.a3[i].x = val;
        d.a3[i].y = i;
        d.a3[i].z = d.a3[i].x / d.a3[i].y;
    }
    d.update_host();
    for (int i=0; i < len/128; ++i) {
        cout << i << ": " << d.a3[i].x << "," << d.a3[i].y << "," << d.a3[i].z << endl;
    }
    cout << endl;
    return 0;
}

Interestingly, this program works, but as soon as I uncomment OpenACCArray<float> a;, i.e. add another member to the Data struct, I get a memory error: FATAL ERROR: variable in data clause is partially present on the device.

Since the OpenACCArray struct is a flat structure that handles the pointer indirections on its own, shouldn't it work to copy it as a member? Or does it need to be a pointer to the struct, with the pointers hardwired via directives? If so, I fear I will have to use alias pointers, as suggested by Jeff Larkin in the above-mentioned question. I don't mind doing the work to get this running, but I cannot find any reference on how to do that. Using the compiler options keepgpu,keepptx helps a bit to understand what the compiler is doing, but I would prefer an alternative to reverse engineering generated PTX code.

Any pointers to helpful reference projects or documents are highly appreciated.

dwn
  • Which version of OpenACCArray are you using from that example tarball? – jefflarkin Aug 23 '16 at 13:18
  • Also, can you please include your definition for Vector3? I've taken a guess, but would like to confirm we're building the same thing. – jefflarkin Aug 23 '16 at 13:32
  • @jefflarkin thank you for your help. I was using a custom class but changed the code to use the float3 from your example 2 in the tarball. Here is a gist with the code: https://gist.github.com/danielwinkler/12ab5b73221faca89d69d83d72c633b7 – dwn Aug 23 '16 at 14:22
  • `OpenACCArray` is also from example 2. – dwn Aug 23 '16 at 14:31
  • @jefflarkin could you reproduce the behavior with the provided code? – dwn Aug 24 '16 at 19:57
  • Yes, I was able to reproduce the behavior, but haven't had enough cycles to devise a solution yet. Maybe @mat-colgrove has already come across this, since the code comes from his examples. – jefflarkin Aug 24 '16 at 20:02
  • @jefflarkin thank you for the update. Would you recommend staying with C and plain pointers until OpenACC v3 and deep copy semantics are standardized? – dwn Aug 24 '16 at 20:37

1 Answer


In the OpenACCArray1.h header, remove the two "#pragma acc enter data create(this)" pragmas. What's happening is that the "Data" constructor creates the "a" and "a3" objects on the device. Hence, when the second enter data region is encountered in the OpenACCArray constructor, the device this pointer is already there.

It works when there is only one data member because "a3" and "Data" share the same address for the this pointer. Hence, when the second enter data pragma is encountered, the present check sees that the object is already on the device and does not create it again. When "a" is added, the size of "Data" is twice that of "a", so the present check sees that the this pointer is already there but with a different size than before. That's what the "partially present" error means: the data is there, but its size differs from the expected one.
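
The address and size relationship described above can be checked on the host without any OpenACC involved. The following is only a minimal sketch: ArrayStub, OneMember, TwoMembers, same_address and report_layout are hypothetical names, and ArrayStub merely assumes that the array wrapper boils down to a pointer plus a length.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical stand-in for the pointer-plus-length layout of an array wrapper.
struct ArrayStub {
    float*      data;
    std::size_t size;
};

struct OneMember  { ArrayStub a3; };               // like Data with only a3
struct TwoMembers { ArrayStub a; ArrayStub a3; };  // like Data with a and a3

// True when the parent object and the given member start at the same address.
template <typename Parent, typename Member>
bool same_address(const Parent& parent, const Member& member) {
    return static_cast<const void*>(&parent) ==
           static_cast<const void*>(&member);
}

// Prints the sizes that a present check would end up comparing.
void report_layout() {
    OneMember one{};
    TwoMembers two{};
    // The first member of a standard-layout struct shares the struct's address.
    assert(same_address(one, one.a3));
    assert(same_address(two, two.a));
    std::cout << "sizeof(ArrayStub)  = " << sizeof(ArrayStub)  << '\n'
              << "sizeof(OneMember)  = " << sizeof(OneMember)  << '\n'
              << "sizeof(TwoMembers) = " << sizeof(TwoMembers) << '\n';
}
```

Calling report_layout() from main shows that OneMember has the same start address and size as its only member, while TwoMembers has the same start address as "a" but a larger size, which is the mismatch the present check trips over.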

Only the parent class/struct should create the this pointer on the device.
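
A minimal sketch of that rule, under some assumptions: DeviceArray is a hypothetical, simplified stand-in for OpenACCArray (not the code from the tarball), and release() and fill_and_sum() are names made up for this illustration. The member class only manages its own buffer, and the enclosing Data object is the only place where this is created on and deleted from the device. Compiled without OpenACC, the pragmas are ignored and everything runs on the host.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified array wrapper. It never puts its own `this` on the
// device and assumes the enclosing object is already present there.
template <typename T>
class DeviceArray {
public:
    DeviceArray() = default;
    DeviceArray(const DeviceArray&) = delete;
    DeviceArray& operator=(const DeviceArray&) = delete;
    ~DeviceArray() { release(); }

    // Allocate host storage and create the matching device buffer.
    void resize(std::size_t n) {
        release();
        data_ = new T[n]();
        size_ = n;
        #pragma acc update device(size_)
        #pragma acc enter data create(data_[0:size_])
    }
    // Free the device buffer first, then the host buffer; safe to call twice.
    void release() {
        if (data_ != nullptr) {
            #pragma acc exit data delete(data_[0:size_])
            delete[] data_;
            data_ = nullptr;
            size_ = 0;
        }
    }
    void update_host() {
        #pragma acc update self(data_[0:size_])
    }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    T*          data_ = nullptr;
    std::size_t size_ = 0;
};

struct Data {
    DeviceArray<float> a;
    DeviceArray<float> b;

    explicit Data(std::size_t len) {
        // Only the parent creates `this` on the device.
        #pragma acc enter data copyin(this)
        a.resize(len);
        b.resize(len);
    }
    ~Data() {
        // Release member buffers before removing the parent from the device.
        a.release();
        b.release();
        #pragma acc exit data delete(this)
    }
};

// Fills both arrays in a compute region and returns the sum of all elements.
float fill_and_sum(std::size_t len) {
    assert(len > 0);
    Data d(len);
    const int n = static_cast<int>(len);
    #pragma acc parallel loop present(d)
    for (int i = 0; i < n; ++i) {
        d.a[i] = static_cast<float>(i);
        d.b[i] = 2.0f * d.a[i];
    }
    d.a.update_host();
    d.b.update_host();
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += d.a[i] + d.b[i];
    return sum;
}
```

In this sketch the Data destructor calls release() on its members explicitly so that the device buffers are removed while the parent is still present; whether that ordering is strictly required may depend on the compiler and OpenACC version.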

Hope this helps, Mat

Mat Colgrove
  • Thank you for the explanation, that clarifies the behavior. I will follow your recommendations. – dwn Aug 26 '16 at 06:48