Here is my question about OpenACC. I have read the API specifications (v1 and v2), and the behavior of nested data environments that reference different subparts of the same array is unclear to me.

Code example:

#pragma acc data pcopyin(a[0:20])
{
  #pragma acc data pcopyin(a[100:20])
  {
    #pragma acc parallel loop
    for(i=0; i<20; i++) {
      a[i] = i;
      a[i+100] = i;
    }
  }
}

My understanding is that this should work (or at least the two acc data parts):

  • The first pragma checks whether a[0:20] is on the accelerator
  • NO -> the data are allocated on the device and transferred
  • The second pragma checks whether a[100:20] is on the accelerator
  • The pointer a is on the accelerator, but not the data from a[100:20]
  • The data are allocated on the device and transferred
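The steps above assume the presence check is keyed on the host address range, not only on the base pointer. A minimal sketch of that model in plain C (no OpenACC required) is shown below. The names present_table and present_or_copyin are hypothetical, invented for illustration; this is not how CAPS or any other compiler actually implements its runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a runtime "present table": each entry records
   one host byte range that has a copy on the device. */
#define MAX_MAPPINGS 8

typedef struct {
    uintptr_t lo;   /* first mapped host byte */
    uintptr_t hi;   /* one past the last mapped host byte */
} mapping;

typedef struct {
    mapping entries[MAX_MAPPINGS];
    size_t count;
} present_table;

/* Model of pcopyin on the host range [p, p + bytes).
   Returns  1 if the range was absent and a new mapping was created
              (a real runtime would allocate and transfer here),
            0 if the range is fully contained in an existing mapping,
           -1 if it only partially overlaps an existing mapping. */
int present_or_copyin(present_table *t, const void *p, size_t bytes)
{
    assert(bytes > 0);
    uintptr_t lo = (uintptr_t)p;
    uintptr_t hi = lo + bytes;

    for (size_t k = 0; k < t->count; k++) {
        const mapping *m = &t->entries[k];
        if (lo >= m->lo && hi <= m->hi)
            return 0;                 /* already present */
        if (lo < m->hi && hi > m->lo)
            return -1;                /* partially present */
    }

    assert(t->count < MAX_MAPPINGS);
    t->entries[t->count].lo = lo;
    t->entries[t->count].hi = hi;
    t->count++;
    return 1;                         /* newly mapped */
}
```

Under this model, a[0:20] and a[100:20] do not overlap, so both calls create their own mapping. A runtime that looks up only the base pointer a would instead find the first mapping and reject the second shape.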

I tried this with the CAPS compiler (v3.3.0, the only version currently available on my test machine), and the second pragma acc data gives me an error saying that my second subarray does not have the correct shape. So what happens in my test (I suppose) is that the pointer "a" is found on the accelerator, but the shape associated with it ([0:20]) does not match the shape in my second pragma ([100:20]).

Is this the behavior the API intends, or should my example work?

Moreover, if this is supposed to work, is there any coherence between the subparts of the same array? That is, will they be positioned on the device as they are on the host, so that I can write a[i] += a[100+i] in my kernel?

chabachull

1 Answer

The present test checks whether "a" is on the device. Hence, when the second data region is encountered, "a" is already on the device, but only partially. A better method is to add a second pointer that points into "a" and reference that pointer in the data clause instead. Something like:

#include <stdio.h>

int main () {

  int a[200];
  int *b;
  int i;
  for(i=0; i<200; i++) a[i] = 0;
  b=a+100;

  #pragma acc data pcopy(a[0:20])
  {
    #pragma acc data pcopy(b[0:20])
    {
      #pragma acc parallel loop
      for(i=0; i<20; i++) {
        a[i] = i;
        b[i] = i;
      }
    }
  }
  for(i=0; i<22; i++) printf("%d = %d \n", i, a[i]);
  for(i=100; i<122; i++) printf("%d = %d \n", i, a[i]);
  return 0;
}

If you had just copied "a[100:20]", then accessing outside this range would be considered a programmer error.
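If the kernel needs to combine both parts, as in a[i] += a[100+i], one alternative (not the method above, just a sketch) is to map a single enclosing subarray so that both parts sit in one device allocation with the same layout as on the host. The function name add_tail is hypothetical, and the cost is that all 120 elements are transferred rather than 40.

```c
#include <assert.h>

/* Adds a[100 + i] into a[i] for i in [0, n).
   Assumes a points to at least 120 ints and 0 <= n <= 20, so every
   access stays inside the mapped subarray a[0:120]. */
void add_tail(int *a, int n)
{
    assert(n >= 0 && n <= 20);

    /* One enclosing subarray: a[0..19] and a[100..119] share a single
       device allocation, so both index ranges are valid in the kernel. */
    #pragma acc data copy(a[0:120])
    {
        #pragma acc parallel loop present(a[0:120])
        for (int i = 0; i < n; i++)
            a[i] += a[100 + i];
    }
}
```

Compiled without OpenACC support, the pragmas are ignored and the loop runs on the host, which makes it easy to check the result against an accelerated build.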

Hope this helps, Mat

Mat Colgrove
  • OK, I tested your method yesterday and it gives the correct result. But I was wondering whether my example was supposed to work according to the API. Is the observed behavior the correct one according to the API, or is it just an implementation limitation of the CAPS compiler for OpenACC? – chabachull Jan 17 '14 at 08:02
  • The code should work as far as the API is concerned. However, each compiler may implement this in a different way. The only guarantee from the spec is that both subarrays will be available when the kernel runs on the accelerator. – Ruyk Jan 21 '14 at 10:54