I have a 2D SIZE x SIZE matrix that I'm trying to copy to the GPU.

I allocate the matrix this way:

#define SIZE 1024
float (*a)[SIZE] = (float(*)[SIZE]) malloc(SIZE * SIZE * sizeof(float));

And this is my OpenACC region:

void mmul_acc(float (*restrict a)[SIZE],
              float (*restrict b)[SIZE],
              float (*restrict c)[SIZE]) {
#pragma acc data copyin(a[0:SIZE][0:SIZE], b[0:SIZE][0:SIZE]) \
    copyout(c[0:SIZE][0:SIZE])
{
  ... code here...
}
}

When compiling with the PGI compiler, using -Minfo=acc, the compiler tells me:

Generating copyin(a[0:1024][0:])

What does a[0:1024][0:] mean? Why not a[0:1024][0:1024]?

If, instead of declaring matrices, I declare flat arrays of size SIZE*SIZE and use

#pragma acc data copyin(a[0:SIZE*SIZE])

the compiler generates the following message:

Generating copyin(a[0:16777216])

The code actually works the same way: same performance, same result.

Apparently the compiler generates the same code in both cases, as it should, but the message is not straightforward.
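The reason both forms can behave identically is that a pointer-to-array allocation and a flat array of SIZE*SIZE floats describe the same contiguous block of memory. The following is a minimal sketch of that equivalence in plain C, with no OpenACC involved; `count_view_mismatches` is a hypothetical helper name, not something from the original code.

```c
#include <stdlib.h>
#include <stddef.h>

#define SIZE 1024

/* Hypothetical helper: write the matrix through the 2D view, read it back
   through a flat pointer, and return the number of elements that differ.
   A result of 0 means both views address the same contiguous storage.
   Returns (size_t)-1 if the allocation fails. */
size_t count_view_mismatches(void)
{
    /* One contiguous block of SIZE rows, each row holding SIZE floats. */
    float (*a)[SIZE] = malloc(sizeof(float[SIZE][SIZE]));
    if (a == NULL)
        return (size_t)-1;

    /* Fill through the 2D view; the stored value is the row-major index.
       All indices are below 2^24, so they are exactly representable. */
    for (size_t i = 0; i < SIZE; ++i)
        for (size_t j = 0; j < SIZE; ++j)
            a[i][j] = (float)(i * SIZE + j);

    /* Read back through a flat view of the same allocation. */
    const float *flat = (const float *)a;
    size_t mismatches = 0;
    for (size_t k = 0; k < (size_t)SIZE * SIZE; ++k)
        if (flat[k] != (float)k)
            ++mismatches;

    free(a);
    return mismatches;
}
```

Calling `count_view_mismatches()` should return 0, since element `a[i][j]` sits at flat offset `i * SIZE + j`.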

I'm using the PGI accelerator 12.8 on a 64-bit Linux machine. I'm compiling with -Minfo=acc.

Note: this question was edited and no longer makes much sense, but maybe it can be useful to other people.

leo
  • I assume you are using the PGI OpenACC compiler. Are you compiling with the -Minfo switch? In C, when passing a doubly-subscripted array as a parameter to a function, it's necessary to identify the range of the first subscript. So the "Generating copyin" message is just the compiler confirming that it is creating a host->device copy of the doubly-subscripted array a with a first subscript range of 1024. Presumably SIZE is set to 1024. To answer the second question about why the code runs 10x slower, it would be useful to see the full compiler output with the -Minfo switch, in both cases. – Robert Crovella Oct 14 '12 at 23:33
  • Also, is `float (*a)(SIZE)` syntactically valid? Why don't you just do `float **a = (float**)malloc(SIZE * SIZE * sizeof(float));`? – harrism Oct 14 '12 at 23:45
  • Robert: yes, I am. I just edited the question to add more information. About the 10x slowdown: I had made a mistake, and I've corrected the question. Thanks – leo Oct 15 '12 at 00:04
  • harrism: yes, that tells the compiler to allocate a 1-dimensional array, but interpret it as a matrix (basically you have the benefit of accessing it with two indices, but it's guaranteed to be contiguously allocated). Thanks! – leo Oct 15 '12 at 00:05
  • I should add to this post that the compiler message `Generating copyin(a[0:1024][0:])` is fixed in PGI accelerator 12.9.0 and gives `Generating copyin(a[0:1024][0:1024])` – lashgar Oct 15 '12 at 13:58
  • So is the problem fixed by 12.9.0? If so, please post that as an answer @ahmad. – harrism Oct 18 '12 at 00:26
  • @harrism OK. I've posted the answer. – lashgar Oct 18 '12 at 08:20
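As the comments point out, a parameter declared like `a` carries the row length (SIZE) but not the number of rows, so the data clauses have to state the ranges explicitly. Below is a rough, self-contained sketch of a region of that shape. It makes several assumptions not in the original: SIZE is reduced to 256 so a CPU-only run is quick, the loop body is a plain triple-loop multiply, a `kernels` construct is used inside the data region, and `mmul_identity_mismatches` is a hypothetical self-check. A compiler without OpenACC support should simply ignore the pragmas.

```c
#include <stdlib.h>

#define SIZE 256  /* assumption: smaller than the question's 1024 */

/* c = a * b. The parameter type float (*)[SIZE] fixes the row length but
   not the row count, so each data clause spells out both ranges. */
void mmul_acc(float (*restrict a)[SIZE],
              float (*restrict b)[SIZE],
              float (*restrict c)[SIZE])
{
#pragma acc data copyin(a[0:SIZE][0:SIZE], b[0:SIZE][0:SIZE]) \
                 copyout(c[0:SIZE][0:SIZE])
    {
#pragma acc kernels
        for (int i = 0; i < SIZE; ++i) {
            for (int j = 0; j < SIZE; ++j) {
                float sum = 0.0f;
                for (int k = 0; k < SIZE; ++k)
                    sum += a[i][k] * b[k][j];
                c[i][j] = sum;
            }
        }
    }
}

/* Hypothetical self-check: multiply the identity matrix by b and count the
   elements of the result that differ from b. Returns -1 if an allocation
   fails. */
int mmul_identity_mismatches(void)
{
    float (*a)[SIZE] = malloc(sizeof(float[SIZE][SIZE]));
    float (*b)[SIZE] = malloc(sizeof(float[SIZE][SIZE]));
    float (*c)[SIZE] = malloc(sizeof(float[SIZE][SIZE]));
    if (a == NULL || b == NULL || c == NULL) {
        free(a); free(b); free(c);
        return -1;
    }

    for (int i = 0; i < SIZE; ++i) {
        for (int j = 0; j < SIZE; ++j) {
            a[i][j] = (i == j) ? 1.0f : 0.0f;  /* identity matrix */
            b[i][j] = (float)(i - j);          /* arbitrary test data */
            c[i][j] = -1.0f;                   /* overwritten by mmul_acc */
        }
    }

    mmul_acc(a, b, c);

    int mismatches = 0;
    for (int i = 0; i < SIZE; ++i)
        for (int j = 0; j < SIZE; ++j)
            if (c[i][j] != b[i][j])
                ++mismatches;

    free(a); free(b); free(c);
    return mismatches;
}
```

The allocation uses the contiguous pointer-to-array form rather than `float **`, which would need a separate table of row pointers and would not match the `float (*)[SIZE]` parameter type.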

1 Answer

This issue is fixed in the latest PGI compiler, 12.9.0. The compiler now reports the following message:

Generating copyin(a[0:1024][0:1024])
lashgar