
I am currently studying the OpenACC API, and I was wondering whether it is possible to create an array on the device without having a corresponding allocated array on the host.

Let's say that I want to use my old CUDA kernel and only handle memory management through the OpenACC API. I need some arrays of 256 elements that are used only on the device. If I only declare my pointers on the host without allocating them, they might have sequential addresses.

If I use a present_or_create clause on these pointers, with a size of 256 elements, will I end up with distinct arrays on the device? Or will the consecutive addresses on the host, combined with the length of my arrays, be treated as part of the same array?

Here is an example: the address of pointer A is 0, and the address of pointer B is 4.

If I do two pcreate on A[0:256] and B[0:256], the ranges of data on the host will be [0, 1024] and [4, 1028] (assuming 4-byte elements). Will I end up on the device with two distinct arrays of 256 elements, or with only one array covering the range [0, 1028]?

Do I have to first allocate my two arrays on the host to be sure to have two distinct arrays, or should this method work fine?

Ivan Ferić
chabachull

1 Answer


I can really only speak to the PGI implementation, but I think Cray's works similarly. The create/copy/present data clauses key on the address of the host data to determine whether the data is already present on the device. If you have a pointer A and a pointer B that happen to have the same value (both point to the same space), then pcreate(A[0:256],B[0:256]) will create the data for A, and the present_or_ test for B will then see that the data is already present. If A[0] through A[255] on the host overlap with B[0] through B[255], the runtime will see that overlap as well. It's not the starting address that matters, it's the whole range. The model is to create data on the device that mirrors the same data on the host, and the "key" for the "present table" lookup is the host address range.

In your specific case, if you have pointer A with value 0, well, that's a NULL pointer and is treated differently. So say instead that you have pointer A with value 4 and B with value 8 and do pcreate(A[0:256],B[0:256]). The runtime will create device space for A (256 elements starting at host address 4), then notice that you are trying to create a range for B that overlaps the existing space but is not contained within it. That's not allowed in the spec and not supported by our compiler. Supporting it would require reallocating the data for A on the device, which might mean that the device address would move. Since those addresses can be captured, and the old stale address would no longer work, it's an unsafe thing to do.
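To make the range-based lookup above concrete, here is a minimal sketch in plain C of how a present-table check keyed on host byte ranges could classify a request. It is not PGI's or Cray's actual runtime code: the names `host_range`, `range_status`, `make_range` and `classify` are hypothetical, and the half-open byte ranges and the 4-byte element size are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Outcome of comparing a requested host range with one table entry.
   These names are hypothetical, not part of any OpenACC runtime. */
typedef enum {
    RANGE_ABSENT,   /* no overlap: a new device array would be created   */
    RANGE_PRESENT,  /* fully contained: the existing device data is used */
    RANGE_PARTIAL   /* overlaps but is not contained: the error case     */
} range_status;

/* Hypothetical present-table entry: half-open host byte range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} host_range;

/* Byte range covered by `count` elements of `elem_size` bytes at `base`. */
static host_range make_range(uintptr_t base, size_t count, size_t elem_size)
{
    host_range r = { base, base + count * elem_size };
    return r;
}

/* Classify a request against an existing entry using the whole range,
   not just the starting address. */
static range_status classify(host_range entry, host_range req)
{
    if (req.end <= entry.start || req.start >= entry.end)
        return RANGE_ABSENT;
    if (req.start >= entry.start && req.end <= entry.end)
        return RANGE_PRESENT;
    return RANGE_PARTIAL;
}
```

With A at host address 4 and B at host address 8, 256 elements of 4 bytes each give the byte ranges [4, 1028) and [8, 1032). `classify(make_range(4, 256, 4), make_range(8, 256, 4))` returns `RANGE_PARTIAL`, which corresponds to the unsupported case described above. Two pointers with the same value give `RANGE_PRESENT`, and two ranges that do not touch give `RANGE_ABSENT`.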

  • Thank you for your insight. We understand that such overlapping is not allowed in the spec, but it was unclear to us whether the spec considers this case an overlap. – chabachull Mar 25 '13 at 08:24