PyCUDA Memory Addressing: Memory offset?

Question

I've got a large chunk of generated data (A[i,j,k]) on the device, but I only need one 'slice' of A[i,:,:], and in regular CUDA this could be easily accomplished with some pointer arithmetic.

Can the same thing be done within PyCUDA? i.e.

cuda.memcpy_dtoh(h_iA,d_A+(i*stride))

Obviously this is completely wrong since there's no size information (unless it is inferred from the destination shape), but hopefully you get the idea?

score 2 · Accepted Answer · answered Apr 19 '11 at 19:57

The PyCUDA GPUArray class supports slicing of 1D arrays, but not higher dimensions that require a stride (although it is coming). You can, however, get access to the underlying pointer of a multidimensional GPUArray from its gpudata member, which is a pycuda.driver.DeviceAllocation, and the element size from the GPUArray.dtype.itemsize member. You can then do the same sort of pointer arithmetic you had in mind to get something that the driver memcpy functions will accept.

It isn't very pythonic, but it does work (or at least it did when I was doing a lot of pyCUDA + MPI hacking last year).
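
As a minimal sketch of that pointer arithmetic, assuming d_A is a C-contiguous pycuda.gpuarray.GPUArray and a CUDA context is already active: the helper names slice_byte_range and copy_slice_to_host are hypothetical, not part of PyCUDA. The byte offset of A[i,:,:] is i times the size of one slice, and memcpy_dtoh copies as many bytes as the destination buffer holds, so the host array's shape supplies the size.

```python
import numpy as np


def slice_byte_range(shape, dtype, i):
    """Return (byte_offset, slice_shape) for A[i, ...] in a C-contiguous array."""
    if not 0 <= i < shape[0]:
        raise IndexError("slice index out of range")
    slice_shape = tuple(shape[1:])
    # Bytes in one A[i, :, :] slice = elements per slice * bytes per element.
    n_elements = int(np.prod(slice_shape, dtype=np.int64))
    slice_nbytes = n_elements * np.dtype(dtype).itemsize
    return i * slice_nbytes, slice_shape


def copy_slice_to_host(d_A, i):
    """Copy d_A[i, ...] from a C-contiguous GPUArray into a new host array."""
    import pycuda.driver as cuda  # imported lazily; needs an active CUDA context

    offset, slice_shape = slice_byte_range(d_A.shape, d_A.dtype, i)
    h_iA = np.empty(slice_shape, dtype=d_A.dtype)
    # int(d_A.gpudata) is the raw device address; adding the byte offset
    # gives the start of the slice. The copy length is taken from h_iA.
    cuda.memcpy_dtoh(h_iA, int(d_A.gpudata) + offset)
    return h_iA
```

With a context created (for example via import pycuda.autoinit), the call would be h_iA = copy_slice_to_host(d_A, i). This only works for slices along the leading axis; any other slice is strided in memory and cannot be fetched with a single flat copy.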

score 0 · Answer 2 · answered Apr 19 '11 at 18:53

It is unlikely that this is implemented in PyCUDA.

I can think of the following solutions:

1. Copy the entire array A to host memory and make a numpy array from the slice of interest.
2. Create a kernel that reads the matrix and writes out the desired slice.
3. Rearrange the produced data so that you can read one slice at a time using pointer arithmetic.
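
To illustrate why the third option depends on layout, here is a small host-side sketch using numpy only. The array sizes are made up for the example. In C (row-major) order a slice along the leading axis is one contiguous run of memory, while a slice along another axis is not, so the data has to be rearranged before a single offset copy can fetch it.

```python
import numpy as np

# Hypothetical small stand-in for the device array A[i, j, k].
A = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)

# In C order, A[i, :, :] is one contiguous run of bytes,
# so a single offset copy can fetch it.
leading_ok = A[1].flags["C_CONTIGUOUS"]

# A[:, j, :] is strided: its rows are separated by a full A[i] slice.
middle_ok = A[:, 1, :].flags["C_CONTIGUOUS"]

# Rearranging so that j is the leading axis makes that slice contiguous too.
B = np.ascontiguousarray(np.moveaxis(A, 1, 0))  # B[j, i, k] == A[i, j, k]
rearranged_ok = B[1].flags["C_CONTIGUOUS"]
same_data = np.array_equal(B[1], A[:, 1, :])
```

The same reasoning applies to the device buffer, since a GPUArray created from a C-ordered numpy array keeps that layout.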

fabrizioM

I went for option 1 anyway, but leaving the question open for a few hours to see if anyone else has a magical solution we haven't thought of. – Bolster Apr 19 '11 at 19:16
Yes, I do that too, sometimes for more than a week. Not everyone can read SO every day :) – fabrizioM Apr 19 '11 at 19:46
