
I have problems passing the right parameters to the prepare function (and to prepared_call) to allocate shared memory in PyCUDA. I read the error message as saying that one of the variables I pass to PyCUDA is a long instead of the float32 I intended, but I cannot see where that variable comes from.

Furthermore, it seems to me that the official example and the documentation of prepare contradict each other regarding whether block needs to be None or not.

from pycuda import driver, compiler, gpuarray, tools
import pycuda.autoinit
import numpy as np

kernel_code ="""
__device__ void loadVector(float *target, float* source, int dimensions )
{
    for( int i = 0; i < dimensions; i++ ) target[i] = source[i];
}
__global__ void kernel(float* data, int dimensions, float* debug)
{
    extern __shared__ float mean[];
    if(threadIdx.x == 0) loadVector( mean, &data[0], dimensions );
    debug[threadIdx.x]=  mean[threadIdx.x];
}
"""

dimensions = 12
np.random.seed(23)
data = np.random.randn(dimensions).astype(np.float32)
data_gpu = gpuarray.to_gpu(data)
debug = gpuarray.zeros(dimensions, dtype=np.float32)

mod = compiler.SourceModule(kernel_code)
kernel = mod.get_function("kernel")
kernel.prepare("PiP",block = (dimensions, 1, 1),shared=data.size)
grid = (1,1)
kernel.prepared_call(grid,data_gpu,dimensions,debug)
print debug.get()

Output

Traceback (most recent call last):
  File "shared_memory_minimal_example.py", line 28, in <module>
    kernel.prepared_call(grid,data_gpu,dimensions,debug)
  File "/usr/local/lib/python2.6/dist-packages/pycuda-0.94.2-py2.6-linux-x86_64.egg/pycuda/driver.py", line 230, in function_prepared_call
    func.param_setv(0, pack(func.arg_format, *args))
pycuda._pvt_struct.error: cannot convert argument to long
  • possible duplicate of [Create arrays in shared memory w/o templates like in PyOpenCL](http://stackoverflow.com/questions/6468132/create-arrays-in-shared-memory-w-o-templates-like-in-pyopencl) – talonmies Aug 05 '11 at 10:10
  • I gave you an answer which explains how to do this is [another question](http://stackoverflow.com/questions/6468132/create-arrays-in-shared-memory-w-o-templates-like-in-pyopencl/6491754#6491754) you posted about a month ago. – talonmies Aug 05 '11 at 10:11
  • Please update the **question** to include the new code and information – talonmies Aug 05 '11 at 12:26
  • Hi Talonmies, I updated the whole question. Thanks for your patience. – Framester Aug 05 '11 at 12:52

1 Answer


I came across this same problem and it took me a while to work out the answer, so here goes. The cause of the error message is that data_gpu is a GPUArray instance, i.e. you made it with

data_gpu = gpuarray.to_gpu(data)

To pass it to prepared_call you need to pass data_gpu.gpudata instead, which is the associated DeviceAllocation instance (i.e. effectively the pointer to the device memory location).

Also, passing a block argument to prepare is now deprecated - so a correct invocation would be something like this:

data_gpu = gpuarray.to_gpu(data)
func.prepare( "P" )
grid = (1,1)
block = (1,1,1)
func.prepared_call( grid, block, data_gpu.gpudata )
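
Applying the same fixes to the kernel from the question, the launch could look roughly like the sketch below. This assumes a PyCUDA version whose prepared_call accepts grid, block and a shared_size keyword argument; also note that the shared memory size is given in bytes (data.nbytes), whereas the original shared=data.size would only request 12 bytes (one per element):

# Sketch only: adapts the question's launch to the prepared_call form above.
# Assumes prepared_call(grid, block, *args, shared_size=...) is available.
kernel.prepare("PiP")                        # no block argument to prepare
grid = (1, 1)
block = (dimensions, 1, 1)
kernel.prepared_call(grid, block,
                     data_gpu.gpudata,       # DeviceAllocation, not the GPUArray
                     dimensions,             # packed as "i"
                     debug.gpudata,
                     shared_size=data.nbytes)  # shared memory size in bytes
print debug.get()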