When I try to run the following code, I get this error:
Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 28, in <module>
    dev=mod.get_function("lol")
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 285, in get_function
    return self.module.get_function(name)
pycuda._driver.LogicError: cuModuleGetFunction failed: not found
Here's the code:
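import pycuda.autoinit
import pycuda.driver as cuda
import numpy
from pycuda.compiler import SourceModule

mod = SourceModule("""
extern "C" {
    __device__ void lol(double *a)
    {
        a[0]=1;
    }

    __global__ void kernel(double *a)
    {
        const int r = blockIdx.x*blockDim.x + threadIdx.x;
        a[r] = 1;
    }
}
""")

max_length = 5
a = numpy.zeros(max_length)

# Allocate device memory and copy the host array to it
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)

func = mod.get_function("kernel")
dev=mod.get_function("lol")  # this is the call that fails
dev(a_gpu)

# Copy the result back to the host
newa = numpy.empty_like(a)
cuda.memcpy_dtoh(newa, a_gpu)
print(newa)
print(a)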
As you can probably see, this is a slight modification of the PyCUDA tutorial code. My intent is to call this device function, which will in turn launch kernels, integrate things, and generally make my life easier. After some googling, I knew I had to wrap my code in extern "C" because of name mangling, and that has worked for me before when I used PyCUDA to launch a kernel rather than a device function. Likewise, if I change this code to launch the kernel instead of the device function, it does what I want. What am I missing here?
Karsten
After looking a bit more into the Device Interface Reference documentation, it seems that get_function only deals with __global__ functions. Did I interpret that correctly? If so, is there still a way to do what I'm trying to do?
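
For reference, here is a minimal sketch of one possible workaround, assuming the reading above is right and only __global__ functions can be looked up from the host: keep lol as a __device__ function and call it from a thin __global__ wrapper, then fetch the wrapper with get_function. The names call_lol and run_wrapper are made up for illustration, and I haven't verified this on every PyCUDA version.

```python
# CUDA source: lol stays a __device__ function, callable only from GPU code.
# call_lol is a hypothetical __global__ wrapper that the host can launch.
KERNEL_SOURCE = """
extern "C" {
    __device__ void lol(double *a)
    {
        a[0] = 1;
    }

    __global__ void call_lol(double *a)
    {
        lol(a);
    }
}
"""


def run_wrapper(max_length=5):
    """Launch call_lol on a zero-filled array and return the result."""
    # Imported here so the source string above can be inspected
    # on a machine without a GPU or PyCUDA installed.
    import numpy
    import pycuda.autoinit  # noqa: F401  (sets up the CUDA context)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule(KERNEL_SOURCE)

    a = numpy.zeros(max_length)  # float64, matches double on the device
    a_gpu = cuda.mem_alloc(a.nbytes)
    cuda.memcpy_htod(a_gpu, a)

    # Look up the __global__ wrapper, not the __device__ function.
    wrapper = mod.get_function("call_lol")
    # A single thread is enough because lol only writes a[0].
    wrapper(a_gpu, block=(1, 1, 1), grid=(1, 1))

    newa = numpy.empty_like(a)
    cuda.memcpy_dtoh(newa, a_gpu)
    return newa
```

Calling print(run_wrapper()) on a machine with a CUDA-capable GPU should show the array after lol has written to its first element. The wrapper adds one extra kernel launch, but it keeps the device function itself unchanged.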