
When I try to run the following code, I get this error:

Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 28, in <module>
    dev=mod.get_function("lol")
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 285, in get_function
    return self.module.get_function(name)
pycuda._driver.LogicError: cuModuleGetFunction failed: not found

Here's the code:

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""

extern "C" {
__device__ void lol(double *a)
{
    a[0]=1;
}


__global__ void kernel(double *a)
{
    const int r = blockIdx.x*blockDim.x + threadIdx.x;
    a[r] = 1;
}
}
""")

max_length = 5
a = numpy.zeros(max_length)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
func = mod.get_function("kernel")
dev=mod.get_function("lol")
dev(a_gpu)
newa = numpy.empty_like(a)
cuda.memcpy_dtoh(newa, a_gpu)

print(newa)
print(a)

As you can probably see, this is a slight modification of the PyCUDA tutorial code. My intent is to call this device function, which is going to launch kernels and integrate things and generally make my life easier. From a bit of googling I knew I had to put `extern "C"` into my code because of name mangling, and I've had success with this before when I was using PyCUDA to launch a kernel rather than a device function. Along the same lines, if I change my code to launch the kernel instead of the device function, it does what I want. What am I missing here?

Karsten

After looking a little more into the Device Interface Reference documentation, it seems that `get_function` only deals with `__global__` functions? Did I interpret that correctly? If so, am I able to do what I'm trying to do?

1 Answer


You cannot call a __device__ function from host code. If you're indicating that the PyCUDA tutorial code shows how to do this, I'd like to see that tutorial.

It's not clear to me what you're trying to accomplish with calling the __device__ function from host code that could not be done with an ordinary kernel (__global__) launch.

Robert Crovella
  • Right Robert, I realized it was a dumb question not too long after I asked it. What I'm trying to do is write an integrator that runs solely on the gpu after I send it initial conditions and parameters. I wanted the device function (which I'm now just using a single thread kernel for) to run the integrator and to be able to call other kernels to perform the integration. – Hair of Slytherin Oct 15 '13 at 14:40
  • You can launch a `__global__` kernel with just a single block of a single thread, which should behave the same as the `__device__` function that you have in mind. Not trying to comment on the overall idea, just the mechanics. – Robert Crovella Oct 15 '13 at 14:45
  • Yup, that's what I'm trying to do right now. Do you think that's a bad idea? – Hair of Slytherin Oct 15 '13 at 16:54
  • Normally launching a kernel of one block and one thread is not how you get performance out of a GPU. But your description ("My intent is to call this device function which is going to launch kernels and integrate things ") makes it sound like a master thread, of sorts, which might be OK. There's not enough information in this question to comment more specifically about it. – Robert Crovella Oct 15 '13 at 18:20