Call device function from global function in pycuda

Question

I'm new to PyCUDA. I want to call a function declared with __device__ from a function declared with __global__. How can I do this in PyCUDA?

import pycuda.driver as cuda  
from pycuda.compiler import SourceModule  
import numpy as n  
import pycuda.autoinit  
import pycuda.gpuarray as gp

d=gp.zeros(shape=(128,128),dtype=n.int32)  
h=n.zeros(shape=(128,128),dtype=n.int32)  
mod=SourceModule("""  
      __global__ void  matAdd(int *a)  
    {  
            int px=blockIdx.x*blockDim.x+threadIdx.x;  
            int py=blockIdx.y*blockDim.y+threadIdx.y;         
            a[px*128+py]+=1;   
            matMul(px);

    }  
      __device__ void matMul( int px)
    {
      px=5;
    }  

""")

m=mod.get_function("matAdd")  
m(d,block=(32,32,1),grid=(4,4))  
d.get(h)

The code above gives me the following error:

7-linux-i686.egg/pycuda/../include/pycuda kernel.cu]  
[stderr:  
kernel.cu(8): error: identifier "matMul" is undefined  

kernel.cu(12): warning: parameter "px" was set but never used  

1 error detected in the compilation of "/tmp/tmpxft_00002286_00000000-6_kernel.cpp1.ii".  
]

I am not sure I understand the question. In PyCUDA, you still write the device code in CUDA C. It is no different to if you wrote the host code in C++ rather than Python. So what is it you are asking? — talonmies, Aug 10 '12 at 13:29

score 1 · Accepted Answer · answered Aug 10 '12 at 15:47


You should declare your matMul function before referring to it, because the compiler must have seen a declaration by the time it reaches the call inside matAdd. You could do it like this:

  __device__ void matMul( int px); // declaration
  __global__ void  matAdd(int *a)  
{  
        int px=blockIdx.x*blockDim.x+threadIdx.x;  
        int py=blockIdx.y*blockDim.y+threadIdx.y;         
        a[px*128+py]+=1;   
        matMul(px);

}  
  __device__ void matMul( int px) // implementation
{
  px=5; // by the way, this assignment does not propagate outside this function
}

Alternatively, just move the whole matMul function so that it comes before matAdd.
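As a minimal sketch of the second option, the helper can be defined ahead of the kernel inside the same source string. The name `add_one` and the wrapper `run_on_gpu` are hypothetical and used only for illustration; the helper returns a value because, as noted above, assigning to a by-value parameter does not propagate to the caller. Running it assumes a working PyCUDA installation and a CUDA-capable device.

```python
# Sketch only: the __device__ helper is defined before the kernel that calls it,
# so no separate forward declaration is needed.
KERNEL_SOURCE = """
__device__ int add_one(int value)   // hypothetical helper, defined first
{
    return value + 1;               // return the result instead of assigning to the parameter
}

__global__ void matAdd(int *a)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    a[px * 128 + py] = add_one(a[px * 128 + py]);
}
"""


def run_on_gpu():
    """Compile KERNEL_SOURCE and launch matAdd on a 128x128 int32 array."""
    # Imports are kept local so the source string can be inspected without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    d = gpuarray.zeros((128, 128), dtype=np.int32)
    mod = SourceModule(KERNEL_SOURCE)
    mat_add = mod.get_function("matAdd")
    # Same launch configuration as in the question: 4x4 blocks of 32x32 threads.
    mat_add(d, block=(32, 32, 1), grid=(4, 4))
    return d.get()
```

Calling `run_on_gpu()` returns the array copied back to the host, which makes it easy to check that the device function was actually invoked.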


aland


This is an acceptable solution for this situation, but what if matMul is defined in a separate SourceModule? The same error persists. – username_4567 Aug 10 '12 at 16:01
[You need to have all the functions in one compilation unit](http://stackoverflow.com/a/5994345/929437), so there is no solution, except trying CUDA5.0, [which supports separate compilation and linkage](http://www.youtube.com/watch?v=0fWGCveQMHE&t=9m6s) – aland Aug 10 '12 at 16:08
I'm using CUDA 5 and I know about this, but how can we compile such functions separately in PyCUDA? If I have too many functions, it will become hard to manage them in one object. – username_4567 Aug 10 '12 at 16:20
Technically, you can use `pycuda.compiler.compile` to fine-tune the compilation of subunits, and then somehow link them, but I haven't managed to make it work yet... – aland Aug 10 '12 at 16:53
Yeah...even I thought that but I was not able to use pycuda.compiler.compile – username_4567 Aug 10 '12 at 16:55
Looks like currently it's impossible: pyCUDA expects nvcc to produce `.cubin` file and then uses cuModuleLoadDataEx, but there is no way to link two `.cubin`s into one `.cubin` file – aland Aug 10 '12 at 17:21
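
Since everything has to end up in one compilation unit, one possible way to keep many functions manageable is to store them as separate Python strings and join them before compiling. This is only a sketch of that idea, not true separate compilation; the names `DEVICE_HELPERS`, `KERNELS`, `build_source`, and `add_one` are hypothetical.

```python
# Sketch only: keep device helpers and kernels in separate strings (or files),
# then concatenate them into a single compilation unit.
DEVICE_HELPERS = """
__device__ int add_one(int value)   // hypothetical helper
{
    return value + 1;
}
"""

KERNELS = """
__global__ void matAdd(int *a)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    a[px * 128 + py] = add_one(a[px * 128 + py]);
}
"""


def build_source(*parts):
    """Join source fragments in the given order; helpers must come before their callers."""
    return "\n".join(parts)


FULL_SOURCE = build_source(DEVICE_HELPERS, KERNELS)

# On a machine with PyCUDA and a CUDA device, the combined string can be compiled as usual:
#   from pycuda.compiler import SourceModule
#   mod = SourceModule(FULL_SOURCE)
```

The order of the arguments matters: a fragment that calls a function has to come after the fragment that defines or declares it.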
