How to execute the same function on CPU and GPU with JCuda

Question

I am working with the code from the JCuda documentation. Currently, it just adds vectors on the GPU. What should I do to reuse the function add on the CPU (host)? I know that I have to change __global__ to __host__ __device__, but I have no idea how to call it from my main function. I suspect that I have to use another nvcc option.

My goal is to run the same function on the GPU and the CPU and compare the execution times (I know how to measure them).

.cu file (compiled with nvcc -ptx file.cu -o file.ptx)

extern "C"

__global__ void add(int n, float *a, float *b, float *sum)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i<n)
    {
        sum[i] = a[i] + b[i];
    }
}

fragment of main function

public static void main(String[] args) {
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);

        CUmodule module = new CUmodule();
        cuModuleLoad(module, "kernels/JCudaVectorAdd.ptx");

        CUfunction function = new CUfunction();
        cuModuleGetFunction(function, module, "add");
        ...
        Pointer kernelParameters = Pointer.to(
                Pointer.to(new int[]{numElements}),
                Pointer.to(deviceInputA),
                Pointer.to(deviceInputB),
                Pointer.to(deviceOutput)
        );

talonmies · Accepted Answer · 2020-05-14T12:06:33.767

You can't, and probably never will be able to, do this in JCUDA, because of the API it uses to interact with CUDA.

While CUDA can now "launch" a host function into a stream, that API isn't exposed by JCUDA at present, and it wouldn't work the way that device code does now (this restriction would apply to PyCUDA and other driver API based frameworks as well).

You would likely need to use JNI or some other way to retrieve the host function from a library and call it that way.
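
If the goal is only a CPU timing baseline, one possible fallback is to re-implement the kernel body in plain Java. To be clear, this is not the JNI route described above and it does not reuse the compiled __host__ __device__ function; it is a separate host-side version of the same element-wise add. The class name HostVectorAdd, the array size, and the input values are assumptions made up for this sketch.

```java
public class HostVectorAdd {

    // Host-side counterpart of the kernel body. The loop index i plays the
    // role that blockIdx.x * blockDim.x + threadIdx.x plays on the device.
    public static void add(int n, float[] a, float[] b, float[] sum) {
        for (int i = 0; i < n; i++) {
            sum[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        // Hypothetical problem size and inputs, chosen only for illustration.
        int numElements = 1 << 20;
        float[] a = new float[numElements];
        float[] b = new float[numElements];
        float[] sum = new float[numElements];
        for (int i = 0; i < numElements; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        // Time a single host-side pass over the data.
        long start = System.nanoTime();
        add(numElements, a, b, sum);
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("sum[10] = " + sum[10]);
        System.out.println("host add took " + (elapsedNanos / 1e6) + " ms");
    }
}
```

A single timed call like this includes JIT warm-up, so for a fair comparison with the GPU you would probably want to run the host version several times first and time only the later runs.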
