I am working with the vector addition sample from the JCuda documentation. Currently, it only adds two vectors on the GPU.
What should I do to reuse the function add on the CPU (host)?
I know that I have to change __global__ to __host__ __device__, but I have no idea how to call it from my main function. I suspect that I have to use a different nvcc option.
My goal is to run the same function on both the GPU and the CPU and compare the execution times (I already know how to measure them).
.cu file (compiled with nvcc -ptx file.cu -o file.ptx):
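    extern "C"
    __global__ void add(int n, float *a, float *b, float *sum)
    {
        // Global index of this thread across all blocks
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // Guard against threads past the end of the arrays
        if (i < n)
        {
            sum[i] = a[i] + b[i];
        }
    }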
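For the CPU side of the comparison, one option is a hand-written Java equivalent of the kernel body. This is only a sketch and not the same compiled function: a file built with nvcc -ptx contains device code only, so the host half of a __host__ __device__ function would not be reachable through the PTX module. The class name CpuVectorAdd and the array size are assumptions made for illustration.

```java
public class CpuVectorAdd {
    // CPU counterpart of the kernel body: the loop index i plays the role
    // of the per-thread index computed from blockIdx/blockDim/threadIdx.
    static void add(int n, float[] a, float[] b, float[] sum) {
        for (int i = 0; i < n; i++) {
            sum[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int numElements = 1_000_000; // hypothetical size, not from the original sample
        float[] a = new float[numElements];
        float[] b = new float[numElements];
        float[] sum = new float[numElements];
        for (int i = 0; i < numElements; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        // Time only the addition itself, not the array setup
        long start = System.nanoTime();
        add(numElements, a, b, sum);
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("sum[10] = " + sum[10]);
        System.out.printf("CPU add took %.3f ms%n", elapsedNanos / 1e6);
    }
}
```

A single timed call like this includes JIT warm-up, so for a fair comparison with the GPU it is probably better to run add a few times first and average several measurements.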
Fragment of the main function:
    public static void main(String[] args) {
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);
        CUmodule module = new CUmodule();
        cuModuleLoad(module, "kernels/JCudaVectorAdd.ptx");
        CUfunction function = new CUfunction();
        cuModuleGetFunction(function, module, "add");
        ...
        Pointer kernelParameters = Pointer.to(
            Pointer.to(new int[]{numElements}),
            Pointer.to(deviceInputA),
            Pointer.to(deviceInputB),
            Pointer.to(deviceOutput)
        );