----------------a.c---------------------
variable *XX;                          /* global pointer shared by all threads */
func1(){
    for(...){
        for(i = 0; i < 4; i++)         /* one host thread per GPU */
            cutStartThread(func2, args);
    }
}
---------------b.cu-------------------
func2(args){
    cudaSetDevice(i);                  /* i: device index for this thread */
    xx = cudaMalloc();                 /* pseudocode: device allocation for XX */
    mykernel<<<...>>>(xx);
}
--------------------------------------
I want to use multiple GPUs in my program. My node has four Tesla C2075 cards, and I use four host threads to manage the four GPUs. The kernel in each thread is launched several times. Simplified pseudocode is shown above. I have two questions:
1. Variable XX is a very long string and is read-only in the kernel. I want to preserve it during the multiple launches of mykernel. Is it OK to call cudaMalloc and pass the pointer to mykernel only when mykernel is first launched? Or should I use the __device__ qualifier?
2. XX is used in four threads, so I declare it as a global variable in file a.c. Are multiple cudaMalloc calls on XX correct, or should I use an array such as variable *xx[4]?
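
To make the second question concrete, here is a minimal host-only sketch of the array variant I have in mind. It is not my real code, only an illustration under some assumptions: malloc and memcpy stand in for cudaMalloc and cudaMemcpy, pthreads stand in for cutStartThread, the kernel launch is reduced to a counter, and the names worker, run_all_devices, free_all_devices, NUM_LAUNCHES and the placeholder string are made up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NUM_DEVICES  4   /* one worker thread per GPU, as in the question */
#define NUM_LAUNCHES 3   /* hypothetical number of launches per thread */

/* One slot per device instead of a single shared pointer. */
static char *xx[NUM_DEVICES];
static int alloc_count[NUM_DEVICES];    /* allocations performed per slot */
static int launch_count[NUM_DEVICES];   /* simulated launches per slot */

/* Placeholder for the long read-only string. */
static const char *host_string = "ACGTACGT";

/* Stand-in for func2. In the real code this is where cudaSetDevice(dev),
   the one-time cudaMalloc/cudaMemcpy and the kernel launch would go. */
static void *worker(void *arg)
{
    int dev = *(int *)arg;
    assert(dev >= 0 && dev < NUM_DEVICES);

    for (int launch = 0; launch < NUM_LAUNCHES; launch++) {
        if (xx[dev] == NULL) {                 /* first launch only */
            size_t n = strlen(host_string) + 1;
            xx[dev] = malloc(n);               /* stand-in for cudaMalloc */
            if (xx[dev] == NULL)
                return NULL;                   /* give up on this device */
            memcpy(xx[dev], host_string, n);   /* stand-in for cudaMemcpy */
            alloc_count[dev]++;
        }
        /* stand-in for mykernel<<<...>>>(xx[dev]) */
        launch_count[dev]++;
    }
    return NULL;
}

/* Starts one thread per device and waits for all of them.
   Returns 0 on success, -1 if a thread could not be created. */
static int run_all_devices(void)
{
    pthread_t threads[NUM_DEVICES];
    int ids[NUM_DEVICES];
    int started = 0;
    int status = 0;

    for (int i = 0; i < NUM_DEVICES; i++) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return status;
}

/* Releases every slot; stand-in for calling cudaFree on each device. */
static void free_all_devices(void)
{
    for (int i = 0; i < NUM_DEVICES; i++) {
        free(xx[i]);
        xx[i] = NULL;
    }
}
```

Each thread touches only its own slot xx[dev], so the threads never write to the same pointer. With a single global XX, all four threads would be assigning to one variable, which is the part I am unsure about.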