I'm writing code that must run the same kernel on both GPUs of a GTX 690 (dual-chip) board. Since the computation is easily separable, I don't need any data exchange between the devices; I just merge the results on the CPU. I understand how to run code on each device and how to allocate input/output buffers in each device's memory.
I'm running into problems when setting constants that the kernel uses on both devices. Do I need to make an array with one entry per device? If so, how can I check, within the kernel, which device it is running on so I can access the proper constant from the array?
Here is some of the code I am using. It works properly when I use only one chip (by setting numDev = 1), but not on both.
__constant__ float d_cellSizeZ;
std::vector<int*> d_cell;
.................
bool Init(int cellsN_, float size_){
    bool res = true;
    if(cudaSuccess != cudaGetDeviceCount(&numDev))
        return false;
    //numDev = 1;
    d_cell.resize(numDev);
    // Allocate one cell buffer per device
    for(int i = 0; i < numDev; ++i){
        res &= (cudaSuccess == cudaSetDevice(i));
        if(!res)
            break;
        res &= (cudaSuccess == cudaMalloc((void**)&d_cell[i], cellsN_*sizeof(int)));
    }
    // The constant is copied only once, after the loop
    res &= (cudaSuccess == cudaMemcpyToSymbol(d_cellSizeZ, &size_, sizeof(float)));
    if(!res)
        Cleanup();
    return res;
}
In the kernel I just use the d_cellSizeZ constant. So, how should I set up the constant on each device so that the same kernel can use it cleanly?
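As far as I understand, each device holds its own instance of a `__constant__` variable, and `cudaMemcpyToSymbol` writes to whichever device is current, so one thing to try is moving the copy inside the per-device loop. Below is a minimal sketch under that assumption. The names `scaleCells` and `runOnAllDevices` are hypothetical and are not part of my real code, and it assumes `cellsN > 0`.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Single definition in the source; assumed to exist once per device at run time.
__constant__ float d_cellSizeZ;

// Hypothetical kernel: writes i * d_cellSizeZ for each cell index i.
__global__ void scaleCells(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * d_cellSizeZ;
}

// Hypothetical helper: runs the kernel on every visible device in turn and
// collects one result vector per device. Returns false on any CUDA error.
bool runOnAllDevices(int cellsN, float size,
                     std::vector<std::vector<float>>& results) {
    int numDev = 0;
    if (cudaGetDeviceCount(&numDev) != cudaSuccess) return false;
    results.assign(numDev, std::vector<float>(cellsN));

    for (int dev = 0; dev < numDev; ++dev) {
        if (cudaSetDevice(dev) != cudaSuccess) return false;

        // Copy the constant while this device is current, so that this
        // device's instance of d_cellSizeZ is the one being written.
        if (cudaMemcpyToSymbol(d_cellSizeZ, &size, sizeof(float)) != cudaSuccess)
            return false;

        float* d_out = nullptr;
        if (cudaMalloc((void**)&d_out, cellsN * sizeof(float)) != cudaSuccess)
            return false;

        scaleCells<<<(cellsN + 255) / 256, 256>>>(d_out, cellsN);
        bool ok = (cudaGetLastError() == cudaSuccess);

        // Blocking copy back to the host; the merge then happens on the CPU.
        ok = ok && (cudaMemcpy(results[dev].data(), d_out,
                               cellsN * sizeof(float),
                               cudaMemcpyDeviceToHost) == cudaSuccess);
        cudaFree(d_out);
        if (!ok) return false;
    }
    return true;
}
```

With this layout the kernel would not need to know which device it runs on, since it just reads `d_cellSizeZ`. Note that the blocking `cudaMemcpy` makes the devices run one after another in this sketch; overlapping them would need asynchronous copies or separate host threads.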
Another question: if I try to exchange data between the devices, will it go over the PCI bus, or is there some internal path on dual-chip boards?