I've been working to create a GPU based conway's game of life program. If you're not familiar with it, here is the Wikipedia Page. I created one version that works by keeping an array of values where 0 represents a dead cell, and 1 a live one. The kernel then simply writes to an image buffer data array to draw an image based on the cell data and then checks each cell's neighbors to update the cell array for the next execution to render.
However, a faster method instead represents the value of a cell as a negative number if dead and a positive number if alive. The number of that cell represents the amount of neighbors it has plus one (making zero an impossible value since we cannot differentiate 0 from -0). However this means that when spawning or killing a cell we must update it's eight neighbor's values accordingly. Thus unlike the working procedure, which only has to read from the neighboring memory slots, this procedure must write to those slots. Doing so is inconsistent and the outputted array is not valid. For example cells contain numbers such as 14 which indicates 13 neighbors, an impossible value. The code is correct as I wrote the same procedure on the cpu and it works as expected. After testing, I believe that when tasks try to write to the memory at the same time there is a delay that leads to a writing error of some kind. For example, perhaps there is a delay between reading the array data and setting in which time the data is changed making another task's procedure incorrect. I've tried using semaphors and barriers, but have just learned OpenCL and parallel processing and don't quite grasp them completely yet. The kernel is as follows.
int wrap(int val, int limit){
int response = val;
if(response<0){response+=limit;}
if(response>=limit){response-=limit;}
return response;
}
__kernel void optimizedModel(
__global uint *output,
int sizeX, int sizeY,
__global uint *colorMap,
__global uint *newCellMap,
__global uint *historyBuffer
)
{
// the x and y coordinates that currently being computed
unsigned int x = get_global_id(0);
unsigned int y = get_global_id(1);
int cellValue = historyBuffer[sizeX*y+x];
int neighborCount = abs(cellValue)-1;
output[y*sizeX+x] = colorMap[cellValue > 0 ? 1 : 0];
if(cellValue > 0){// if alive
if(neighborCount < 2 || neighborCount > 3){
// kill
for(int i=-1; i<2; i++){
for(int j=-1; j<2; j++){
if(i!=0 || j!=0){
int wxc = wrap(x+i, sizeX);
int wyc = wrap(y+j, sizeY);
newCellMap[sizeX*wyc+wxc] -= newCellMap[sizeX*wyc+wxc] > 0 ? 1 : -1;
}
}
}
newCellMap[sizeX*y+x] *= -1;
// end kill
}
}else{
if(neighborCount==3){
// spawn
for(int i=-1; i<2; i++){
for(int j=-1; j<2; j++){
if(i!=0 || j!=0){
int wxc = wrap(x+i, sizeX);
int wyc = wrap(y+j, sizeY);
newCellMap[sizeX*wyc+wxc] += newCellMap[sizeX*wyc+wxc] > 0 ? 1 : -1;
}
}
}
newCellMap[sizeX*y+x] *= -1;
// end spawn
}
}
}
- The array output is the image buffer data used to render the kernel's computation.
- The sizeX and sizeY constants are the width and height of the image buffer respectively.
- The colorMap array contains the rgb integer values for black and white respectively which are used to change the image buffer's values properly to render colors.
- The newCellMap array is the updated cell map being calculated once rendering is determined.
- The historyBuffer is the old state of the cells at the beginning of the kernel call. Every time the kernel is executed, this array is updated to the newCellMap array.
Additionally the wrap function makes the space toroidal. How could I fix this code such that it works as expected. And why doesn't the global memory update with each change by a task? Isn't it supposed to be shared memory?