Has anyone tried the gpu_sync functions described in the article "Inter-Block GPU Communication via Fast Barrier Synchronization"? The code described there seems pretty simple and easy to implement, but my version keeps freezing up my GPU. I'm sure I'm doing something stupid, but I can't see what. Can anyone help me?
The strategy I'm using is the one described in the section “GPU Lock-Free Synchronization”. Here is the OpenCL source code I've implemented:
static void globalSync(uint iGoalValue,
                       volatile __global int *globalSyncFlagsIN,
                       volatile __global int *globalSyncFlagsOUT)
{
    const size_t iLocalThreadID = get_local_id(0);
    const size_t iWorkGroupID = get_group_id(0);
    const size_t iWorkGroupCount = get_num_groups(0);

    // Only the first thread on each SM is used for synchronization:
    // it marks its work-group as having arrived at the barrier
    if (iLocalThreadID == 0)
    {
        globalSyncFlagsIN[iWorkGroupID] = iGoalValue;
    }

    // Work-group 0 collects the arrival flags, one work-group per thread
    if (iWorkGroupID == 0)
    {
        if (iLocalThreadID < iWorkGroupCount)
        {
            while (globalSyncFlagsIN[iLocalThreadID] != iGoalValue) {
                // Nothing to do here
            }
        }
        barrier(CLK_GLOBAL_MEM_FENCE);

        // Every work-group has arrived, so release them all
        if (iLocalThreadID < iWorkGroupCount)
        {
            globalSyncFlagsOUT[iLocalThreadID] = iGoalValue;
        }
    }

    // The first thread of each work-group waits for its release flag
    if (iLocalThreadID == 0)
    {
        while (globalSyncFlagsOUT[iWorkGroupID] != iGoalValue) {
            // Nothing to do here
        }
    }
    barrier(CLK_GLOBAL_MEM_FENCE);
}
Thanks in advance.