Take the following code fragment as an example:
__global__ void my_kernel(float *d_min, uint32_t *d_argmin, float *d_input, uint32_t N)
{
    uint32_t ii = blockDim.x * blockIdx.x + threadIdx.x;
    if (ii >= N)
        return;
    float cost_ii = someCostFunction(d_input[ii]);
    // NB: CUDA has no atomicMin overload for float; read this as pseudocode
    // for an atomic min on *d_min that returns the previous value.
    float old_val = atomicMin(d_min, cost_ii);
    if (cost_ii < old_val) // this thread lowered the minimum
    {
        *d_argmin = ii;
    }
}
Let's assume d_min is initialized by the caller to, say, 9999.9. Thread 0 computes a smaller cost of 100.0 and safely, atomically, updates the minimum. Meanwhile, in another thread block, another cost_ii is computed that comes out to 10.0. That thread blocks on the atomicMin instruction, and then sets the new minimum.
So the sequence of assignments to d_min looks like:
d_min = 9999.9 // from external initialization
d_min = 100.0 // from thread 0 in block 0
d_min = 10.0 // from thread 0 in block 1
But the subsequent if statement is executed in a different order:
d_argmin = 1024 // from thread 0 in block 1
d_argmin = 0 // from thread 0 in block 0
How do I avoid the race condition between the time the atomicMin() is executed and the time d_argmin is set?