Take the following code fragment as an example:

    __global__ void my_kernel(float *d_min, uint32_t *d_argmin, float *d_input, uint32_t N)
    {
        uint32_t ii = blockDim.x * blockIdx.x + threadIdx.x;

        if (ii>=N)
            return;

        float cost_ii = someCostFunction( d_input[ii] );
        float old_val = atomicMin( d_min, cost_ii );
        if (old_val != cost_ii)
        {
            *d_argmin = ii;
        } 
    }

Let's assume d_min is initialized by the caller to, say, 9999.9. Thread 0 computes a smaller cost of 100.0 and atomically updates the min. Meanwhile, in another thread block, a thread computes cost_ii = 10.0. It blocks on the atomicMin instruction until the first update completes, and then sets the min to 10.0.

So the sequence of assignments to d_min looks like:

    d_min = 9999.9  // from external initialization
    d_min = 100.0   // from thread 0 in block 0
    d_min = 10.0    // from thread 0 in block 1

But the subsequent if statement is executed in a different order:

    d_argmin = 1024 // from thread 0 in block 1
    d_argmin = 0    // from thread 0 in block 0

How do I avoid a race condition between the time atomicMin() is executed and the time d_argmin is set?

wrjohns
  • like [this](https://stackoverflow.com/questions/17411493/how-can-i-implement-a-custom-atomic-function-involving-several-variables/17414007#17414007): your question is arguably a duplicate of that one, since both your min and argmin are 32-bit quantities. – Robert Crovella Jun 14 '19 at 04:13
  • The example I provided is not quite my actual problem. I've got a total of 3 32-bit quantities to atomically set (all floats). The method in the link looks limited to 64 bits total – wrjohns Jun 14 '19 at 04:16
  • Some other choices are a parallel reduction or a critical section. Parallel reduction is the preferred method (over critical section). Both are discussed here on the `cuda` tag. – Robert Crovella Jun 14 '19 at 04:38
  • I'm familiar with reductions. My data is along the lines of "distance", "x-coord" and "y-coord". I'm thinking I may go with the 64-bit solution, and apply that logic twice (distance/x-coord, distance/y-coord) – wrjohns Jun 14 '19 at 04:44
  • Even apart from race conditions, your concept code here is broken in at least a few ways. First, there is no version of `atomicMin` that works on floating-point quantities. Second, the test for `old_val != cost_ii` does not tell you if the `atomicMin` operation was successful, so conditioning the write of argmin on that wouldn't work, even without any other thread interfering. – Robert Crovella Jun 14 '19 at 07:25
  • Using your provided link, I can get around the lack of float support using atomicCAS (which also has no float support). The union would let me do the comparison using a float, and let the atomic operation do the equality comparison using the 64-bit int. As for the other comment (now irrelevant since I wouldn't be using atomicMin), atomicMin returns the old value. So if I did the comparison vs a new test value...ahh, if I changed to an inequality, I would see if the rest of it needed to be updated. – wrjohns Jun 14 '19 at 13:45
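
For readers following the comments, here is a minimal sketch of the 64-bit approach discussed above: the cost and its index are packed into one `unsigned long long` and updated together with `atomicCAS`, so the two values can never get out of step. The names `packCostIndex`, `unpackCost`, `unpackIndex`, `argmin_kernel`, `findMinWithIndex` and `MinResult` are made up for illustration, and the body of `someCostFunction` is a placeholder (the question does not show it). This version uses `memcpy` for the float/integer conversion instead of the union mentioned in the comments. It only covers two 32-bit quantities, not the three floats mentioned in the comments, and it omits CUDA error checking for brevity.

```cuda
#include <cfloat>
#include <cstdint>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for the question's someCostFunction(); the real one is not shown.
__device__ float someCostFunction(float x) { return x * x; }

// Pack a float cost (high 32 bits) and an index (low 32 bits) into one word.
__host__ __device__ inline unsigned long long packCostIndex(float cost, uint32_t idx)
{
    uint32_t bits;
    memcpy(&bits, &cost, sizeof(bits));
    return (static_cast<unsigned long long>(bits) << 32) | idx;
}

__host__ __device__ inline float unpackCost(unsigned long long packed)
{
    uint32_t bits = static_cast<uint32_t>(packed >> 32);
    float cost;
    memcpy(&cost, &bits, sizeof(cost));
    return cost;
}

__host__ __device__ inline uint32_t unpackIndex(unsigned long long packed)
{
    return static_cast<uint32_t>(packed & 0xFFFFFFFFull);
}

__global__ void argmin_kernel(unsigned long long *d_best, const float *d_input, uint32_t N)
{
    uint32_t ii = blockDim.x * blockIdx.x + threadIdx.x;
    if (ii >= N)
        return;

    float cost_ii = someCostFunction(d_input[ii]);
    unsigned long long mine = packCostIndex(cost_ii, ii);
    unsigned long long seen = *d_best;  // initial guess; the CAS below validates it

    // Retry only while our cost is strictly smaller than the stored cost.
    // The comparison is done on floats, the swap on the packed 64-bit word.
    while (cost_ii < unpackCost(seen))
    {
        unsigned long long prev = atomicCAS(d_best, seen, mine);
        if (prev == seen)
            break;      // our (cost, index) pair was installed as one unit
        seen = prev;    // another thread got in first; re-test against its value
    }
}

struct MinResult { float cost; uint32_t index; };

// Host wrapper: returns the smallest cost and the index that produced it.
MinResult findMinWithIndex(const std::vector<float> &input)
{
    const uint32_t N = static_cast<uint32_t>(input.size());
    unsigned long long h_best = packCostIndex(FLT_MAX, 0xFFFFFFFFu);  // sentinel
    if (N == 0)
        return { unpackCost(h_best), unpackIndex(h_best) };

    float *d_input = nullptr;
    unsigned long long *d_best = nullptr;
    cudaMalloc(&d_input, N * sizeof(float));
    cudaMalloc(&d_best, sizeof(h_best));
    cudaMemcpy(d_input, input.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_best, &h_best, sizeof(h_best), cudaMemcpyHostToDevice);

    const uint32_t threads = 256;
    argmin_kernel<<<(N + threads - 1) / threads, threads>>>(d_best, d_input, N);

    cudaMemcpy(&h_best, d_best, sizeof(h_best), cudaMemcpyDeviceToHost);
    cudaFree(d_input);
    cudaFree(d_best);
    return { unpackCost(h_best), unpackIndex(h_best) };
}
```

Calling `findMinWithIndex({3.0f, -1.0f, 0.5f, 2.0f})` with the placeholder cost function should report the element at index 2. If several elements share the minimum cost, which of their indices ends up stored depends on thread scheduling. With many threads contending on a single address, a parallel reduction (as suggested in the comments) is likely to perform better.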
