1

I have tried below program using atomicInc().

__global__ void ker(int *count)
{
    int n=1;
    int x = atomicInc ((unsigned int *)&count[0],n);
    CUPRINTF("In kernel count is %d\n",count[0]);
}

int main()
{
    int hitCount[1];
    int *hitCount_d;

    hitCount[0]=1;
    cudaMalloc((void **)&hitCount_d,1*sizeof(int));

    cudaMemcpy(&hitCount_d[0],&hitCount[0],1*sizeof(int),cudaMemcpyHostToDevice);

    ker<<<1,4>>>(hitCount_d);

    cudaMemcpy(&hitCount[0],&hitCount_d[0],1*sizeof(int),cudaMemcpyDeviceToHost);

    printf("count is %d\n",hitCount[0]);
  return 0;
}

Output is:

In kernel count is 1
In kernel count is 1
In kernel count is 1
In kernel count is 1

count is 1

I'm not understanding why it is not incrementing. Can anyone help

einpoklum
  • 118,144
  • 57
  • 340
  • 684
Alvin
  • 940
  • 2
  • 13
  • 27
  • 2
    It looks like you really want to use atomicAdd, not atomicInc. – talonmies Aug 02 '13 at 05:10
  • @talonmies Thnak you. I have used atomicAdd() then it is working. But I have a doubt that In which scenario we can use atomicInc().? – Alvin Aug 02 '13 at 11:43

1 Answers1

10

Referring to the documentation, atomicInc does this:

for the following:

atomicInc ((unsigned int *)&count[0],n);

compute:

((count[0] >= n) ? 0 : (count[0]+1))

and store the result back in count[0]

(If you're not sure what the ? operator does, look here)

Since you've passed n = 1, and count[0] starts out at 1, atomicInc never actually increments the variable count[0] beyond 1.

If you want to see it increment beyond 1, pass a larger value for n.

The variable n actually acts as a "rollover value" for the incrementing process. When the variable to be incremented actually reaches the value of n, the next atomicInc will reset it to zero.

Although you haven't asked the question, you might ask, "Why do I never see a value of zero, if I am hitting the rollover value?"

To answer this, you must remember that all 4 of your threads are executing in lockstep. All 4 of them execute the atomicInc instruction before any execute the subsequent print statement.

Therefore we have a variable of count[0] which starts out at 1.

  1. The first thread to execute the atomic resets it to zero.
  2. The next thread increments it to 1.
  3. The third thread resets it to zero.
  4. The fourth and final thread increments it to 1.

Then all 4 threads print out the value.

As another experiment, try launching 5 threads instead of 4, see if you can predict what the value printed out will be.

ker<<<1,5>>>(hitCount_d);

As @talonmies indicated in the comments, if you swap your atomicInc for an atomicAdd:

int x = atomicAdd ((unsigned int *)&count[0],n);

You'll get results that you were probably expecting.

Robert Crovella
  • 143,785
  • 11
  • 213
  • 257
  • Is there any performance difference between `atomicInc(ptrToInt, INT_MAX)` and `atomicAdd(ptrToInt, 1)`? – Silicomancer Jan 28 '22 at 16:41
  • I'm not aware of any performance differences between different types of atomic instructions, nor am I aware of any formal documentation describing atomic throughput. It should be possible to write a fairly simple test case if this is a matter of concern. Unless you need the rollover capability, it's not obvious to me why you would use `atomicInc`. And if you did need it, `atomicAdd` is not a generally viable replacement. – Robert Crovella Jan 28 '22 at 16:47