For example, I have an array Y and I want to increment Y[0] from multiple threads. If I simply write Y[0]++ in a __global__ function, Y[0] ends up as 1 instead of the number of threads. How can I resolve this?

talonmies
J. Toming
    One approach would be to use [atomics](https://stackoverflow.com/questions/20726299/how-does-warp-work-with-atomic-operation/20726558#20726558). Another approach would be a [classical parallel reduction](https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf). This is a fairly basic concept, so variants of this question have been asked many times here on the `cuda` tag. – Robert Crovella Nov 27 '18 at 19:56

1 Answer

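Use an atomic operation. Which atomic functions are available depends on the implementation (the device and its compute capability), so if this compiles with no warnings it is likely to work, but it should still be tested :-), or at least check the generated assembly.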

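__global__ void mykernel(int *value){
    // atomicAdd does the read-modify-write as one indivisible operation
    // and returns the value that was stored at *value before the addition.
    int my_old_val = atomicAdd(value, 1);
    (void)my_old_val;  // not needed here; silences the unused-variable warning
}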
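To check this on a real device, here is a minimal self-contained sketch based on the question's setup. The kernel name `increment`, the launch configuration (4 blocks of 256 threads), and the host-side variable names are all assumptions for illustration, not something from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread adds 1 to Y[0].
// A plain Y[0]++ would race; atomicAdd serializes the updates.
__global__ void increment(int *Y) {
    atomicAdd(&Y[0], 1);
}

int main() {
    const int blocks = 4, threads = 256;  // assumed launch configuration

    int *d_Y = nullptr;
    cudaMalloc(&d_Y, sizeof(int));
    cudaMemset(d_Y, 0, sizeof(int));      // start the counter at 0

    increment<<<blocks, threads>>>(d_Y);

    // cudaMemcpy waits for the kernel to finish and also reports
    // any error raised by the launch.
    int h_Y = 0;
    cudaError_t err = cudaMemcpy(&h_Y, d_Y, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Y[0] = %d (expected %d)\n", h_Y, blocks * threads);
    cudaFree(d_Y);
    return 0;
}
```

If the atomic works as intended, the printed value should equal `blocks * threads`; replacing `atomicAdd(&Y[0], 1)` with `Y[0]++` should reproduce the lost updates described in the question.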

See the guide here
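The comment under the question also mentions a parallel reduction. When many threads hit the same address, one common variant is to sum within each block in shared memory first and issue a single `atomicAdd` per block. The sketch below assumes a power-of-two block size; the kernel name `count_threads` and the launch parameters are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread contributes 1. The block sums its
// contributions in shared memory, then thread 0 adds the block total
// to *value with one atomicAdd. Assumes blockDim.x is a power of two.
__global__ void count_threads(int *value) {
    extern __shared__ int partial[];
    unsigned int tid = threadIdx.x;

    partial[tid] = 1;                     // this thread's contribution
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            partial[tid] += partial[tid + s];
        }
        __syncthreads();
    }

    if (tid == 0) {
        atomicAdd(value, partial[0]);     // one atomic per block
    }
}

int main() {
    const int blocks = 4, threads = 256;  // assumed; threads must be a power of two

    int *d_value = nullptr;
    cudaMalloc(&d_value, sizeof(int));
    cudaMemset(d_value, 0, sizeof(int));

    // Third launch parameter: bytes of dynamic shared memory per block.
    count_threads<<<blocks, threads, threads * sizeof(int)>>>(d_value);

    int h_value = 0;
    cudaError_t err = cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("total = %d (expected %d)\n", h_value, blocks * threads);
    cudaFree(d_value);
    return 0;
}
```

For a simple counter the plain `atomicAdd` version is usually sufficient; the per-block reduction mainly matters when contention on a single address becomes a bottleneck.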

Gardener