For example, I have an array `Y` and I want to increment `Y[0]` from multiple threads. If I simply do `Y[0]++` in a `__global__` function, then `Y[0]` ends up as 1 rather than the number of threads. How can I resolve this?
One approach would be to use [atomics](https://stackoverflow.com/questions/20726299/how-does-warp-work-with-atomic-operation/20726558#20726558). Another approach would be a [classical parallel reduction](https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf). This is a fairly basic concept, and so variants of this question have been asked many times here on the `cuda` tag. – Robert Crovella Nov 27 '18 at 19:56
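
To illustrate the atomics suggestion, here is a minimal sketch, assuming `Y` is an `int` array in device memory. The kernel name `increment_atomic` and the host helper `count_with_atomics` are hypothetical names introduced for this example and do not come from the question. The plain `Y[0]++` is a separate read, add, and write, so threads can overwrite each other's results; `atomicAdd` does the read-modify-write as one indivisible step.

```cuda
#include <cuda_runtime.h>

// Every thread adds 1 to Y[0]. atomicAdd makes the read-modify-write
// indivisible, so concurrent increments are not lost.
__global__ void increment_atomic(int *Y)
{
    atomicAdd(&Y[0], 1);
}

// Hypothetical host helper (not from the question): launches
// blocks * threads_per_block threads and returns the final Y[0],
// or -1 if a CUDA call fails.
int count_with_atomics(int blocks, int threads_per_block)
{
    int *d_Y = nullptr;
    int h_Y = 0;

    if (cudaMalloc(&d_Y, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    if (cudaMemset(d_Y, 0, sizeof(int)) != cudaSuccess) {
        cudaFree(d_Y);
        return -1;
    }

    increment_atomic<<<blocks, threads_per_block>>>(d_Y);

    // A blocking cudaMemcpy on the default stream waits for the kernel
    // to finish before copying the result back.
    cudaError_t err = cudaMemcpy(&h_Y, d_Y, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_Y);

    return (err == cudaSuccess) ? h_Y : -1;
}
```

Calling `count_with_atomics(4, 256)` launches 1024 threads, each contributing one increment. When many threads hit the same address the atomics serialize, so for large counts the parallel reduction mentioned above (or a per-block partial sum followed by one `atomicAdd` per block) is likely to scale better.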