I have a kernel that searches several arrays, one thread per array. Whenever a thread finds a match, the result must be written to an array in global memory. How can the threads access this global array without writing to the same location twice or leaving a location empty?

This is a pseudocode example of what I'm trying to do:

__global__ void find(*tableOfArrays, *Result, position)
{
   int idx = blockIdx.x * blockDim.x + threadIdx.x;
   if (idx < numOfArrays)
   {
     for (int i = 0; i < tableOfArrays[idx].length; i++)
     {
        if (Match(tableOfArrays[idx][i]))
        {
            // The position variable holds the next free index in the global Result array.
            atomicAdd(&(position), (int)1);
            // I want each result to be written to its own slot in Result.
            Result[position] = tableOfArrays[idx][i];
        }
     }
   }
}

The problem is that the threads do not access the Result array in order, and some threads end up taking the same slot. Any help? Thank you.

superpichon
  • You are using `atomicAdd` incorrectly. The function returns a value; use that for array indexing, not a non-atomic read of `position` – talonmies Mar 13 '15 at 06:01
  • I tried that, but I get the same result. Do you mean this: `Result[atomicAdd(&(position), (int)1)] = tableOfArrays[idx][i];`? – superpichon Mar 13 '15 at 06:28
  • I missed that you are passing position by value; that won't work. Allocate it using cudaMalloc, or declare it as a `__device__` variable instead. – talonmies Mar 13 '15 at 06:30
  • That's not the problem; this is only an example. I'm passing the value as a pointer. – superpichon Mar 13 '15 at 07:08

1 Answer

You must capture the value that atomicAdd returns, which is the value it read from memory before the addition. Once atomicAdd has executed, another thread can modify position, so reading it again may yield an index that another thread is also using.

int localIndex = atomicAdd(&(position), (int)1);
Result[localIndex] = tableOfArrays[idx][i];
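
For context, here is a minimal sketch of how that pattern might look in a complete program. Several things here are assumptions for illustration and not part of the question: the table is flattened into one `int` array with a fixed `arrayLen` per row, the predicate `isMatch` (even values) is hypothetical, as is the host helper `findMatches`, and error checking on the CUDA calls is omitted for brevity. It also assumes `numArrays` is greater than zero.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical match predicate (not from the question): keeps even values.
__device__ bool isMatch(int value) { return value % 2 == 0; }

// One thread per row of a flattened numArrays x arrayLen table.
// position must point to an int in device memory, zeroed before launch.
__global__ void find(const int *table, int numArrays, int arrayLen,
                     int *result, int *position)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numArrays) return;

    for (int i = 0; i < arrayLen; ++i) {
        int value = table[idx * arrayLen + i];
        if (isMatch(value)) {
            // atomicAdd returns the counter value from before the increment,
            // so each matching element gets a distinct slot.
            int slot = atomicAdd(position, 1);
            result[slot] = value;
        }
    }
}

// Hypothetical host helper: runs the kernel and returns the matches.
// The output is sorted because the order of writes is not deterministic.
std::vector<int> findMatches(const std::vector<int> &table,
                             int numArrays, int arrayLen)
{
    size_t bytes = table.size() * sizeof(int);
    int *dTable = nullptr, *dResult = nullptr, *dPosition = nullptr;
    cudaMalloc(&dTable, bytes);
    cudaMalloc(&dResult, bytes);  // worst case: every element matches
    cudaMalloc(&dPosition, sizeof(int));
    cudaMemcpy(dTable, table.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dPosition, 0, sizeof(int));

    int threads = 128;
    int blocks = (numArrays + threads - 1) / threads;
    find<<<blocks, threads>>>(dTable, numArrays, arrayLen, dResult, dPosition);

    // This blocking copy waits for the kernel to finish on the default stream.
    int count = 0;
    cudaMemcpy(&count, dPosition, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> matches(count);
    cudaMemcpy(matches.data(), dResult, count * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(dTable);
    cudaFree(dResult);
    cudaFree(dPosition);
    std::sort(matches.begin(), matches.end());
    return matches;
}
```

With this approach every slot from 0 to the final counter value minus one is filled exactly once, but the order in which threads claim slots can change from run to run, which is why the helper sorts the output before returning it.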
talonmies
Melvon