
I have a method which I want to run on several threads, but each thread will return a different number of results. Is it possible to declare a private, thread-specific variable, i.e. a list, which I can then pass back to the host and merge with the other threads' results?

Say I have arrays as follows:

int[,] arr1 = new int[3,3] {{ 3, 4, 5 }, {4, 5, 6}, {1, 6, 4}};
int[] arr2 = new int[] { 3, 4, 1 };

Each thread will be given three values to analyze and records the difference between the value in arr2 and the values in a specific row of arr1.

[Cudafy]
public static void CountAbove(GThread thread, int[] a, int[,] b, List<int> c)
{
    int tid = thread.blockIdx.x;
    int threshold = a[tid];

    for (int i = 0; i < b.GetLength(1); i++)
    {
        if (threshold < b[tid, i]) c.Add(b[tid, i] - threshold);
    }
}

1 Answer


Yes, it is possible. A local variable declared in a kernel is private to each thread you launch. So simply declare a variable, use it, and when you want to return the result to the host, copy it to global memory. You can provide a location in global memory by passing a pointer to it as an argument to the kernel.

Example:

__global__ void kernel(float *var)
{
 float localVar; // local to each thread in execution
 ...
 // computation which uses localVar
 ...
 *var = localVar; // copy the per-thread result to global memory
                  // (with more than one thread, each thread should write its own element, e.g. var[threadIdx.x])
}

Afterwards, use cudaMemcpy() to copy it back to the host. This example is also valid if you declare a local array; in that case you just copy an array instead of a single variable.
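As a rough illustration, a minimal host-side sketch for the single-variable case might look like this (the names d_var and h_var are placeholders, and error checking is omitted):

float *d_var;
cudaMalloc(&d_var, sizeof(float));       // space for the result in global memory
kernel<<<1, 1>>>(d_var);                 // launch the kernel shown above
float h_var;
cudaMemcpy(&h_var, d_var, sizeof(float), cudaMemcpyDeviceToHost); // bring the result back to the host
cudaFree(d_var);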

Edit#1:

Example of passing an array as argument to a kernel:

__global__ void kernel(float *arrayPtr, int length)
{
....
}

arrayPtr is a device pointer which should be allocated (e.g. with cudaMalloc) before calling the kernel; length is the number of elements in the array.
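For example, a host-side sketch for allocating and passing such an array could be (h_array, d_array and the launch configuration are only illustrative):

int length = 9;                          // e.g. 3 rows x 3 columns, as in the question
float h_array[9] = { 0 };
float *d_array;
cudaMalloc(&d_array, length * sizeof(float));
cudaMemcpy(d_array, h_array, length * sizeof(float), cudaMemcpyHostToDevice);
kernel<<<3, 1>>>(d_array, length);       // one block per row
cudaMemcpy(h_array, d_array, length * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(d_array);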

KiaMorot
  • How does this answer the question? – talonmies Jun 06 '13 at 11:39
  • There are still 3 things I don't get. 1) Would the private variable be stored in local memory? I'm assuming I need to make sure each thread doesn't exceed the maximum for that specific type of memory. 2) "a location to global memory passing a pointer to it as argument to the kernel" So I need to use something like List dev_c = _gpu.Allocate(N); and I would pass dev_c as a parameter to the method (does Cudafy support List<>, as I need to specify a length with arrays)? 3) If I declare an array in global memory, does CUDA handle multiple threads writing to it at once? – Hans Rudel Jun 06 '13 at 11:45
  • @talonmies I've been scanning the other CUDA/Cudafy questions on SO and I see you're very knowledgeable on this subject. Would you be willing, and do you have the time, to post an answer? If so I would really appreciate it, as I'm still stumped. – Hans Rudel Jun 06 '13 at 11:48
  • @HansRudel: I don't know anything about Cudafy, sorry. – talonmies Jun 06 '13 at 11:48
  • @talonmies No worries. I'm assuming this is possible to do in CUDA C, though? Thanks again for your help. – Hans Rudel Jun 06 '13 at 11:55
  • @HansRudel 1) A local variable goes into registers. If there is no space in registers, it spills to local memory, which is physically located in global memory. So a local array goes to local memory if it exceeds the available register space. 2) I don't know Cudafy either, but the general idea is that you pass a pointer to memory (your array in this case) and, as another argument, the length of that array. – KiaMorot Jun 06 '13 at 12:04
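On point 3 of the comments (many threads appending to one global array), a common pattern, not shown in the answer above, is to allocate the output for the worst case and reserve slots with atomicAdd. A rough CUDA C sketch of the question's CountAbove logic along those lines, with the 2D array flattened and all names purely illustrative, might be:

__global__ void countAbove(const int *a, const int *b, int cols, int *out, int *outCount)
{
    int row = blockIdx.x;                     // one block per row, as in the question
    int threshold = a[row];

    for (int i = 0; i < cols; i++)
    {
        int v = b[row * cols + i];
        if (v > threshold)
        {
            int pos = atomicAdd(outCount, 1); // reserve a unique slot in the shared output
            out[pos] = v - threshold;         // store the difference
        }
    }
}

Here out would be allocated on the device with room for rows * cols results and outCount initialised to zero before the launch; after copying outCount back, the first outCount elements of out hold the merged results (in no particular order).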