
I have a method which I want to run on several threads, but each thread will return a different number of results. Is it possible to declare a private, thread-specific variable, i.e. a list, which I can then pass back to the host and merge with the other threads' results?

Say I have arrays as follows:

int[,] arr1 = new int[3,3] {{ 3, 4, 5 }, {4, 5, 6}, {1, 6, 4}};
int[] arr2 = new int[] { 3, 4, 1 };

Each thread will be given three values to analyze and records the difference between the value in arr2 and the values in a specific row of arr1.

[Cudafy]
public static void CountAbove(GThread thread, int[] a, int[,] b, List<int> c)
{
    int tid = thread.blockIdx.x;
    int threshold = a[tid];

    for (int i = 0; i < b.GetLength(1); i++)
    {
        if (threshold < b[tid, i]) c.Add(b[tid, i] - threshold);
    }
}

1 Answer


Yes, it is possible. A local variable declared in a kernel is private to each thread you launch. So simply declare a variable, use it, and when you want to return the result to the host, copy it to global memory. You can provide a location in global memory by passing a pointer to it as an argument to the kernel.

Example:

__global__ void kernel(float *var)
{
 float localVar; // local to each thread in execution
 ...
 // computation which uses localVar
 ...
 *var = localVar; // copy the per-thread result to global memory
                  // (with more than one thread, each thread should write its own element, e.g. var[threadIdx.x])
}

Afterwards, use cudaMemcpy() to copy it back to the host. This example is also valid if you declare a local array; in that case you just copy an array instead of a single variable.
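As a rough illustration, a minimal host-side sketch for the single-variable case might look like this (the names d_var and h_var are placeholders, and error checking is omitted):

float *d_var;
cudaMalloc(&d_var, sizeof(float));       // space for the result in global memory
kernel<<<1, 1>>>(d_var);                 // launch the kernel shown above
float h_var;
cudaMemcpy(&h_var, d_var, sizeof(float), cudaMemcpyDeviceToHost); // bring the result back to the host
cudaFree(d_var);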

Edit#1:

Example of passing an array as argument to a kernel:

__global__ void kernel(float *arrayPtr, int length)
{
....
}

arrayPtr is a device pointer which should be allocated (e.g. with cudaMalloc) before calling the kernel; length is the number of elements in the array.
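For example, a host-side sketch for allocating and passing such an array could be (h_array, d_array and the launch configuration are only illustrative):

int length = 9;                          // e.g. 3 rows x 3 columns, as in the question
float h_array[9] = { 0 };
float *d_array;
cudaMalloc(&d_array, length * sizeof(float));
cudaMemcpy(d_array, h_array, length * sizeof(float), cudaMemcpyHostToDevice);
kernel<<<3, 1>>>(d_array, length);       // one block per row
cudaMemcpy(h_array, d_array, length * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(d_array);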

KiaMorot
  • How does this answer the question? – talonmies Jun 06 '13 at 11:39
  • There are still 3 things I don't get. 1) Would the private variable be stored in local memory? I'm assuming I need to make sure each thread doesn't exceed the maximum for that specific type of memory. 2) "a location to global memory passing a pointer to it as argument to the kernel" So I need to use something like List dev_c = _gpu.Allocate(N); and I would pass dev_c as a parameter to the method (does Cudafy support List<>, as I need to specify a length with arrays)? 3) If I declare an array in global memory, does CUDA handle multiple threads writing to it at once? – Hans Rudel Jun 06 '13 at 11:45
  • @talonmies I've been scanning the other CUDA/Cudafy questions on SO and I see you're very knowledgeable on this subject. Would you be willing, and do you have the time, to post an answer? If so I would really appreciate it, as I'm still stumped. – Hans Rudel Jun 06 '13 at 11:48
  • @HansRudel: I don't know anything about Cudafy, sorry. – talonmies Jun 06 '13 at 11:48
  • @talonmies No worries. I'm assuming this is possible to do in CUDA C, though? Thanks again for your help. – Hans Rudel Jun 06 '13 at 11:55
  • @HansRudel 1) A local variable goes into registers. If there is no space in registers, it spills to local memory, which is physically located in global memory. So a local array goes to local memory if it exceeds the available register space. 2) I don't know Cudafy either, but the general idea is that you pass a pointer to memory (your array in this case) and, as another argument, the length of that array. – KiaMorot Jun 06 '13 at 12:04
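On point 3 of the comments (many threads appending to one global array), a common pattern, not shown in the answer above, is to allocate the output for the worst case and reserve slots with atomicAdd. A rough CUDA C sketch of the question's CountAbove logic along those lines, with the 2D array flattened and all names purely illustrative, might be:

__global__ void countAbove(const int *a, const int *b, int cols, int *out, int *outCount)
{
    int row = blockIdx.x;                     // one block per row, as in the question
    int threshold = a[row];

    for (int i = 0; i < cols; i++)
    {
        int v = b[row * cols + i];
        if (v > threshold)
        {
            int pos = atomicAdd(outCount, 1); // reserve a unique slot in the shared output
            out[pos] = v - threshold;         // store the difference
        }
    }
}

Here out would be allocated on the device with room for rows * cols results and outCount initialised to zero before the launch; after copying outCount back, the first outCount elements of out hold the merged results (in no particular order).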