I have written a vector expression template library for the CPU using template metaprogramming. However, I am having difficulty generating a GPU kernel for a given expression. Please advise on how I can build the source string for an expression (e.g. c = a + b) from its expression tree, together with the list of parameters to pass as kernel arguments. I have read about the techniques in papers but have difficulty turning them into code. One problem is that I don't know how to store the names of the variables (a, b, c) used in the expression. I guess that just giving them arbitrary unique names like x0, x1, x2 might work. A code snippet would be of great help. Thanks.
Here are the kernel template and the generated kernel for c = a + b (written as a = b + c in the paper), taken from "CUDA expression templates": https://pdfs.semanticscholar.org/5d08/a871b72f12a7ee40aeb2a69bca27a23733db.pdf
extern "C" __global__ void kernel(float* a,
        /* parameter list */, unsigned int size) {
    // One thread per element; idx is the global element index.
    unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < size) {
        a[idx] = /* evaluation line */;
    }
}
Listing 4: The Kernel prototype.
extern "C" __global__ void kernel(float* a,
        float* b, float* c, unsigned int size) {
    // One thread per element; idx is the global element index.
    unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < size) {
        a[idx] = b[idx] + c[idx];
    }
}
Listing 5: Kernel generated by compiling Listing 2.
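To make the question concrete, here is a minimal host-side sketch of the generation step, using the generated-name idea (x0, x1, x2) instead of storing the original variable names. Everything in it is hypothetical and not taken from the paper: `Vec` and `Add` stand in for the library's own leaf and node types, `CodegenContext`, `emit` and `make_kernel` are names invented for illustration, and each leaf is assumed to hold a raw `float` device pointer. Each node returns its piece of the evaluation line, and the leaves register their pointers so that the parameter list and the argument list come out in the same order.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: collects kernel arguments while the tree is walked.
struct CodegenContext {
    std::vector<const float*> args;  // pointers in kernel-parameter order

    // Returns the generated name (x0, x1, ...) for a pointer.
    // A pointer that was already seen keeps its name, so "a + a" uses one parameter.
    std::string name_for(const float* p) {
        for (std::size_t i = 0; i < args.size(); ++i)
            if (args[i] == p) return "x" + std::to_string(i);
        args.push_back(p);
        return "x" + std::to_string(args.size() - 1);
    }
};

// Leaf node: stand-in for the library's vector type (assumption).
struct Vec {
    const float* data;
    std::string emit(CodegenContext& ctx) const {
        return ctx.name_for(data) + "[idx]";
    }
};

// Binary node for element-wise addition.
template <class L, class R>
struct Add {
    L lhs;
    R rhs;
    std::string emit(CodegenContext& ctx) const {
        // Separate statements so names are assigned left to right.
        std::string l = lhs.emit(ctx);
        std::string r = rhs.emit(ctx);
        return "(" + l + " + " + r + ")";
    }
};

// Trait that restricts operator+ to expression types only.
template <class T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <class L, class R> struct is_expr<Add<L, R>> : std::true_type {};

template <class L, class R,
          class = typename std::enable_if<is_expr<L>::value &&
                                          is_expr<R>::value>::type>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>{l, r}; }

// Fills in the kernel template: parameter list plus evaluation line.
// 'args' receives the pointers in the same order as the kernel parameters.
template <class Expr>
std::string make_kernel(const float* out, const Expr& expr,
                        std::vector<const float*>& args) {
    CodegenContext ctx;
    std::string out_name = ctx.name_for(out);  // output becomes x0
    std::string rhs = expr.emit(ctx);          // inputs become x1, x2, ...

    std::ostringstream src;
    src << "extern \"C\" __global__ void kernel(";
    for (std::size_t i = 0; i < ctx.args.size(); ++i)
        src << "float* x" << i << ", ";
    src << "unsigned int size) {\n"
        << "    unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;\n"
        << "    if (idx < size) {\n"
        << "        " << out_name << "[idx] = " << rhs << ";\n"
        << "    }\n"
        << "}\n";
    args = ctx.args;
    return src.str();
}
```

With device pointers a, b and c, calling `make_kernel(c, Vec{a} + Vec{b}, args)` would produce a kernel whose evaluation line is `x0[idx] = (x1[idx] + x2[idx]);` and leave c, a, b in `args`, in that order. The string could then be handed to a runtime compiler such as NVRTC, with `args` and the size passed at launch; that part is not shown here. Is this roughly the right approach?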