I recently implemented and tested an OpenCL kernel that uses a struct to carry and update the data of a C++ class object. To my dismay, the same function run without the kernel, in a simple for loop, was in fact faster.
Here is the kernel function:
__kernel void function_x_y_(__global myclass_* input, long n)
{
    int gid = get_global_id(0);
    if (gid < n)
        input[gid].valuez = input[gid].valuey * input[gid].valuex * 8736;
}
Here is the for loop:
for (int i = 0; i < 100; i++) {
    thisclass[i].function_x_y();
}
and the class function:
void function_x_y() {
    valuez = valuex * valuey;
}
I ran a clock on both processes:
cout << "Run function in serial\n";
startTime = clock();
for (int i = 0; i < 100; i++) {
    thisclass[i].function_x_y();
}
endTime = clock();
// Ticks divided by ticks-per-microsecond gives microseconds, not milliseconds.
cout << "It took (serial) " << (endTime - startTime) / (CLOCKS_PER_SEC / 1000000) << " us. " << endl;
cout << "Run function in parallel using struct to write to object\n";
init_ocl();
startTime = clock();
load_kernel_from_struct("function_x_y_", p_struct, 100); // Loads function and variables into OpenCL
endTime = clock();
cout << "It took (parallel) " << (endTime - startTime) / (CLOCKS_PER_SEC / 1000000) << " us. " << endl;
With the output:
Run function in serial
It took (serial) 5 us.
Run function in parallel using struct to write to object
It took (parallel) 159010 us.
I am using cl-helper.c by Andreas Kloeckner.
I don't understand this; it should be faster. Any help or advice is welcome.
Is there a more accurate speed test? Could this be because it takes time to initialise, assign memory, and transfer the data to the kernel?
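For what it is worth, a more careful measurement might look like the sketch below. It assumes C++11 `std::chrono` is available; `average_us` and the `MyClass` stand-in are hypothetical names for illustration, not part of cl-helper or OpenCL. The idea is to run the work many times and report the mean wall-clock time per run in microseconds, since a single 100-element pass is too short for `clock()` to resolve reliably. Note also that dividing clock ticks by `CLOCKS_PER_SEC / 1000000` yields microseconds.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `reps` times and returns the mean
// wall-clock time per run in microseconds (0.0 if reps is zero).
template <typename F>
double average_us(F work, std::size_t reps) {
    if (reps == 0) {
        return 0.0;
    }
    using clock_type = std::chrono::steady_clock;
    const auto start = clock_type::now();
    for (std::size_t r = 0; r < reps; ++r) {
        work();
    }
    const auto stop = clock_type::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(reps);
}

// Stand-in for the class in the question (assumed member types).
struct MyClass {
    long valuex = 0;
    long valuey = 0;
    long valuez = 0;
    void function_x_y() { valuez = valuex * valuey; }
};
```

Usage would be along the lines of `double us = average_us([&] { for (auto& o : objs) o.function_x_y(); }, 1000);` where `objs` is an array of 100 objects. An optimising compiler may shorten or remove work whose results are never read, so the results should be used afterwards.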
There must be a way to make this run faster. Could it be that I must transfer and initialise everything before running the function?
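One way to check whether setup and transfer dominate would be to time each phase separately. This is only a minimal sketch: `PhaseTimer` is a hypothetical helper, and the phases passed to it would be placeholders for the real calls (initialisation, buffer transfer, kernel launch, read-back).

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records the wall-clock time of each named phase in
// microseconds, so one-off costs can be told apart from the compute step.
class PhaseTimer {
public:
    template <typename F>
    void run(const std::string& name, F phase) {
        const auto start = std::chrono::steady_clock::now();
        phase();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::micro> elapsed = stop - start;
        phases_.emplace_back(name, elapsed.count());
    }

    // Recorded (name, microseconds) pairs in the order they were run.
    const std::vector<std::pair<std::string, double>>& phases() const {
        return phases_;
    }

    // Sum of all recorded phase times in microseconds.
    double total_us() const {
        double sum = 0.0;
        for (const auto& p : phases_) {
            sum += p.second;
        }
        return sum;
    }

private:
    std::vector<std::pair<std::string, double>> phases_;
};
```

For example, `timer.run("init", [] { init_ocl(); });` followed by a separate `timer.run(...)` for the kernel call would show how much of the total is one-time cost. If `load_kernel_from_struct` also builds the kernel and copies the buffers (its comment suggests this, but the post does not confirm it), it would have to be split up before the kernel run itself could be timed alone.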
Thanks, Hbyte.