I am using this GPU for my thesis at university.
I am running many different kernels on it, and the execution time stays stuck at 12666.6689 ms, even for a loop with 88 instructions * 100m iterations.
__kernel void scalar_mult_add(__global int * list)
{
    unsigned int x = 38;
    unsigned int y = 38;
    for (int i = 0; i < 1000000; i++) {
        y = x * y;
        x = x + y;
    }
}
The only thing that increases the execution time is adding x!=0 to the for loop condition:
__kernel void scalar_mult_add(__global int * list)
{
    unsigned int x = 38;
    unsigned int y = 38;
    for (int i = 0; i < 1000000 && x != 0; i++) {
        y = x * y;
        x = x + y;
    }
}
Why does this happen every time? I can't understand it: e.g. 88 million instructions take the same execution time as 1 million instructions, even though I don't have enough execution units to run such a big kernel in the same time as one with 1 million instructions.
Why does adding a single x!=0 condition to the loop increase the execution time that much, while a couple of extra additions inside the loop body do not?
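
One guess, which I have not verified on this GPU, is that the compiler discards the loop because x and y are never written anywhere. A minimal sketch to test that assumption is below. The kernel name scalar_mult_add_store and the store to list[0] are my own additions for illustration, and the #define lines are only a shim so the same text compiles as plain C for a quick host-side check.

```c
/* Shim so the same text also compiles as plain C for a host-side check;
   an OpenCL C compiler defines __OPENCL_VERSION__ and skips these lines. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical variant of scalar_mult_add: same loop, but the final
   value of x is stored so the loop result is observable. */
__kernel void scalar_mult_add_store(__global int * list)
{
    unsigned int x = 38;
    unsigned int y = 38;
    for (int i = 0; i < 1000000; i++) {
        y = x * y;   /* unsigned arithmetic wraps around on overflow */
        x = x + y;
    }
    /* Every work-item writes the same value to the same slot. */
    list[0] = (int)x;
}
```

If the measured time of this variant changes with the iteration count while the original stays flat, that would be consistent with the loop being removed in the original. It would not prove it.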