The following source code is from the book Programming in Parallel with CUDA, pages 25-26:
15.1 __global__ void gpu_sin(float *sums, int steps, int terms, float step_size)
15.2 {
15.3     int step = blockIdx.x*blockDim.x+threadIdx.x;
15.4     if(step<steps){
15.5         float x = step_size*step;
15.6         sums[step] = sinsum(x,terms); // store values
15.7     }
15.8 }
The book gives an explanation:
Line 15.3 declares a variable step equivalent to the for loop index variable of the same name in line 24 of Example 1.1. It is set to a value defined by the built-in variables blockDim.x, blockIdx.x and threadIdx.x. The values of these variables depend on the launch parameters used in the host call to the kernel as follows:

○ blockDim.x will be set to threads, i.e., the thread block size used by the kernel.
○ blockIdx.x will be set to the rank of the thread block to which the current thread belongs and will be in the range [0, blocks-1].
○ threadIdx.x will be set to the rank of the current thread within its thread block and will be in the range [0, threads-1].
○ step = blockDim.x * blockIdx.x + threadIdx.x is in the range [0, threads × blocks - 1].
I haven't understood the last point. How is step = blockDim.x * blockIdx.x + threadIdx.x calculated, and what is the rationale behind this formula?