I have the following PyCUDA program:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
mod = SourceModule("""
#include <stdio.h>
__global__ void myfirst_kernel()
{
    printf("I am in block no: %d thread no: %d \\n", blockIdx.x, threadIdx.x);
}
""")
function = mod.get_function("myfirst_kernel")
function(grid=(10, 2), block=(1, 1, 1))
My intent is to run 10 blocks with 2 threads per block. However, the output is:
python thread_execution.py
I am in block no: 1 thread no: 0
I am in block no: 7 thread no: 0
I am in block no: 1 thread no: 0
I am in block no: 7 thread no: 0
I am in block no: 3 thread no: 0
I am in block no: 0 thread no: 0
I am in block no: 3 thread no: 0
I am in block no: 6 thread no: 0
I am in block no: 9 thread no: 0
I am in block no: 0 thread no: 0
I am in block no: 9 thread no: 0
I am in block no: 6 thread no: 0
I am in block no: 5 thread no: 0
I am in block no: 2 thread no: 0
I am in block no: 5 thread no: 0
I am in block no: 8 thread no: 0
I am in block no: 4 thread no: 0
I am in block no: 2 thread no: 0
I am in block no: 8 thread no: 0
I am in block no: 4 thread no: 0
I was expecting threadIdx.x to be 1 on some of these lines as well. Why is it always 0?
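One way to check what a launch configuration produces, without needing a GPU, is to enumerate the indices in plain Python. The sketch below assumes the usual CUDA convention that `grid` gives the number of blocks along each dimension and `block` gives the number of threads per block along each dimension. `launch_indices` is a hypothetical helper written for illustration; it is not part of PyCUDA.

```python
from itertools import product


def launch_indices(grid, block):
    """Hypothetical helper (not a PyCUDA API): list the (blockIdx.x, threadIdx.x)
    pair that each thread of a launch would see, assuming `grid` counts blocks
    per dimension and `block` counts threads per block per dimension."""
    # Pad both tuples to three dimensions; missing dimensions default to 1.
    gx, gy, gz = tuple(grid) + (1,) * (3 - len(grid))
    tx, ty, tz = tuple(block) + (1,) * (3 - len(block))

    pairs = []
    for _bz, _by, bx in product(range(gz), range(gy), range(gx)):      # every block
        for _tz, _ty, t in product(range(tz), range(ty), range(tx)):   # every thread in it
            pairs.append((bx, t))
    return pairs


# The configuration from the question: a 10 x 2 grid of single-thread blocks.
original = launch_indices(grid=(10, 2), block=(1, 1, 1))
print(len(original), sorted({t for _, t in original}))        # 20 [0]

# A 10 x 1 grid whose blocks each hold 2 threads along x.
alternative = launch_indices(grid=(10, 1), block=(2, 1, 1))
print(len(alternative), sorted({t for _, t in alternative}))  # 20 [0, 1]
```

Under that assumption, `grid=(10, 2)` with `block=(1, 1, 1)` describes 20 blocks of one thread each, so every blockIdx.x value appears twice (once per blockIdx.y) and threadIdx.x can only be 0, which matches the output above. A launch such as `grid=(10, 1), block=(2, 1, 1)` would be the configuration that gives threadIdx.x values of both 0 and 1.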