Why is the thread index the same for every thread in PyCUDA?

Question

I have the following program:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
    #include <stdio.h>

    __global__ void myfirst_kernel()
    {
        printf("I am in block no: %d thread no: %d \\n", blockIdx.x, threadIdx.x);
    }
""")
 
function = mod.get_function("myfirst_kernel")
function(grid=(10,2),block=(1,1,1))

As you can see, I am running 10 blocks and 2 threads per block. However, the output is:

python thread_execution.py 
I am in block no: 1 thread no: 0 
I am in block no: 7 thread no: 0 
I am in block no: 1 thread no: 0 
I am in block no: 7 thread no: 0 
I am in block no: 3 thread no: 0 
I am in block no: 0 thread no: 0 
I am in block no: 3 thread no: 0 
I am in block no: 6 thread no: 0 
I am in block no: 9 thread no: 0 
I am in block no: 0 thread no: 0 
I am in block no: 9 thread no: 0 
I am in block no: 6 thread no: 0 
I am in block no: 5 thread no: 0 
I am in block no: 2 thread no: 0 
I am in block no: 5 thread no: 0 
I am in block no: 8 thread no: 0 
I am in block no: 4 thread no: 0 
I am in block no: 2 thread no: 0 
I am in block no: 8 thread no: 0 
I am in block no: 4 thread no: 0

I was expecting threadIdx.x to be 1 for some threads too. Why is it always 0?

You are running one thread per block. Numbered from zero…. — talonmies, Jul 19 '23 at 12:03

score 1 · Answer 1 · answered Jul 20 '23 at 03:46

You are not running multiple threads per block. This:

function(grid=(10,2),block=(1,1,1))

launches a 10 x 2 grid of blocks, each containing a single thread. threadIdx.x is therefore zero in every block, while blockIdx.x varies between 0 and 9 (as shown in your output) and blockIdx.y varies between 0 and 1 (not printed, but it is the reason each value of blockIdx.x appears twice).
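
To make the difference concrete, here is a minimal pure-Python sketch that enumerates the indices each launch configuration produces. It runs without a GPU. The helper launch_indices is hypothetical and written only for illustration; it is not part of PyCUDA, and it says nothing about the order in which a real GPU would print.

```python
from itertools import product


def launch_indices(grid, block):
    """Enumerate (blockIdx, threadIdx) for every thread in a launch.

    Hypothetical helper for illustration only; not a PyCUDA function.
    grid and block are tuples like those passed to a PyCUDA kernel call;
    missing trailing dimensions are assumed to be 1.
    """
    gx, gy, gz = (tuple(grid) + (1, 1, 1))[:3]
    bx, by, bz = (tuple(block) + (1, 1, 1))[:3]
    for bz_i, by_i, bx_i in product(range(gz), range(gy), range(gx)):
        for tz, ty, tx in product(range(bz), range(by), range(bx)):
            yield (bx_i, by_i, bz_i), (tx, ty, tz)


# Launch from the question: 10 x 2 blocks, 1 thread per block.
original = list(launch_indices(grid=(10, 2), block=(1, 1, 1)))

# Launch matching the stated intent: 10 blocks, 2 threads per block.
intended = list(launch_indices(grid=(10, 1), block=(2, 1, 1)))

# Total thread count and the distinct threadIdx.x values in each launch.
print(len(original), sorted({t[0] for _, t in original}))  # 20 [0]
print(len(intended), sorted({t[0] for _, t in intended}))  # 20 [0, 1]
```

Both launches create 20 threads, but only the second has a threadIdx.x of 1. With the kernel from the question, the corresponding call would be function(grid=(10,1), block=(2,1,1)).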
