
I have the following program:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
    #include <stdio.h>

     __global__ void myfirst_kernel()
       {
        printf("I am in block no: %d thread no: %d \\n", blockIdx.x, threadIdx.x);
      }
""")
 
function = mod.get_function("myfirst_kernel")
function(grid=(10,2),block=(1,1,1))

As you can see, I am trying to run 10 blocks with 2 threads per block. However, the output is:

python thread_execution.py 
I am in block no: 1 thread no: 0 
I am in block no: 7 thread no: 0 
I am in block no: 1 thread no: 0 
I am in block no: 7 thread no: 0 
I am in block no: 3 thread no: 0 
I am in block no: 0 thread no: 0 
I am in block no: 3 thread no: 0 
I am in block no: 6 thread no: 0 
I am in block no: 9 thread no: 0 
I am in block no: 0 thread no: 0 
I am in block no: 9 thread no: 0 
I am in block no: 6 thread no: 0 
I am in block no: 5 thread no: 0 
I am in block no: 2 thread no: 0 
I am in block no: 5 thread no: 0 
I am in block no: 8 thread no: 0 
I am in block no: 4 thread no: 0 
I am in block no: 2 thread no: 0 
I am in block no: 8 thread no: 0 
I am in block no: 4 thread no: 0 

I was expecting threadIdx.x to take the value 1 as well. Why is it always 0?

talonmies
KansaiRobot

1 Answer


You are not running multiple threads per block. This:

function(grid=(10,2),block=(1,1,1))

launches a grid of 10 x 2 blocks, each containing a single thread. threadIdx.x is therefore zero in every block, while blockIdx.x varies between 0 and 9 (as shown in your output) and blockIdx.y varies between 0 and 1 (not printed by your kernel, but the reason each value of blockIdx.x appears twice).
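
To make the index ranges concrete, here is a minimal CPU-only sketch that enumerates the indices a launch would produce. The helper launch_indices is hypothetical and written only for illustration; it is not part of PyCUDA and it ignores the y and z thread dimensions. It compares the launch from the question with what is probably the intended one, function(grid=(10,1),block=(2,1,1)), which gives 10 blocks of 2 threads each.

```python
from itertools import product


def launch_indices(grid, block):
    """Enumerate (blockIdx.x, blockIdx.y, threadIdx.x) for a kernel launch.

    Hypothetical helper for illustration only: it mimics CUDA indexing on
    the CPU and makes no claim about the order in which a GPU runs threads.
    """
    grid_x, grid_y = grid[0], grid[1]
    block_x = block[0]
    return [
        (bix, biy, tix)
        for biy, bix, tix in product(range(grid_y), range(grid_x), range(block_x))
    ]


# Launch from the question: 10 x 2 blocks, one thread per block.
original = launch_indices(grid=(10, 2), block=(1, 1, 1))

# Probable intent: 10 blocks, two threads per block.
fixed = launch_indices(grid=(10, 1), block=(2, 1, 1))

# Both launches run 20 threads, but only the second has threadIdx.x == 1.
print(len(original), sorted({t for _, _, t in original}))  # 20 [0]
print(len(fixed), sorted({t for _, _, t in fixed}))        # 20 [0, 1]
```

Both configurations print 20 lines from the kernel, which is why the original output looked plausible at first glance. Printing blockIdx.y in the kernel as well would have shown the second grid dimension directly.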

talonmies