CUDA : Global unique thread index in a 3D Grid

Question

As the title states: if I have a 3D grid of blocks, what is the formula to get a globally unique index for one thread?

Let's keep the block itself one-dimensional.

possible duplicate of [Cuda, executional thread order in a 3d-block](http://stackoverflow.com/questions/11505117/cuda-executional-thread-order-in-a-3d-block) — talonmies, Jul 19 '12 at 05:28
Please search before asking new questions. There was an identical question asked and answered only 2 days ago. — talonmies, Jul 19 '12 at 05:29
Hey! I went through that thread. It says that threadId is the id of the thread within one particular block, which is not what this question is asking. Here I want to know the globally UNIQUE thread id if I launch a 3D grid with 1D blocks. — dparkar, Jul 19 '12 at 05:47
The possible duplicate's answer states "Here threadID is the thread number within the block". The code in that answer does not use `blockIdx` at all, so its index is local to the thread's block and not global. — djmj, Jul 19 '12 at 22:48

score 8 · Accepted Answer · answered Jul 19 '12 at 22:37

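// unique block index inside a 3D block grid
// (cast to 64-bit first so the products are not evaluated in 32-bit unsigned int)
const unsigned long long int blockId = blockIdx.x //1D
        + (unsigned long long int)blockIdx.y * gridDim.x //2D
        + (unsigned long long int)gridDim.x * gridDim.y * blockIdx.z; //3D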

// global unique thread index, block dimension uses only x-coordinate
const unsigned long long int threadId = blockId * blockDim.x + threadIdx.x;
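
The formula can be sanity-checked on the host without a GPU. This is only a minimal sketch: `Dim3` is a hypothetical stand-in for CUDA's built-in `gridDim` / `blockIdx` values, and `blockIdOf` and `globalThreadId` are illustrative helper names, not part of the CUDA API. All arithmetic is done in 64 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's built-in gridDim / blockIdx (not the CUDA type).
struct Dim3 { std::uint64_t x, y, z; };

// Unique block index inside a 3D grid: x varies fastest, then y, then z.
constexpr std::uint64_t blockIdOf(Dim3 grid, Dim3 block) {
    return block.x
         + block.y * grid.x
         + block.z * grid.x * grid.y;
}

// Global thread index, assuming a 1D block of blockDimX threads.
constexpr std::uint64_t globalThreadId(Dim3 grid, Dim3 block,
                                       std::uint64_t blockDimX,
                                       std::uint64_t threadIdxX) {
    return blockIdOf(grid, block) * blockDimX + threadIdxX;
}
```

For a 4 x 3 x 2 grid of 256-thread blocks, thread 0 of block (0, 0, 0) maps to 0 and thread 255 of block (3, 2, 1) maps to 4 * 3 * 2 * 256 - 1 = 6143, so the indices cover the range without gaps at either end.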

djmj

Why do you need to make it `unsigned long long int`? There are only ~256,000 threads, so `int` or `unsigned int` should be fine, right? – cmo May 31 '13 at 15:43
(`Maximum x-, y-, or z-dimension of a grid of thread blocks` to the power of `Maximum dimensionality of grid of thread blocks`) * `Maximum number of threads per block` gives you the maximum total number of threads. For compute capability 2.x this gives 65535³ * 1024 – djmj May 31 '13 at 16:22
`const std::size_t` is sufficient; try `static_assert(sizeof(unsigned long long int)==sizeof(std::size_t),"");` in your kernel. – user2023370 Aug 26 '16 at 12:00
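
The arithmetic in the comments above can be worked through at compile time. A minimal sketch, assuming the limits quoted by djmj for compute capability 2.x (65535 per grid dimension, 1024 threads per block); the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Limits as quoted in the comment above for compute capability 2.x (assumed here).
constexpr std::uint64_t kMaxGridDim = 65535;
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;

// 65535^3 * 1024, evaluated in 64-bit arithmetic.
constexpr std::uint64_t kMaxThreads =
    kMaxGridDim * kMaxGridDim * kMaxGridDim * kMaxThreadsPerBlock;

// True if the value can be stored in a 32-bit unsigned integer.
constexpr bool fitsIn32Bits(std::uint64_t value) {
    return value <= UINT32_MAX;
}
```

Under these assumptions `kMaxThreads` is 288,217,182,213,504,000 (roughly 2.9e17), which does not fit in 32 bits but does fit in a 64-bit type, so a 64-bit index is the safe choice for the general case.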

score 1 · Answer 2 · answered Sep 12 '14 at 08:25

A bit late to the party, but here is how I usually approach this in a fairly generic way, so that it supports any number and size of blocks (even 2D):

// Compute the offset in each dimension
const size_t offsetX = blockDim.x * blockIdx.x + threadIdx.x;
const size_t offsetY = blockDim.y * blockIdx.y + threadIdx.y;
const size_t offsetZ = blockDim.z * blockIdx.z + threadIdx.z;

// Make sure that you are not actually out of bounds
if (offsetX >= sizeX || offsetY >= sizeY || offsetZ >= sizeZ)
  return;

// Compute the linear index assuming X, then Y, then Z memory ordering
const size_t idx = offsetZ * sizeX * sizeY + offsetY * sizeX + offsetX;

Mind you that I'm not a CUDA ninja.

You haven't defined `sizeX`, `sizeY` or `sizeZ`. — user2023370, Aug 26 '16 at 12:06
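
One way to read the snippet is that `sizeX`, `sizeY` and `sizeZ` are the extents of the data volume, passed in by the caller. That is an assumption, since the answer does not say. A host-side sketch under that reading, with a hypothetical `Extent` struct and illustrative helper names:

```cpp
#include <cstddef>

// Assumed meaning of sizeX/sizeY/sizeZ: extents of the data volume (hypothetical type).
struct Extent { std::size_t x, y, z; };

// Per-dimension offset, mirroring blockDim.x * blockIdx.x + threadIdx.x.
constexpr std::size_t offsetOf(std::size_t blockDim, std::size_t blockIdx,
                               std::size_t threadIdx) {
    return blockDim * blockIdx + threadIdx;
}

// Threads whose offsets fall outside the volume should return early.
constexpr bool inBounds(std::size_t ox, std::size_t oy, std::size_t oz, Extent size) {
    return ox < size.x && oy < size.y && oz < size.z;
}

// Linear index with X varying fastest, then Y, then Z.
constexpr std::size_t linearIndex(std::size_t ox, std::size_t oy, std::size_t oz,
                                  Extent size) {
    return oz * size.x * size.y + oy * size.x + ox;
}
```

With a 100 x 50 x 10 volume and 32-thread blocks in x, thread 4 of block 3 gets offset 100, which is out of bounds and would be skipped. The last valid element (99, 49, 9) maps to linear index 49999. Note that this index addresses the data, so it is unique only among in-bounds threads.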

joe · Answer 3 · 2019-10-05T17:06:07.807

The existing answer by @djmj is good, but a bit of reformatting makes it a little clearer what's going on (at least to my brain, which is new to CUDA):

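// cast first so the products are evaluated in 64-bit rather than 32-bit unsigned int
long long blockId = (long long)blockIdx.z  *  gridDim.x*gridDim.y
                  + (long long)blockIdx.y  *  gridDim.x
                  + blockIdx.x;
long long threadsPerBlock = blockDim.x;
long long i = blockId * threadsPerBlock + threadIdx.x;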

blockId is the sum of the blocks in the complete z-dimension "slices" (2D grids), plus the blocks in the complete rows of the final (incomplete) slice, plus the blocks in the final (incomplete) row of that (incomplete) slice.

By "complete", I mean the blocks that "precede" the current (x, y, z) block (with respect to the way we're summing them to determine the overall block id).
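
To make the three terms concrete, here is a small host-side sketch that keeps them separate before summing. The struct and function names are invented for this illustration.

```cpp
#include <cstdint>

// The three contributions to blockId described above (illustrative names).
struct BlockIdTerms {
    std::uint64_t completeSlices;  // blocks in the full z-slices before this one
    std::uint64_t completeRows;    // blocks in the full rows of the current slice
    std::uint64_t partialRow;      // blocks preceding this one in its own row
};

// gridX/gridY play the role of gridDim.x/gridDim.y; bx/by/bz of blockIdx.x/y/z.
constexpr BlockIdTerms blockIdTerms(std::uint64_t gridX, std::uint64_t gridY,
                                    std::uint64_t bx, std::uint64_t by,
                                    std::uint64_t bz) {
    return BlockIdTerms{bz * gridX * gridY, by * gridX, bx};
}

constexpr std::uint64_t sumTerms(BlockIdTerms t) {
    return t.completeSlices + t.completeRows + t.partialRow;
}
```

For a grid with `gridDim` = (4, 3, 2) and `blockIdx` = (1, 2, 1), there is one complete slice of 4 * 3 = 12 blocks, two complete rows of 4 blocks each (8), and 1 block before the current one in its row, giving a blockId of 12 + 8 + 1 = 21.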
