
This is a conceptual question. In CUDA, gridDim, blockDim and threadIdx can be 1D, 2D or 3D. How are their 2D and 3D versions interpreted?

In more detail, does CUDA think of multi-dimensional gridDim, blockDim and threadIdx just as a linear sequence, in the same way that C stores multi-dimensional arrays? If not, how should we interpret multi-dimensional gridDim, blockDim and threadIdx?

Thanks.

Edit 1. This question is not a duplicate. I have actually come across the referenced question; it asks about the execution order of GPU threads, not their layout, as this one does.

Edit 2. Also, the answer to this question can be found at http://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-hierarchy. Thank you @talonmies for the reference. To sum it up, multi-dimensional gridDim, blockDim and threadIdx are for convenience; they can be interpreted just like a column-major-ordered multi-dimensional array.
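
For illustration, here is a minimal sketch of that interpretation (the kernel is my own example, not from the guide); threadIdx.x varies fastest, then threadIdx.y:

```
#include <cstdio>

// Each thread of a single 2D block prints its flat ID.
// threadIdx.x varies fastest, then threadIdx.y.
__global__ void flatId2D()
{
    int x = threadIdx.x, y = threadIdx.y;
    int id = x + y * blockDim.x;   // column-major-style linearization
    printf("thread (%d, %d) -> flat ID %d\n", x, y, id);
}

int main()
{
    flatId2D<<<1, dim3(4, 3)>>>(); // one block of 4 x 3 threads
    cudaDeviceSynchronize();       // flush device-side printf output
    return 0;
}
```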

  • What do you mean by *how are their 2D and 3D versions interpreted*? Could you expand your question a bit more? – haccks Jun 25 '15 at 18:31

2 Answers


Quoting directly from the CUDA programming guide:

The index of a thread and its thread ID relate to each other in a straightforward way: For a one-dimensional block, they are the same; for a two-dimensional block of size (Dx, Dy), the thread ID of a thread of index (x, y) is (x + y Dx); for a three-dimensional block of size (Dx, Dy, Dz), the thread ID of a thread of index (x, y, z) is (x + y Dx + z Dx Dy).

So yes, the logical thread numbering in the programming model is sequential, with the x dimension varying fastest, then the y dimension, then the z dimension. This applies both to thread numbering within blocks and to block numbering within a grid. The numbering is analogous to column-major-ordered multi-dimensional arrays, although the actual threadIdx and blockIdx variables themselves are just structures reflecting internal thread and block identification words assigned by the scheduler to each thread or block.
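
As a sketch of that formula in code (kernel and variable names are my own, not from the guide):

```
// Recover the flat IDs described above from the built-in index variables:
// x varies fastest, then y, then z.
__global__ void flatIds()
{
    int tidInBlock = threadIdx.x
                   + threadIdx.y * blockDim.x
                   + threadIdx.z * blockDim.x * blockDim.y;

    // The same rule numbers blocks within the grid.
    int blockId = blockIdx.x
                + blockIdx.y * gridDim.x
                + blockIdx.z * gridDim.x * gridDim.y;

    // A common global linear thread ID built from the two.
    int globalId = blockId * (blockDim.x * blockDim.y * blockDim.z)
                 + tidInBlock;
    (void)globalId; // would index a flat buffer in real code
}
```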

You should note that the numbering implied by threadIdx and blockIdx is just for programmer convenience and doesn't imply anything about the execution order of threads on the GPU.


In more detail, does CUDA think of multi-dimensional gridDim, blockDim and threadIdx just as a linear sequence, in the same way that C stores multi-dimensional arrays?

Yes.
All multidimensional arrays in C are linearized. They are linearized in row-major order: all elements of the same row are placed in consecutive locations, and the rows are then placed one after another in the memory space.
CUDA C also uses a row-major layout. An example of a 2D array layout:

[Figure: a 2D array linearized in row-major order]
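
As a sketch of what this means for addressing (kernel name and parameters are invented for illustration), element (row, col) of a row-major width × height matrix lives at data[row * width + col], and a 2D grid maps onto it naturally:

```
// Scale every element of a row-major width x height matrix.
__global__ void scale(float *data, int width, int height, float factor)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x; // fastest-varying
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < height && col < width)
        data[row * width + col] *= factor;
}
```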

  • This doesn't answer the question – talonmies Jun 25 '15 at 18:53
  • @talonmies; Actually I was waiting for the CUDA champ! :) Please elaborate? – haccks Jun 25 '15 at 18:55
  • The question isn't about array layouts. It is about thread and block numbering and ordering within GPU kernel grids – talonmies Jun 25 '15 at 18:58
  • @talonmies; AFAIK, grids are arrays of thread blocks and blocks are arrays of threads. – haccks Jun 25 '15 at 18:59
  • @talonmies; A reference would be helpful. – haccks Jun 25 '15 at 19:02
  • @talonmies; I just read that section again and found nothing that says they do not use row-major order. – haccks Jun 25 '15 at 19:12
  • @haccks: What they use is more like the Fortran memory model. – Grzegorz Szpetkowski Jun 25 '15 at 19:14
  • @GrzegorzSzpetkowski; That was in the past. Now read section "4.2 MAPPING THREADS TO MULTIDIMENSIONAL DATA" from the book [Programming Massively Parallel Processors-2nd ed](http://www.amazon.com/gp/product/0124159923/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687682&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0123814723&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=1C9TPM14AP757F3VVGAX) (page 68). – haccks Jun 25 '15 at 19:17
  • @haccks: I meant thread/block indexing, not the memory model, which is a totally different thing. Basically, for a thread block defined as `dim3(3, 4)` the first index grows faster than the second. – Grzegorz Szpetkowski Jun 25 '15 at 19:37
  • @GrzegorzSzpetkowski; I am not surprised by what you are saying, because I believed the same until reading the above book, which says: *"For example, `block(1,0)` has `blockIdx.y = 1` and `blockIdx.x = 0`. Note that the ordering of the labels is such that the highest dimension comes first. This is the reverse of the ordering used in the configuration parameters, where the lowest dimension comes first."* See page no. 67. – haccks Jun 25 '15 at 22:36
  • @talonmies; The above comment is for you too. – haccks Jun 25 '15 at 22:44
  • @haccks: The book is just wrong (in this context). First, there is no such thing as `block(1, 0)`, since it doesn't make sense. The lowest dimensions of a block are `dim3(1, 1, 1)`, or you get an `invalid configuration argument` error when calling the kernel. For a block defined as `dim3(2, 1, 1)` only `blockIdx.x` varies, over `[0, 1]`; the others are just zero, as `gridDim.x = 2` and the other dimensions are one. This is how it has been since the beginning. – Grzegorz Szpetkowski Jun 26 '15 at 09:06
  • @GrzegorzSzpetkowski; I think you misunderstood my point. I would like to say [download](http://prefiles.com/uekj86bjnvxo/MK.Programming.Massively.Parallel.Processors.2nd.Edition.Dec.2012.rar) this book or drop your email id here and I will send you and read the section 4.1 and 4.2. – haccks Jun 26 '15 at 09:57
  • @haccks: Thanks, I'll read it; however, note that the definitive reference for CUDA is the CUDA C Programming Guide, which comes with the documentation package, together with the actual compiler's behavior. – Grzegorz Szpetkowski Jun 26 '15 at 10:07