27

What is the maximum number of blocks in a grid that can be created per kernel launch? I am slightly confused here, since the compute capability table says that there can be 65535 blocks per grid dimension in CUDA compute capability 2.0.

Does that mean the total number of blocks = 65535*65535?

Or does it mean that you can arrange at most 65535 blocks in total, either as a 1D grid of 65535 blocks or as a 2D grid of sqrt(65535) x sqrt(65535) blocks?

Thank you.

legends2k
smilingbuddha

2 Answers

40

65535 per dimension of the grid. On compute 1.x cards, 1D and 2D grids are supported. On compute 2.x cards, 3D grids are also supported, so 65535, 65535 x 65535, and 65535 x 65535 x 65535 are the limits for Fermi (compute 2.x) cards.
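
To make the arithmetic concrete, here is a minimal host-side sketch of the totals those limits imply. The names `kMaxBlocksPerDim` and `max_total_blocks` are made up for illustration and are not part of the CUDA API; the only input taken from above is the 65535-per-dimension figure.

```cpp
#include <cstdint>

// Per-dimension grid limit quoted above for compute 1.x and 2.x cards.
constexpr std::uint64_t kMaxBlocksPerDim = 65535;

// Hypothetical helper: total number of blocks in a grid that uses `dims`
// dimensions (1, 2 or 3), each at the per-dimension limit.
// 64-bit arithmetic is used because the 3D total does not fit in 32 bits.
inline std::uint64_t max_total_blocks(int dims) {
    std::uint64_t total = 1;
    for (int i = 0; i < dims; ++i) {
        total *= kMaxBlocksPerDim;
    }
    return total;
}
```

Under these assumptions the totals come out as 65535 for a 1D grid, 4294836225 for a 2D grid, and 281462092005375 for a 3D grid.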

EDIT: Since compute capability 3.0, the 65535 limit applies only to the y- and z-dimensions. In the x-dimension the new limit is 2^31 - 1.
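
As a rough sketch of how these per-dimension rules could be checked before a launch, the host-side helper below encodes the limits stated in this answer. `GridLimits`, `grid_limits` and `grid_fits` are hypothetical names, and the table is hard-coded from the figures above rather than read from a device.

```cpp
#include <cstdint>

// Hypothetical container for the per-dimension grid limits.
struct GridLimits {
    std::uint64_t x, y, z;
};

// Limits as stated in this answer, keyed on the major compute capability.
// Assumption: hard-coded for illustration, not queried from the driver.
inline GridLimits grid_limits(int cc_major) {
    if (cc_major >= 3) return {2147483647ULL, 65535ULL, 65535ULL}; // x is 2^31 - 1
    if (cc_major == 2) return {65535ULL, 65535ULL, 65535ULL};      // 3D grids
    return {65535ULL, 65535ULL, 1ULL};                             // 1.x: 1D and 2D only
}

// True if a grid of gx x gy x gz blocks is within the limits above.
inline bool grid_fits(int cc_major, std::uint64_t gx, std::uint64_t gy,
                      std::uint64_t gz) {
    const GridLimits lim = grid_limits(cc_major);
    return gx >= 1 && gy >= 1 && gz >= 1 &&
           gx <= lim.x && gy <= lim.y && gz <= lim.z;
}
```

For example, `grid_fits(3, 2147483647, 65535, 65535)` is true, while `grid_fits(2, 65536, 1, 1)` is false. In real code the values for the installed device are available in `cudaDeviceProp::maxGridSize`, which is what `deviceQuery` prints.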

paleonix
talonmies
  • 5
    Copied the wrong values from the original question. Mea culpa. – talonmies May 19 '11 at 04:31
  • Which would be in contrast to the max number of threads per block, which is commonly 512, even though the max block size is 512 x 512 x 64. – Framester Jun 22 '11 at 11:04
  • 2
    Though it seems wrong, my build of the CUDA sample program, deviceQuery, indicates I can use 2147483647 for the first dimension: `Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)` – user2023370 Aug 25 '16 at 16:31
  • 5
    @user2023370 That is correct for devices with Compute Capability 3.0 and higher, see https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications. The limit **only** in x direction is now 2^31-1. – cfh May 16 '17 at 18:02
  • Worth mentioning would be the memory usage needed for grid sizes that big to even be usable... still, it's good to know the dimensions you can use. – Jakub Jul 08 '22 at 19:47
  • 1
    @jakub: Grids don’t intrinsically use any GPU memory and the execution model is designed so that as long as the per block resource requirements can be satisfied, any grid size of any kernel which matches the compute capability specific limits can run on any GPU. Your statement isn’t correct. – talonmies Jul 09 '22 at 09:06
-5

I think it is 65535 per grid.

reva