I have a question about the following problem:

Suppose we have a 9*7 picture (7 pixels in the x direction and 9 pixels in the y direction). How many warps will have control divergence, assuming blocks of 4*4 threads and 8 threads per warp?

How will the blocks and warps be organized here? In the x (horizontal) direction, I can assume 2 blocks per row; similarly, in the y (vertical) direction, 3 blocks per column. But how will the warps be organized? Can someone point out the thread IDs in each warp, and the cases where control divergence happens (with the thread IDs involved)?

thanks

einpoklum
user915783

1 Answer

> Suppose we have a 9*7 picture (7 pixels in the x direction and 9 pixels in the y direction). How many warps will have control divergence, assuming blocks of 4*4 threads and 8 threads per warp?

  1. Divergence is a property of the program (the code), not of the block/warp layout itself. If your algorithm operates identically across all pixels in the image, there will be no divergence whatsoever, irrespective of the number of threads and their organization. If it branches only along warp boundaries, so that all threads of any given warp take the same path, there will be no divergence either. Therefore, without seeing your code, your question is technically unanswerable.
  2. If you're running with a block of 16 threads and 8 threads per warp (which is not physically possible on CUDA hardware: warps are made of 32 threads and their size is not configurable) then you might as well run without a GPU at all. These numbers are way too small to benefit from any hardware acceleration.
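
To make point 1 concrete, here is a minimal sketch in Python that simulates the thread layout on the host. It assumes several things the question does not state: that the kernel's only branch is the usual bounds check `x < width and y < height`, that the launch uses 2D 4*4 blocks in a 2*3 grid, and that threads are packed into warps in the order `threadIdx.x + threadIdx.y * blockDim.x`. The function name `divergent_warps` is made up for this illustration. A warp counts as divergent when its threads disagree on the branch.

```python
def divergent_warps(width, height, block_x, block_y, warp_size):
    """Return [((bx, by), local_warp), ...] for every warp whose threads
    disagree on the ASSUMED bounds check `x < width and y < height`."""
    grid_x = (width + block_x - 1) // block_x    # blocks per row (ceiling division)
    grid_y = (height + block_y - 1) // block_y   # blocks per column
    threads_per_block = block_x * block_y
    result = []
    for by in range(grid_y):
        for bx in range(grid_x):
            for warp_start in range(0, threads_per_block, warp_size):
                warp_end = min(warp_start + warp_size, threads_per_block)
                outcomes = set()
                for tid in range(warp_start, warp_end):
                    # Linear thread index within the block -> (threadIdx.x, threadIdx.y)
                    tx, ty = tid % block_x, tid // block_x
                    x = bx * block_x + tx
                    y = by * block_y + ty
                    outcomes.add(x < width and y < height)
                if len(outcomes) > 1:  # some threads pass the check, some fail
                    result.append(((bx, by), warp_start // warp_size))
    return result

# The hypothetical numbers from the question: 7 wide, 9 high, 4*4 blocks, 8-thread warps
print(len(divergent_warps(7, 9, 4, 4, 8)))  # 6
```

Under those assumptions, 6 of the 12 warps diverge: both warps of the right-hand block in each of the first two block rows (column x = 7 is out of bounds), and the first warp of each bottom block (row y = 8 is inside the image, row y = 9 is not). The second warp of each bottom block lies entirely outside the image, so all of its threads skip the work together and it does not diverge. A different kernel would give a different count.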

> How will the blocks and warps be organized here? In the x (horizontal) direction, I can assume 2 blocks per row; similarly, in the y (vertical) direction, 3 blocks per column. But how will the warps be organized?

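I'll stick to your example and try to provide a schema of the thread IDs, block IDs and warp IDs. It uses linear (1D) indices: threads are numbered row by row across the image, each block takes 16 consecutive threads, and each warp takes 8. Keep in mind that this layout is, in practice, impossible on CUDA hardware.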

Image     Global Thread IDs      Block IDs              Local Thread IDs
□□□□□□□ | 00 01 02 03 04 05 06 | 00 00 00 00 00 00 00 | 00 01 02 03 04 05 06
□□□□□□□ | 07 08 09 10 11 12 13 | 00 00 00 00 00 00 00 | 07 08 09 10 11 12 13
□□□□□□□ | 14 15 16 17 18 19 20 | 00 00 01 01 01 01 01 | 14 15 00 01 02 03 04
□□□□□□□ | 21 22 23 24 25 26 27 | 01 01 01 01 01 01 01 | 05 06 07 08 09 10 11
□□□□□□□ | 28 29 30 31 32 33 34 | 01 01 01 01 02 02 02 | 12 13 14 15 00 01 02
□□□□□□□ | 35 36 37 38 39 40 41 | 02 02 02 02 02 02 02 | 03 04 05 06 07 08 09
□□□□□□□ | 42 43 44 45 46 47 48 | 02 02 02 02 02 02 03 | 10 11 12 13 14 15 00
□□□□□□□ | 49 50 51 52 53 54 55 | 03 03 03 03 03 03 03 | 01 02 03 04 05 06 07
□□□□□□□ | 56 57 58 59 60 61 62 | 03 03 03 03 03 03 03 | 08 09 10 11 12 13 14
----------------------------------------------------------------------------
Image     Global Warp IDs        Block IDs              Local Warp IDs
□□□□□□□ | 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00
□□□□□□□ | 00 01 01 01 01 01 01 | 00 00 00 00 00 00 00 | 00 01 01 01 01 01 01
□□□□□□□ | 01 01 02 02 02 02 02 | 00 00 01 01 01 01 01 | 01 01 00 00 00 00 00
□□□□□□□ | 02 02 02 03 03 03 03 | 01 01 01 01 01 01 01 | 00 00 00 01 01 01 01
□□□□□□□ | 03 03 03 03 04 04 04 | 01 01 01 01 02 02 02 | 01 01 01 01 00 00 00
□□□□□□□ | 04 04 04 04 04 05 05 | 02 02 02 02 02 02 02 | 00 00 00 00 00 01 01
□□□□□□□ | 05 05 05 05 05 05 06 | 02 02 02 02 02 02 03 | 01 01 01 01 01 01 00
□□□□□□□ | 06 06 06 06 06 06 06 | 03 03 03 03 03 03 03 | 00 00 00 00 00 00 00
□□□□□□□ | 07 07 07 07 07 07 07 | 03 03 03 03 03 03 03 | 01 01 01 01 01 01 01
----------------------------------------------------------------------------
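
Every entry in the tables above follows from integer arithmetic on the linear thread index. Here is a minimal sketch in Python, assuming the same hypothetical numbers as the diagram (16 threads per block, 8 threads per warp, threads numbered row by row); `layout` is an illustrative helper, not a CUDA API:

```python
def layout(i, block_size=16, warp_size=8):
    """Map a global linear thread index to
    (block ID, local thread ID, global warp ID, local warp ID)."""
    warps_per_block = (block_size + warp_size - 1) // warp_size  # ceiling division
    block = i // block_size
    local_tid = i % block_size
    local_warp = local_tid // warp_size
    global_warp = block * warps_per_block + local_warp
    return block, local_tid, global_warp, local_warp

# Third image row of the 7-pixel-wide picture: global thread IDs 14..20
for i in range(14, 21):
    print(i, layout(i))
```

Each printed tuple matches the corresponding column of the third row in the four tables.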

> and the cases where control divergence happens (thread IDs etc. for those)

As mentioned above, divergence is a property of the code and not of the thread layout, so this question cannot be answered without code.

user703016
  • First of all, thank you so much for the diagram. (I could not upvote because I have <15 reputation.) I chose this small size (which I know is not practical) so that it would be easy to illustrate if someone drew a diagram, as you did. Also, for row 0 (for example) and thread IDs 4-6, won't the block ID be (1, 0), assuming `blockIdx.x`, `blockIdx.y` notation with horizontal x and vertical y? – user915783 Jan 27 '15 at 06:45
  • I used linear notation because it's much easier to represent than (x, y) and it's easier to see to which warp each thread is assigned. But that's a bijection: any linear index `i` can be represented as `(x, y)` and vice versa, using `i = x + y * xDim` and `(x = i % xDim, y = i / xDim)` (integer division). So for threads 4-6, taking `xDim = 4` for the 4*4 block, the coordinates run from (0, 1) to (2, 1) in (x, y) order, within block 0 (which doesn't have coordinates because you didn't specify a grid dim). – user703016 Jan 27 '15 at 07:24
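
The conversion described in the last comment can be sketched in a few lines of Python. `to_linear` and `to_xy` are illustrative names, and `x_dim = 4` assumes the width of the 4*4 block from the question:

```python
def to_linear(x, y, x_dim):
    """(x, y) coordinates -> linear index, row by row."""
    return x + y * x_dim

def to_xy(i, x_dim):
    """Linear index -> (x, y); `//` is integer division."""
    return i % x_dim, i // x_dim

x_dim = 4  # assumed block width
for i in (4, 5, 6):
    x, y = to_xy(i, x_dim)
    # Converting back gives the original index, since the mapping is a bijection
    print(i, (x, y), to_linear(x, y, x_dim))
```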