I can't believe that after all the research and reading I've done I am still not 100% clear on how to do this, so I must ask. I am trying to get something like the following to run on a GPU, and I am using Cudafy.Net to generate the CUDA C equivalent. I want it to run as fast as possible.

If I have a function (simplified) such as:

Transform()
{
    for (lgDY = 0; lgDY < lgeHeight; lgDY++)
    {
        for (lgDX = 0; lgDX < lgeWidth; lgDX++)
        {
            // do a lot of stuff with lgDY and lgDX like stuff a matrix
        }
    }
}

I am invoking this with the Launch() function as follows:

gpu.Launch(blocksize, threadsize, "Transform", args...)

I am familiar with the GThread passed as the first argument, with blocksize.x, blockdim.x and threadsize.x, and also with the y and z components for the block. I am having a hard time understanding whether the for statements go away and get replaced with a test, something like

if ( y < lgeHeight )
    if ( x < lgeWidth )
...

But then I have no idea how to tie each iteration to an incremented lgDY and lgDX.

I apologize if it's something blatantly obvious or if I haven't described what I am trying to do accurately. Just confused on how to make the nested loop correct. I appreciate any and all help to get me moving in the right direction.

1 Answer

It depends on the size of lgeHeight and lgeWidth. If their product is no larger than the number of threads you can launch, then you can assume that each thread handles exactly one (x, y) pair.

lgDY = threadIdx.x
lgDX = blockIdx.x

Then you can compute them all at once (this particular mapping assumes lgeHeight fits within the per-block thread limit). If you have more elements than threads, then you will need to divide the problem up into smaller pieces or have each thread run a small loop over several elements.
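To make the mapping concrete, here is a minimal sketch in plain C that emulates a 2D launch on the CPU, so it can be run without a GPU. The `Dim2` struct and the names `div_up`, `transform_thread` and `emulate_launch` are made up for illustration; they are not Cudafy or CUDA APIs. In a real kernel the four loops in `emulate_launch` do not exist, because the hardware runs the kernel body once per thread. Only the body of `transform_thread` remains: derive `lgDX` and `lgDY` from the block and thread indices (in Cudafy these come from the GThread argument), then return early if they fall outside the image.

```c
#include <assert.h>

/* Hypothetical stand-in for the built-in 2D index types. */
typedef struct { int x, y; } Dim2;

/* Round-up division: number of blocks needed to cover n items. */
int div_up(int n, int block) { return (n + block - 1) / block; }

/* What ONE thread does. The nested for loops from Transform() are gone;
   each thread derives its own (lgDX, lgDY) from its indices. */
void transform_thread(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx,
                      int lgeWidth, int lgeHeight, int *out)
{
    int lgDX = blockIdx.x * blockDim.x + threadIdx.x;
    int lgDY = blockIdx.y * blockDim.y + threadIdx.y;

    /* The last blocks overhang the image edge, so surplus threads exit. */
    if (lgDX >= lgeWidth || lgDY >= lgeHeight) return;

    /* Placeholder for "a lot of stuff": count visits to this cell. */
    out[lgDY * lgeWidth + lgDX] += 1;
}

/* CPU emulation of the launch; on a GPU these loops are the hardware.
   Returns the number of threads that were "launched". */
long emulate_launch(int lgeWidth, int lgeHeight, Dim2 blockDim, int *out)
{
    assert(blockDim.x > 0 && blockDim.y > 0);

    Dim2 gridDim = { div_up(lgeWidth, blockDim.x),
                     div_up(lgeHeight, blockDim.y) };
    long launched = 0;
    Dim2 b, t;

    for (b.y = 0; b.y < gridDim.y; b.y++)
        for (b.x = 0; b.x < gridDim.x; b.x++)
            for (t.y = 0; t.y < blockDim.y; t.y++)
                for (t.x = 0; t.x < blockDim.x; t.x++) {
                    transform_thread(b, blockDim, t,
                                     lgeWidth, lgeHeight, out);
                    launched++;
                }
    return launched;
}
```

Calling `emulate_launch` on a zero-initialised `out` array of `lgeWidth * lgeHeight` ints should leave every entry equal to 1, meaning each (lgDX, lgDY) pair was handled by exactly one thread even when the image size is not a multiple of the block size.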

Milhous
  • Thank you for your reply, Milhous. The card has a maximum of 1024 threads per block, max thread dimensions of (1024, 1024, 1) and max grid dimensions of (2147483647, 65535, 1). With lgeHeight=2150 and lgeWidth=4300 that yields 9245000 elements. If the grid dimensions are what I should compare against, it looks like I'd have enough threads? – user3143237 Jun 26 '17 at 23:23
  • I think it depends on your card. What card are you using? I don't know a card with that many threads. – Milhous Jun 27 '17 at 02:24
  • I'm using a GeForce GTX 1050 Ti – user3143237 Jun 27 '17 at 02:31
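For the numbers in the comments above, the arithmetic can be checked with a short C sketch. The 16 x 16 block shape is an assumption chosen for illustration, and `LaunchPlan` and `plan_launch` are hypothetical names, not part of Cudafy or CUDA. The limits are the ones quoted in the comment. Note that a launch can contain far more threads than are resident on the card at once, since blocks are scheduled onto the hardware as resources free up, so the grid limits are the relevant bound here.

```c
#include <assert.h>

/* Hypothetical helper type: grid shape and thread totals for one launch. */
typedef struct {
    int  gridX, gridY;   /* blocks along x and y */
    long totalThreads;   /* threads across the whole grid */
    long idleThreads;    /* threads that fail the bounds check */
} LaunchPlan;

LaunchPlan plan_launch(int lgeWidth, int lgeHeight, int blockX, int blockY)
{
    /* Limits as quoted in the comment thread for this card. */
    const int  maxThreadsPerBlock = 1024;
    const long maxGridX = 2147483647L;
    const long maxGridY = 65535L;

    assert(blockX > 0 && blockY > 0);
    assert(blockX * blockY <= maxThreadsPerBlock);

    LaunchPlan p;
    p.gridX = (lgeWidth  + blockX - 1) / blockX;   /* round up */
    p.gridY = (lgeHeight + blockY - 1) / blockY;   /* round up */
    assert(p.gridX <= maxGridX && p.gridY <= maxGridY);

    p.totalThreads = (long)p.gridX * p.gridY * blockX * blockY;
    p.idleThreads  = p.totalThreads - (long)lgeWidth * lgeHeight;
    return p;
}
```

With `plan_launch(4300, 2150, 16, 16)` this works out to a 269 x 135 grid of 256-thread blocks, which is 9,296,640 threads for 9,245,000 elements. The 51,640 surplus threads are the ones that must be turned away by the `if (x < lgeWidth)` and `if (y < lgeHeight)` style test from the question.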