Threads on the GPU

Question

I'm using DirectCompute to do general computing on the GPU. Currently, I'm trying to operate on a texture with a resolution of 1920x1080. I call Dispatch(2, 1080, 1) with numthreads(960, 1, 1), which by my calculations covers the image exactly, with one thread per pixel.
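To double-check that arithmetic, here is a small Python sketch (not DirectCompute code) that enumerates the thread IDs such a launch would produce. It uses the documented HLSL rule SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID and counts how often each pixel of a 1920x1080 image is hit. The constant and helper names are illustrative only. The 1024 figure is the Direct3D 11 per-group thread limit for cs_5_0 shaders.

```python
# Illustrative model of a DirectCompute launch; all names here are hypothetical.
WIDTH, HEIGHT = 1920, 1080
DISPATCH = (2, 1080, 1)       # thread groups in x, y, z: Dispatch(2, 1080, 1)
NUMTHREADS = (960, 1, 1)      # threads per group: [numthreads(960, 1, 1)]
MAX_THREADS_PER_GROUP = 1024  # Direct3D 11 per-group limit for cs_5_0


def dispatch_thread_ids(dispatch, numthreads):
    """Yield (x, y) the way SV_DispatchThreadID is formed:
    SV_GroupID * numthreads + SV_GroupThreadID (z is 1 here, so it is ignored)."""
    groups_x, groups_y, _ = dispatch
    size_x, size_y, _ = numthreads
    for gy in range(groups_y):
        for gx in range(groups_x):
            for ty in range(size_y):
                for tx in range(size_x):
                    yield gx * size_x + tx, gy * size_y + ty


def coverage(width, height, dispatch, numthreads):
    """Count hits per pixel, plus threads that land outside the image."""
    hits = bytearray(width * height)
    out_of_bounds = 0
    for x, y in dispatch_thread_ids(dispatch, numthreads):
        if x < width and y < height:
            hits[y * width + x] += 1
        else:
            out_of_bounds += 1  # a shader would need a bounds check for these
    return hits, out_of_bounds


threads_per_group = NUMTHREADS[0] * NUMTHREADS[1] * NUMTHREADS[2]
total_threads = threads_per_group * DISPATCH[0] * DISPATCH[1] * DISPATCH[2]
hits, out_of_bounds = coverage(WIDTH, HEIGHT, DISPATCH, NUMTHREADS)

print("threads per group:", threads_per_group, "limit:", MAX_THREADS_PER_GROUP)
print("total threads:", total_threads, "pixels:", WIDTH * HEIGHT)
print("every pixel hit exactly once:", hits.count(1) == WIDTH * HEIGHT)
print("threads outside the image:", out_of_bounds)
```

For the configuration in the question this reports 960 threads per group, 2,073,600 threads in total, and no threads outside the image. The launch really is one thread per pixel.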

Now, as I understand it, all threads should run at the same time, right? In my code, I skip all computation if the pixel is black, and I've noticed a definite increase in performance when most of the image is black. However, if an object fills most of the screen, performance drops drastically.

My question is this: if all the threads run in parallel, then the time to process a frame should be determined by the slowest thread, and the threads on black pixels would essentially just sit idle, right? So why am I seeing a slowdown when more pixels are processed? They should all be processed at the same time. Or have I got this all wrong?

Any help would be appreciated.

score 2 · Accepted Answer · answered Oct 25 '12 at 09:28

Not all threads execute concurrently. The exact numbers have probably changed a bit, but a few years ago a high-end GPU was able to keep 16k threads in flight at a time, yet "only" a few hundred of them actually executed concurrently. (These are further subdivided into smaller subgroups, and every thread in such a subgroup runs in exact lockstep, instruction by instruction, branch by branch.) The rest were suspended, waiting on memory or otherwise stalled.
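To put rough numbers on this for the launch in the question, the Python sketch below counts thread groups and lockstep subgroups. It also estimates how many rounds of groups are needed if only a limited number of threads can be resident at once. The subgroup width of 32 is an assumption: NVIDIA calls these subgroups warps and uses 32 threads, while AMD's GCN wavefronts are 64 wide. The 16k residency figure is just the rough number quoted above, not a specification of any particular GPU. Real hardware schedules groups dynamically rather than in neat rounds, so treat the result as an order-of-magnitude illustration.

```python
import math

# Assumed figures for illustration only (see the paragraph above).
TOTAL_THREADS = 1920 * 1080   # one thread per pixel
GROUP_SIZE = 960              # numthreads(960, 1, 1)
SUBGROUP_SIZE = 32            # assumed lockstep width (warp); 64 on AMD GCN
RESIDENT_THREADS = 16 * 1024  # rough "in flight" figure quoted in the answer


def launch_breakdown(total, group_size, subgroup_size, resident):
    groups = math.ceil(total / group_size)
    subgroups_per_group = math.ceil(group_size / subgroup_size)
    # Simplification: a thread group is scheduled as a whole, so only complete
    # groups count toward the resident-thread budget (at least one group).
    resident_groups = max(1, resident // group_size)
    rounds = math.ceil(groups / resident_groups)
    return {
        "groups": groups,
        "subgroups_per_group": subgroups_per_group,
        "subgroups": groups * subgroups_per_group,
        "resident_groups": resident_groups,
        "rounds": rounds,
    }


print(launch_breakdown(TOTAL_THREADS, GROUP_SIZE, SUBGROUP_SIZE, RESIDENT_THREADS))
```

With these assumed figures, the launch works out to 2,160 groups, 64,800 lockstep subgroups, and about 128 rounds of resident groups. That is nowhere near "all threads at once".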

So if your algorithm requires roughly two million executions (1920 × 1080 = 2,073,600), only a fraction of them will even exist as threads at any given time, and of those, only a fraction will actually be executing at once. Among the threads that are currently executing, the ones in the same subgroup are forced to run in exact lockstep. No single thread can exit early, because the entire subgroup has to follow the same path. Different subgroups, however, can finish at different times. That is why more non-black pixels means a slower frame: every subgroup containing even one of them pays for the full computation, and there are far more subgroups than can run at the same time.
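A toy cost model makes this concrete for the question. In the Python sketch below, each pixel is either black (cheap early-out) or not (full computation). Each lockstep subgroup costs as much as its slowest lane, because lanes that skipped the work still wait for the others. The cost units, the 32-wide subgroup, and the tiny test images are all made-up assumptions. Only the shape of the result matters: total work grows with the number of subgroups that contain at least one non-black pixel. Since only some subgroups execute at a time, that total work is what sets the frame time.

```python
SUBGROUP_SIZE = 32  # assumed lockstep width
BLACK_COST = 1      # hypothetical cost units: early-out on a black pixel
SHADE_COST = 100    # hypothetical cost units: full per-pixel computation


def subgroup_cost(lanes):
    """Lanes run in lockstep, so the subgroup takes as long as its slowest lane."""
    return max(SHADE_COST if lit else BLACK_COST for lit in lanes)


def frame_cost(image, subgroup_size=SUBGROUP_SIZE):
    """Total work for an image given as rows of booleans (True = non-black).
    Consecutive pixels of a row share a subgroup, as with numthreads(960, 1, 1);
    the row width is assumed to be a multiple of subgroup_size."""
    total = 0
    for row in image:
        for start in range(0, len(row), subgroup_size):
            total += subgroup_cost(row[start:start + subgroup_size])
    return total


def make_image(width, height, lit):
    """Build a width x height image; lit(x, y) says whether a pixel is non-black."""
    return [[lit(x, y) for x in range(width)] for y in range(height)]


W, H = 64, 4  # tiny image: 2 subgroups per row, 8 subgroups in total
all_black = make_image(W, H, lambda x, y: False)
one_pixel = make_image(W, H, lambda x, y: x == 0 and y == 0)
one_per_subgroup = make_image(W, H, lambda x, y: x % SUBGROUP_SIZE == 0)
all_lit = make_image(W, H, lambda x, y: True)

for name, img in [("all black", all_black), ("one pixel", one_pixel),
                  ("one per subgroup", one_per_subgroup), ("all lit", all_lit)]:
    print(f"{name:>16}: {frame_cost(img)}")
```

Lighting one pixel in every subgroup costs exactly as much as lighting the whole image, which is the lockstep effect described above. Each additional subgroup an object touches adds another full-cost subgroup to a frame that cannot run all at once.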

Yes, threading on the GPU is complicated.

I think this explains a lot of what I was wondering about. Thanks for your answer. I'm going to have to do a little more research about GPU threading on the internet. — l3utterfly, Oct 26 '12 at 05:12

score 0 · Answer 2 · answered Oct 25 '12 at 09:20

If you have a very heavy algorithm and are using your image for back-buffer rendering, it could create a stall, forcing the back buffer to wait for the image. Try rendering it on the next frame instead, so that you are one "frame behind".
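The "frame behind" idea amounts to ping-ponging between two textures. The compute pass for frame n writes one texture while frame n-1, finished earlier, is presented from the other, so the display never waits on the current compute pass. The sketch below models only the buffer bookkeeping in plain Python; in a real renderer the two steps overlap on the GPU. The buffer list and function name are hypothetical, not a Direct3D API.

```python
def frame_behind_schedule(num_frames):
    """Return (frame being computed, frame being displayed) for each step,
    using two ping-pong buffers. Hypothetical model, not a Direct3D API."""
    buffers = [None, None]  # stand-ins for two textures
    schedule = []
    for n in range(num_frames):
        write = n % 2       # texture the compute pass writes this frame
        read = 1 - write    # texture completed during the previous frame
        displayed = buffers[read]
        buffers[write] = n  # compute pass finishes frame n
        schedule.append((n, displayed))
    return schedule


for computing, displaying in frame_behind_schedule(4):
    print(f"computing frame {computing}, displaying frame {displaying}")
```

The first step has nothing to display yet. From then on, the displayed frame always trails the computed one by exactly one, which costs one frame of latency in exchange for never stalling on the current frame's compute work.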

And what does your algorithm look like?

Tordin

Well, I'm doing double buffering, so I'm always displaying one image while computing the next. However, I think the problem is that my algorithm is too slow. But what jalf said is immensely enlightening. My algorithm? I'm exploring real-time ray tracing and have just written a brute-force algorithm to test the capabilities of my GPU. Obviously a tonne of optimisations will be needed. – l3utterfly Oct 26 '12 at 05:15
