I'm using DirectCompute for general-purpose computing on the GPU. I'm currently operating on a 1920x1080 texture. I call Dispatch(2, 1080, 1) with numthreads(960, 1, 1), which by my calculation covers the image exactly with one thread per pixel (2 x 960 = 1920 columns, 1080 rows).
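The coverage arithmetic can be checked with a small plain-Python model of the thread indexing. This is a sketch, not DirectCompute code. It assumes the usual HLSL mapping SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID (per axis) and the cs_5_0 limit of 1024 threads per group. The names DISPATCH, NUMTHREADS and dispatch_thread_id are hypothetical helpers for illustration only.

```python
# Plain-Python model of the compute-shader indexing; no DirectCompute calls.
DISPATCH = (2, 1080, 1)    # thread groups passed to Dispatch(x, y, z)
NUMTHREADS = (960, 1, 1)   # [numthreads(x, y, z)] declared in the shader
WIDTH, HEIGHT = 1920, 1080

# Assumption: cs_5_0 allows at most 1024 threads per thread group.
MAX_THREADS_PER_GROUP = 1024

def dispatch_thread_id(group_id, group_thread_id):
    # SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID (per axis)
    return tuple(g * n + t for g, n, t in zip(group_id, NUMTHREADS, group_thread_id))

threads_per_group = NUMTHREADS[0] * NUMTHREADS[1] * NUMTHREADS[2]
total_threads = threads_per_group * DISPATCH[0] * DISPATCH[1] * DISPATCH[2]

# The thread with the largest index on every axis should land on the last pixel.
last = dispatch_thread_id(
    tuple(d - 1 for d in DISPATCH), tuple(n - 1 for n in NUMTHREADS)
)

# Every x in one row should be hit exactly once by the two groups in that row.
row0 = sorted(
    dispatch_thread_id((gx, 0, 0), (tx, 0, 0))[0]
    for gx in range(DISPATCH[0])
    for tx in range(NUMTHREADS[0])
)

print(threads_per_group, total_threads, last)  # 960 2073600 (1919, 1079, 0)
```

Since 2 x 1080 groups of 960 threads give 2,073,600 threads, which equals 1920 x 1080 pixels, and the highest thread ID is (1919, 1079), the dispatch maps one thread to each pixel with none left over.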
As I understand it, all threads should run at the same time, right? In my shader I skip all computation for black pixels, and I've noticed a clear increase in performance when most of the image is black. However, if a single object fills most of the screen, performance drops drastically.
My question is: if all the threads run in parallel, shouldn't the time to process a frame be determined by the slowest thread, with the threads on black pixels simply idling? If so, why do I see a slowdown when more pixels are processed, when those pixels should all be processed at the same time? Or have I got this all wrong?
Any help would be appreciated.