I do some heavy-duty calculations in the fragment shader, spread over multiple passes. For some fragments the result is ready after a single pass; other fragments need more passes.

Because of this, lots of fragments `discard` after a while, but as we know, those threads probably just stall until all the other fragments have completed.

I was wondering: for some time now it has been possible in OpenGL to write to the stencil buffer from the fragment shader (as far as I know, through the `ARB_shader_stencil_export` extension). When I know a specific fragment doesn't need another pass, I could write a marker value to the stencil buffer. In the next rendering pass, I could turn on stencil testing to prevent those fragments from being calculated again.

My question: will this prevent the stalling problem? Will these threads become available to process more of the still-to-be-processed fragments, so that each pass is faster than the previous one as each pass eliminates fragments for the next?

In other words, say I have a 16x16 texture to calculate. In the first pass, I have to calculate 256 fragments (with 16 cores, that would mean 16 cycles). If, after this first pass, I know that only 128 fragments need further calculation, and I have marked the finished fragments in the stencil buffer, will the second pass run twice as fast (so: 128 fragments on 16 cores = 8 cycles instead of 16)?
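To make the arithmetic I'm hoping for explicit, here is a minimal sketch of the idealized case. The function name `ideal_cycles` is just something made up for this illustration, and it assumes every core can pick up any remaining fragment with no grouping or scheduling overhead, which real hardware may not allow.

```python
import math


def ideal_cycles(active_fragments: int, cores: int) -> int:
    """Cycles needed if any core can take any remaining fragment.

    Assumption: one fragment per core per cycle, perfect load balancing.
    """
    return math.ceil(active_fragments / cores)


first_pass = ideal_cycles(256, 16)   # all fragments of the 16x16 texture
second_pass = ideal_cycles(128, 16)  # half of them already marked as done
print(first_pass, second_pass)       # prints: 16 8
```

This is only the best case: it says nothing about whether the GPU actually frees the threads of stencil-rejected fragments.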

genpfault
scippie
  • It may or it may not; it depends on how the data is distributed across the cores. – ratchet freak Dec 20 '13 at 15:34
  • Any way to know? Any way to force it? Or is it hardware-dependent? Is this known for NVIDIA or ATI? – scippie Dec 20 '13 at 15:35
  • It could also be that the discards get an early return on a batch. The best way to know is to profile on your system. – ratchet freak Dec 20 '13 at 15:44
  • Probably not. You mentioned very early in your question that the fragment shader uses `discard`; this will prevent early depth/stencil testing on a lot of hardware. – Andon M. Coleman Dec 20 '13 at 18:27
  • Well @AndonM.Coleman, I would no longer discard then, I would set the stencil value to "do not render next time". I would no longer need discard as the stencil buffer would do that for me without me having to test again if the fragment should be discarded. I'm just wondering if the fragment shader will still use up a thread when the stencil buffer is marked to not render the fragment. – scippie Dec 20 '13 at 21:41
  • @ratchetfreak: yes, I knew that answer would come sooner or later, but how many games work fast on the developer's computer and are terribly slow on the gamer's computer because of "I tested it, it worked good on my system"... – scippie Dec 20 '13 at 21:42
  • @scippie: Ah, that is a little bit trickier. Generally, modern hardware draws at least 2x2 pixels at a time (the partial derivative instruction in Shader Model 3.0 basically requires this behavior), and on top of that the 2x2+ blocks of pixels are packed into even larger basic thread scheduling units known as warps (NV) or wavefronts (AMD). If only some of the 2x2 blocks within a warp/wavefront fail the stencil test, there will be wasted compute power, but if an entire region fails (early Z is usually tiled, and I imagine early stencil is too), a smart GPU could skip a lot of work. – Andon M. Coleman Dec 20 '13 at 22:55
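
The granularity point in the last comment can be sketched with a toy model. Everything here is an assumption made for illustration: the helper name `groups_launched`, the idea that a scheduling group maps to a fixed 4x4 screen tile (16 lanes, chosen only to match the 16 cores in the question), and the rule that a tile costs a full slot as soon as one fragment in it is still active. Real hardware need not pack fragments this way.

```python
def groups_launched(mask, group_w, group_h):
    """Count group_w x group_h tiles that hold at least one active fragment.

    mask[y][x] is True when the fragment still needs another pass.
    Assumption: a tile with any active fragment occupies a full group.
    """
    height, width = len(mask), len(mask[0])
    count = 0
    for y0 in range(0, height, group_h):
        for x0 in range(0, width, group_w):
            tile_active = any(
                mask[y][x]
                for y in range(y0, min(y0 + group_h, height))
                for x in range(x0, min(x0 + group_w, width))
            )
            if tile_active:
                count += 1
    return count


SIZE = 16
# Both masks keep exactly 128 of the 256 fragments active.
checkerboard = [[(x + y) % 2 == 0 for x in range(SIZE)] for y in range(SIZE)]
left_half = [[x < SIZE // 2 for x in range(SIZE)] for y in range(SIZE)]

print(groups_launched(checkerboard, 4, 4))  # prints: 16 (no tile is fully done)
print(groups_launched(left_half, 4, 4))     # prints: 8 (half the tiles are skipped)
```

Under these assumptions, the same 128 remaining fragments cost either the full 16 groups or only 8, depending on whether the finished fragments are scattered or clustered, so the speedup from stencil marking would depend on the spatial pattern and still has to be confirmed by profiling.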

0 Answers