Stencil buffer and deferred rendering using OpenGL and GLSL

Question

I'm wondering about one thing concerning the use of the stencil buffer in a deferred rendering context: are fragment shaders still invoked for every screen-space fragment, including those inside the 'occluded' (masked-out) area?

Here's an example from the website http://www.learnopengl.com/#!Advanced-OpenGL/Stencil-testing (it only covers the stencil buffer and has no relation to deferred rendering):

[image: stencil-testing illustration from the linked page]

Question: what does the expression "the others are discarded" mean here? Does it mean that no pixel shader is invoked where the mask value is 0, OR that a pixel shader is invoked for every fragment on the screen BUT a condition is then applied that discards the pixel write if the value is 0?

Let's suppose that the first picture on the left is the result without stencil buffering. The geometry carrying the information is a quad composed of 2 triangles (we apply the deferred rendering technique, so we're working in screen space; the dimensions of the screen are 500x500). So after rasterization, 500 * 500 = 250,000 fragment shader invocations will fill the frame buffer, and all of them will run even in the dark area where there is no light. It means that if we apply a Blinn-Phong shading model, it will be applied everywhere on the screen, EVEN in the dark area, and I think that is a waste of performance.

So the logical thing to do in this case would be to create a mask (using the stencil buffer, or a custom mask render pass that fills another frame buffer) and then apply the Blinn-Phong shading model only where, for example, the mask value at that screen-space pixel is 1. This way the shading model will be applied ONLY, in our example, to the 2 boxes and the plane!

As a first approach, the trick to do the job correctly would be to add a condition in the fragment shader that decides, according to the value sampled from the mask texture, whether Blinn-Phong shading needs to be computed for the current fragment.

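#version 330 core
in vec2 TexCoord;               // screen-space UV of the full-screen quad
uniform sampler2D MaskSampler;  // mask texture: 1 where lighting is needed
out vec4 FragColor;             // output name is an assumption

void main(void)
{
    FragColor = vec4(0.0);      // default: unlit (black)
    // Threshold instead of exact float equality on a sampled value
    if (texture(MaskSampler, TexCoord.xy).r > 0.5)
    {
        // Execute the Blinn-Phong shading model here...
    }
    // Else: nothing to do
}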

But I'm wondering (looking at the third picture above) whether it's possible to tell the OpenGL API to invoke fragment shaders only for the colored area (meaning we never enter the main function of the fragment shader elsewhere). In that case the number of fragment shader invocations would be drastically reduced, which would be better for performance! Or is the only solution to put a condition in the fragment shader as I mentioned above?

I would think checking the state of your pixel adds overhead compared with simply ORing the bit. Your GPU is processing your entire area in parallel, so you are already processing that pixel. Adding an if condition will add more cycles than just doing a bitwise OR. - Dr.Knowitall, Jun 05 '15 at 22:42
How can I OR the bit? So you maintain that in all cases, in my example, 500 * 500 pixel shaders will be invoked each frame? Even the ones in the dark area if I use the stencil buffer? - user1364743, Jun 05 '15 at 22:54
Assuming that the scene in the images you provided consists only of a plane and two boxes, [almost] no resources are wasted on drawing the dark area because nothing is drawn there. Also, in most cases, figuring out where not to apply the lighting calculation and selectively applying it would be a bigger performance hit than just applying it everywhere. - n0rd, Jun 05 '15 at 22:55
There are actually two ways of doing this, by the way. The stencil test may be implemented after the fragment shader runs (traditional), or on some hardware (pretty much any modern hardware), the stencil test may occur before the fragment shader runs and then skip it altogether if the test fails. The first way described won't improve performance at all, and neither will adding a condition to your fragment shader to discard. - Andon M. Coleman, Jun 05 '15 at 22:57
Ok n0rd, so you don't advise me to create a mask texture to select the area where a Blinn-Phong shading model, for example, is applied. In your opinion it would be faster without a mask, even if I apply the model to the dark pixels too? - user1364743, Jun 05 '15 at 22:59
Hello Andon, you said "The stencil test may be implemented after the fragment shader runs (traditional)". Does it mean (to simplify) that OpenGL first renders the whole scene once, and after that applies the stencil test and replaces every colored pixel that has failed the test with (for example) a black pixel? Is that right? - user1364743, Jun 05 '15 at 23:15
I say it is highly unlikely you gain anything, especially with a naive implementation. Implement, profile, and make a decision for your specific case. Please note that branching instructions inside shaders usually introduce a noticeable performance impact (i.e. it may run as slowly as if both branches had executed). - n0rd, Jun 05 '15 at 23:18

score 1 · Accepted Answer · edited Oct 22 '19 at 06:37

In most cases, the stencil test will run before the fragment shader and skip its execution for fragments that fail the test. It is described in detail here, along with the conditions under which the test might not run before the fragment shader, although these are unlikely in your described setup. This is much preferable to additional branches and texture samples in your fragment shader. It's also much easier to implement and maintain.
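
To put rough numbers on the difference, here is a minimal CPU-side model in C++. It is not OpenGL code and only counts invocations; it assumes the early stencil test really does reject fragments before the shader runs. All names (`InvocationCounts`, `countWithEarlyStencil`, `countWithShaderBranch`, `makeSquareMask`) and the square lit region are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// CPU-side model only: counts how often a lighting shader would run for a
// full-screen quad. All names here are hypothetical.
struct InvocationCounts {
    std::size_t shaderRuns;    // fragment shader invocations
    std::size_t lightingRuns;  // invocations that evaluate Blinn-Phong
};

// Early stencil test: fragments whose stencil value is 0 are rejected before
// the shader is invoked, so they cost no shader work in this model.
InvocationCounts countWithEarlyStencil(const std::vector<unsigned char>& stencil) {
    InvocationCounts counts{0, 0};
    for (unsigned char value : stencil) {
        if (value != 0) {
            ++counts.shaderRuns;
            ++counts.lightingRuns;
        }
    }
    return counts;
}

// Mask sampled inside the shader: every fragment invokes the shader, and the
// branch only decides whether the lighting code runs.
InvocationCounts countWithShaderBranch(const std::vector<unsigned char>& mask) {
    InvocationCounts counts{0, 0};
    for (unsigned char value : mask) {
        ++counts.shaderRuns;
        if (value != 0) {
            ++counts.lightingRuns;
        }
    }
    return counts;
}

// Assumed test data: a 500x500 screen with a lit square in one corner.
std::vector<unsigned char> makeSquareMask(std::size_t side) {
    const std::size_t size = 500;
    if (side > size) side = size;
    std::vector<unsigned char> mask(size * size, 0);
    for (std::size_t y = 0; y < side; ++y) {
        for (std::size_t x = 0; x < side; ++x) {
            mask[y * size + x] = 1;
        }
    }
    return mask;
}
```

With a 200x200 lit region, this model gives 40,000 shader invocations for the early stencil path against 250,000 for the in-shader branch, although both evaluate the lighting 40,000 times. Real savings depend on the hardware and on whether the early test is actually performed.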

However, your proposed approach of branching in the shader, so that most pixels take a much cheaper path, may also provide a speed-up (compared with, say, running the full shader for every pixel). This is because GPUs shade neighbouring pixels together in groups, so branching is cheap when it is spatially coherent: if all pixels in a local area take the same branch, the other branch can be skipped for that whole group. A chapter in GPU Gems describes this process. Of course, this is highly dependent on shader complexity, the area covered, and the driver and hardware implementation. The stencil approach is much less ambiguous.
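
One way to picture the spatial-coherence point is the toy C++ model below. It assumes fragments are shaded in fixed-size groups that execute in lockstep, and that a group pays for the lighting branch as soon as one of its fragments takes it. The group size, the 1D pixel layout and every function name are assumptions made for illustration; real GPUs group fragments in 2D tiles and differ in the details.

```cpp
#include <cstddef>
#include <vector>

// CPU-side model only (hypothetical names). Counts how many lockstep groups
// contain at least one fragment that takes the lighting branch.
std::size_t countGroupsPayingForLighting(const std::vector<bool>& takesBranch,
                                         std::size_t groupSize) {
    if (groupSize == 0) return 0;  // avoid an endless loop on bad input
    std::size_t paying = 0;
    for (std::size_t start = 0; start < takesBranch.size(); start += groupSize) {
        std::size_t end = start + groupSize;
        if (end > takesBranch.size()) end = takesBranch.size();
        bool anyTaken = false;
        for (std::size_t i = start; i < end; ++i) {
            if (takesBranch[i]) {
                anyTaken = true;
                break;
            }
        }
        if (anyTaken) ++paying;
    }
    return paying;
}

// Coherent mask: the first litCount pixels are lit, the rest are dark.
std::vector<bool> makeCoherentMask(std::size_t pixelCount, std::size_t litCount) {
    if (litCount > pixelCount) litCount = pixelCount;
    std::vector<bool> mask(pixelCount, false);
    for (std::size_t i = 0; i < litCount; ++i) mask[i] = true;
    return mask;
}

// Scattered mask: every stride-th pixel is lit.
std::vector<bool> makeScatteredMask(std::size_t pixelCount, std::size_t stride) {
    std::vector<bool> mask(pixelCount, false);
    if (stride == 0) return mask;
    for (std::size_t i = 0; i < pixelCount; i += stride) mask[i] = true;
    return mask;
}
```

For 1024 pixels in groups of 32, with 256 lit pixels in both cases, the coherent mask makes 8 groups pay for the lighting code while the scattered mask makes all 32 pay. That is the sense in which a branch over a compact lit region can be cheap and the same branch over a noisy mask is not.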
