I'm trying to get early fragment culling to work, based on the stencil test. My scenario is the following: I have a fragment shader that does a lot of work, but needs to be run only on very few fragments when I render my scene. These fragments can be located pretty much anywhere on the screen (I can't use a scissor to quickly filter out these fragments).

In rendering pass 1, I generate a stencil buffer with two possible values. Values will have the following meaning for pass 2:

  • 0: do not do anything
  • 1: OK to proceed (e.g. enter the fragment shader and render)

Pass 2 renders the scene properly speaking. The stencil buffer is configured this way:

glStencilMask(1);
glStencilFunc(GL_EQUAL, 1, 1); // if the value is NOT 1, please early cull!
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP); // never write to stencil buffer

Now I run my app. The color of selected pixels is altered based on the stencil value, which means the stencil test works fine.

However, I should see a huge, spectacular performance boost with early stencil culling... but nothing happens. My guess is that the stencil test either happens after the depth test, or even after the fragment shader has been called. Why?

nVidia apparently has a patent on early stencil culling: http://www.freepatentsonline.com/7184040.html Is this the right way to have it enabled?

I'm using an nVidia GeForce GTS 450 graphics card. Is early stencil culling supposed to work with this card? Running Windows 7 with latest drivers.

genpfault
Fred
  • If you are using early fragments optimization, (http://www.opengl.org/wiki/Early_Fragment_Test) both depth test and stencil test are executed before the fragment shader... and that can imply some limitations. – darius Sep 11 '13 at 13:07
  • @AndonM.Coleman you can very safely assume that early fragment rejection will be a win-win in my case. I am doing heavyweight, brute-force raytracing for the fragments passing the stencil test. Any idea why early stencil culling is not working in my setup? Thank you. – Fred Sep 11 '13 at 13:57

1 Answer

Like early Z, early stencil is often done using hierarchical stencil buffering.

There are a number of factors that can prevent hierarchical tiling from working properly, including rendering into an FBO on older hardware. However, the biggest obstacle to getting early stencil testing working in your example is that you've left stencil writes enabled for one of the eight stencil bits in the second pass (glStencilMask(1)).

I would suggest calling glStencilMask(0x00) at the beginning of the second pass to let the GPU know you are not going to write anything to the stencil buffer.
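
As a rough sketch of that suggestion combined with the state calls from the question, the second-pass setup might look like the following. The helper name `begin_stencil_culled_pass` is hypothetical, and the code assumes a GL context is already current. Whether the driver then actually rejects fragments before shading remains implementation-dependent.

```c
/* On Windows, <windows.h> has to be included before <GL/gl.h>. */
#ifdef _WIN32
#include <windows.h>
#endif
#include <GL/gl.h>

/* Hypothetical helper: call with a current GL context, before drawing pass 2. */
void begin_stencil_culled_pass(void)
{
    glEnable(GL_STENCIL_TEST);

    /* A fragment passes only where (stencil & 0x01) == (1 & 0x01). */
    glStencilFunc(GL_EQUAL, 1, 0x01);

    /* Keep the stored stencil value whatever the test outcome. */
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

    /* Write mask of 0: no stencil bit can be modified during this pass. */
    glStencilMask(0x00);
}
```

Note that glStencilMask only controls writes; the mask used for the comparison is the third argument of glStencilFunc.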

There is an interesting read on early fragment testing as it is implemented in current generation hardware here. That entire blog is well worth reading if you have the time.

Andon M. Coleman
  • Thanks for your suggestion. Unfortunately, that doesn't change anything. I'm currently using a PACKED_DEPTH_STENCIL texture. The code modification to have separate DEPTH/STENCIL textures is much more complex to achieve in my setup, though not absolutely impossible. Could the problem come from there, based on your experience? – Fred Sep 11 '13 at 14:29
  • What version of OpenGL are you targeting? If your driver supports OpenGL 4.2 or has the extension: ARB_shader_image_load_store, then there is a layout qualifier you can use in fragment shaders to _force_ early fragment tests: `layout (early_fragment_tests) in;`. As for using packed depth/stencil, this should not matter... the vast majority of software uses packed depth/stencil, early stencil testing would be useless if it didn't work for these use cases. – Andon M. Coleman Sep 11 '13 at 14:37
  • `layout(early_fragment_tests)` masks out occluded fragments EVEN when `discard` is used, i.e. the fragment depth still gets written. If fragment A has a depth of 0.1 but calls `discard` in the fragment shader, and fragment B with Z = 0.2 is later rendered at the same pixel location (and does not call `discard`), B will be culled, i.e. no color write will occur. `layout(early_fragment_tests)` enforces the Z value of A even if A calls `discard`. I was told this is the correct behavior, and it is not suitable in my case. EDIT: have a look here: http://tinyurl.com/nncfsd6 – Fred Sep 11 '13 at 14:46
  • Can you post your actual fragment shader? Using the `discard` instruction in a fragment shader is a sure-fire way to get OpenGL _not_ to do early fragment tests. – Andon M. Coleman Sep 11 '13 at 14:56
  • The fragment shader is just way too long to be posted here. Why would discard influence the stencil test since the stencil test happens _before_ the depth test or the rasterization? – Fred Sep 11 '13 at 14:58
  • Not a lot has been written about early stencil implementation by either NV or AMD, but I suspect since depth/stencil are tied together that they both share the same hierarchical tile data structure. I cannot speak for NV hardware, but I know prior to the Radeon HD 2000 series early stencil was done in the same stage as early z. Admittedly this is ancient hardware by today's standards, but the two things are very much related. – Andon M. Coleman Sep 11 '13 at 15:25
  • OK, I have tried without `discard`, and suddenly, it works. Why this works now is completely beyond me because this is an implementation detail, and I feel a bit sad. Thanks for your help, Andon. – Fred Sep 11 '13 at 15:29
  • And BTW, I wanted to try separate depth/stencil buffers to see if that could help the GPU untie the Z and stencil tests. Unfortunately this is a mess in my application. – Fred Sep 11 '13 at 15:35
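
The A/B example from the comments about `layout(early_fragment_tests)` and `discard` can be traced with a small CPU-side model of a single pixel. This is only an illustrative sketch, not how any driver is implemented: the `Pixel` and `Fragment` types and the two `submit_*` functions are hypothetical, and it assumes a `GL_LESS` depth function with depth writes enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One framebuffer pixel: stored depth and stored color (0 = never written). */
typedef struct {
    float depth;
    int   color;
} Pixel;

/* One incoming fragment; `discards` models a shader that calls discard. */
typedef struct {
    float depth;
    int   color;
    bool  discards;
} Fragment;

/* Late tests: the shader runs first, so a discarded fragment never
 * reaches the depth test and leaves the pixel untouched. */
static void submit_late(Pixel *px, Fragment f)
{
    assert(px != NULL);
    if (f.discards)
        return;
    if (f.depth < px->depth) {
        px->depth = f.depth;
        px->color = f.color;
    }
}

/* Forced early tests: the depth test and depth write happen before the
 * shader runs, so discard can only suppress the color write. */
static void submit_early(Pixel *px, Fragment f)
{
    assert(px != NULL);
    if (!(f.depth < px->depth))
        return;                 /* rejected: the shader is never run */
    px->depth = f.depth;        /* depth is written regardless of discard */
    if (!f.discards)
        px->color = f.color;
}
```

Submitting A (depth 0.1, discards) and then B (depth 0.2, no discard) to a pixel cleared to depth 1.0 leaves B's color in place with `submit_late`, while with `submit_early` A's depth is kept and B is rejected. That matches the behavior described in the comment.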