Performance gain of glColorMask()/glDepthMask() on modern hardware?

Question

In my application I have some shaders that write only to the depth buffer, which is used later for shadowing. I also have other shaders that render a fullscreen quad whose depth will not affect any subsequent draw calls, so its depth values may be thrown away.

Assuming the application runs on modern hardware (produced within the last 5 years), will I gain any additional performance if I disable color buffer writes (glColorMask() with all components set to GL_FALSE) for the shadow map shaders, and depth buffer writes (with glDepthMask()) for the fullscreen quad shaders?

In other words, do these functions really disable some memory operations, or do they just alter some mask bits that are used by fixed bitwise logic in this part of the rendering pipeline?

The same question applies to testing. If I know beforehand that all fragments will pass the depth test, will disabling the depth test improve performance?

My FPS measurements don't show any significant difference, but the result may be different on another machine.

Finally, if rendering runs faster with depth/color tests and writes disabled, how much faster does it run? Wouldn't this performance gain be negated by the overhead of the GL function calls?

score 5 · Accepted Answer · answered Nov 16 '17 at 15:19

Your question is missing a very important thing: you have to do something.

Every fragment has color and depth values. Even if your FS doesn't generate a value, there will still be a value there. Therefore, every fragment produced that is not discarded will write these values, so long as:

1. The color is routed to a color buffer via glDrawBuffers.
2. There is an appropriate color/depth buffer attached to the FBO.
3. The color/depth write mask allows it to be written.

So if you're rendering and you don't want to write one of those colors or to the depth buffer, you've got to do one of these. Changing #1 or #2 is an FBO state change, which is among the most heavyweight operations you can do in OpenGL. Therefore, your choices are to make an FBO change or to change the write mask. The latter will always be the more performance-friendly operation.

Maybe in your case, your application doesn't stress the GPU or CPU enough for such a change to matter. But in general, changing the write masks is a better idea than playing with the FBO.
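As a minimal sketch of the write-mask approach for the two passes described in the question: `RecordingGl` below is a hypothetical stand-in that only records calls, so the example builds without a GL context. In a real application its two methods would forward to `glColorMask` and `glDepthMask`. The function names `beginShadowMapPass` and `beginFullscreenQuadPass` are also assumptions made for illustration.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-in for the GL entry points: it records calls instead of
// issuing them. With real OpenGL, ColorMask maps to glColorMask(r, g, b, a)
// and DepthMask maps to glDepthMask(flag).
struct RecordingGl {
    std::vector<std::string> calls;

    void ColorMask(bool r, bool g, bool b, bool a) {
        std::string s = "ColorMask ";
        for (bool bit : {r, g, b, a}) s += bit ? '1' : '0';
        calls.push_back(s);
    }

    void DepthMask(bool flag) {
        calls.push_back(flag ? "DepthMask 1" : "DepthMask 0");
    }
};

// Shadow map pass: only depth is needed, so mask out all color channels.
template <class Gl>
void beginShadowMapPass(Gl& gl) {
    gl.ColorMask(false, false, false, false);
    gl.DepthMask(true);
}

// Fullscreen quad pass: color is needed, the quad's depth is thrown away.
template <class Gl>
void beginFullscreenQuadPass(Gl& gl) {
    gl.ColorMask(true, true, true, true);
    gl.DepthMask(false);
}
```

Both passes can keep the same FBO bound; only the masks change. Note that glClear also honors the write masks, so the depth mask has to be re-enabled before clearing the depth buffer.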

> If I know beforehand that all fragments will pass depth test, will disabling depth test improve performance?

Are you changing other state at the same time, or is that the only state you're interested in?

One good way to approach these kinds of a priori performance questions is to look at Vulkan or D3D12 and see what the same change would require in that API. Changing any pipeline state there is a big deal. But changing two pieces of state is no bigger a deal than changing one.

So if changing the depth test correlates with changing other state (blend modes, shaders, etc), it's probably not going to hurt any more.

At the same time, if you really care enough about performance for this sort of thing to matter, you should do application testing. And that should happen after you implement this, and across all hardware of interest. And your code should be flexible enough to easily switch from one to the other as needed.
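One way to keep the code easy to switch for such testing is to gather the per-pass state in a single struct and apply it in one place. This is only a sketch: `PassSettings`, `RecordingBackend`, `applyPassSettings`, and `fullscreenQuadSettings` are hypothetical names, and the backend records the GL calls it would make rather than issuing them.

```cpp
#include <string>
#include <vector>

// Hypothetical bundle of the state discussed above.
struct PassSettings {
    bool depthTest;
    bool depthWrite;
    bool colorWrite;
};

// Hypothetical stand-in backend: records the GL call each setter would make.
struct RecordingBackend {
    std::vector<std::string> calls;

    void SetDepthTest(bool on) {
        calls.push_back(on ? "glEnable(GL_DEPTH_TEST)" : "glDisable(GL_DEPTH_TEST)");
    }
    void SetDepthWrite(bool on) {
        calls.push_back(on ? "glDepthMask(GL_TRUE)" : "glDepthMask(GL_FALSE)");
    }
    void SetColorWrite(bool on) {
        calls.push_back(on ? "glColorMask(all GL_TRUE)" : "glColorMask(all GL_FALSE)");
    }
};

// Apply all of a pass's state together, so toggling one flag costs no extra
// code paths elsewhere in the renderer.
template <class Backend>
void applyPassSettings(Backend& gl, const PassSettings& s) {
    gl.SetDepthTest(s.depthTest);
    gl.SetDepthWrite(s.depthWrite);
    gl.SetColorWrite(s.colorWrite);
}

// Settings for the fullscreen quad. Pass skipDepthWork = true to measure the
// variant with depth test and depth writes disabled, false for the baseline.
// (In OpenGL, disabling the depth test also stops depth buffer updates.)
inline PassSettings fullscreenQuadSettings(bool skipDepthWork) {
    return PassSettings{!skipDepthWork, !skipDepthWork, true};
}
```

With this layout, comparing the two variants on each machine of interest comes down to flipping one boolean, for example from a command-line flag or a debug menu.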

May I ask you to elaborate on why FBO state change is "among the most heavyweight operations in OpenGL" ? — lisyarus, Nov 16 '17 at 15:24
@lisyarus: "Why" is essentially irrelevant. It is a [well understood fact of OpenGL performance development](https://www.youtube.com/watch?v=-bCeNzgiJ8I). And there's nothing you can do to change its performance characteristics. All you can do is avoid FBO changes as best your algorithm allows. — Nicol Bolas, Nov 16 '17 at 15:50
