29

I was reading this article, and the author writes:

Here's how to write high-performance applications on every platform in two easy steps:
[...]
Follow best practices. In the case of Android and OpenGL, this includes things like "batch draw calls", "don't use discard in fragment shaders", and so on.

I have never heard before that `discard` could have a bad impact on performance, and I have been using it to avoid blending when a detailed alpha hasn't been necessary.

Could someone please explain why and when using `discard` might be considered bad practice, and how `discard` + depth test compares with alpha + blend?

Edit: After receiving an answer to this question I did some testing by rendering a background gradient with a textured quad on top of it (a sketch of the two state setups is shown after the list).

  • Using `GL_DEPTH_TEST` and a fragment shader ending with the line `if (gl_FragColor.a < 0.5) { discard; }` gave about 32 fps.
  • Removing the if/discard statement from the fragment shader increased the rendering speed to about 44 fps.
  • Using `GL_BLEND` with the blend function `(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` instead of `GL_DEPTH_TEST` also resulted in around 44 fps.
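For reference, here is a minimal sketch of the two state setups being compared, written against the C GL API for brevity (the original test ran on Android/OpenGL ES, and the `use_*_program` / `draw_*` helpers are placeholders, not code from the actual test):

```c
#include <GL/gl.h>

/* Hypothetical helpers standing in for the two shader programs used in the
   test; the discard variant's fragment shader ends with
   "if (gl_FragColor.a < 0.5) { discard; }", the plain variant omits that line. */
void use_discard_program(void);
void use_plain_program(void);
void draw_gradient_and_textured_quad(void);

/* Variant 1: depth test + discard in the fragment shader (~32 fps in the test). */
void draw_with_depth_test_and_discard(void)
{
    glEnable(GL_DEPTH_TEST);
    glDisable(GL_BLEND);
    use_discard_program();
    draw_gradient_and_textured_quad();
}

/* Variant 2: blending instead of depth test + discard (~44 fps in the test). */
void draw_with_blending(void)
{
    glDisable(GL_DEPTH_TEST);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    use_plain_program();
    draw_gradient_and_textured_quad();
}
```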
Jave

5 Answers

26

It's hardware-dependent. For PowerVR hardware, and other GPUs that use tile-based rendering, using discard means that the TBR can no longer assume that every fragment drawn will become a pixel. This assumption is important because it allows the TBR to evaluate all the depths first, then only evaluate the fragment shaders for the top-most fragments. A sort of deferred rendering approach, except in hardware.

Note that you would get the same issue from turning on alpha test.
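For clarity, this is the fixed-function alpha test being referred to; a minimal sketch in desktop C/OpenGL (OpenGL ES 2.0 removed the alpha test, which is exactly why `discard` tends to be used in its place):

```c
#include <GL/gl.h>

/* Desktop-GL fixed-function alpha test: fragments with alpha below the
   reference value are thrown away after the fragment shader runs, so a TBR
   can no longer assume that every shaded fragment becomes a pixel - the same
   problem discard causes. */
void enable_alpha_test(void)
{
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.5f);   /* keep only fragments with alpha > 0.5 */
}
```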

Nicol Bolas
    Ah, I see. So if discard is used, will it affect every tile, or just the one in which a fragment was discarded? If it only affects one tile, it should still be more efficient than using alpha/blending, correct? – Jave Dec 14 '11 at 18:50
    What do you mean by "alpha/blending"? In any case, it will affect every tile that executes a fragment shader that uses the `discard` keyword, whether it actually calls it or not. – Nicol Bolas Dec 14 '11 at 19:03
  • With alpha/blending I simply mean using a texture with an alpha map determining the visibility of the current fragment. If I remember correctly, that does not work properly unless you have GL_BLEND enabled and GL_DEPTH_TEST disabled (if you render an object in front of another object, for example)? – Jave Dec 14 '11 at 19:25
  • I will accept your answer as it explains the main point of my question, but I'm still a bit in the unclear about how depthtest and discard compares (in performance) to using an alphamap and blending. – Jave Dec 15 '11 at 08:53
  • Does it affect performance when rendering with `GL_DEPTH_TEST` disabled? I want to use discard in a skybox shader (the skybox is generated). – nasso Jul 23 '16 at 01:09
22

"discard" is bad for every mainstream graphics acceleration technique - IMR, TBR, TBDR. This is because visibility of a fragment (and hence depth) is only determinable after fragment processing and not during Early-Z or PowerVR's HSR (hidden surface removal) etc. The further down the graphics pipeline something gets before removal tends to indicate its effect on performance; in this case more processing of fragments + disruption of depth processing of other polygons = bad effect

If you must use discard, make sure that only the tris that need it are rendered with a shader containing it and, to minimise its effect on overall rendering performance, render your objects in the order: opaque, discard, blended.

Incidentally, only PowerVR hardware determines visibility in the deferred step (hence it's the only GPU termed as "TBDR"). Other solutions may be tile-based (TBR), but are still using Early Z techniques dependent on submission order like an IMR does. TBRs and TBDRs do blending on-chip (faster, less power-hungry than going to main memory) so blending should be favoured for transparency. The usual procedure to render blended polygons correctly is to disable depth writes (but not tests) and render tris in back-to-front depth order (unless the blend operation is order-independent). Often approximate sorting is good enough. Geometry should be such that large areas of completely transparent fragments are avoided. More than one fragment still gets processed per pixel this way, but HW depth optimisation isn't interrupted like with discarded fragments.
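A minimal sketch of that submission order, written against the C GL API; the `draw_*` helpers are placeholders for the application's own draw calls, not anything from the answer:

```c
#include <GL/gl.h>

/* Hypothetical helpers standing in for the application's own rendering code. */
void draw_opaque_objects(void);
void draw_objects_needing_discard(void);       /* only these use a shader with discard */
void draw_blended_objects_back_to_front(void);

void render_frame(void)
{
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
    draw_opaque_objects();

    /* Shaders containing discard are confined to the geometry that needs them. */
    draw_objects_needing_discard();

    /* Blended geometry last: depth test stays on, depth writes go off,
       and the triangles are submitted in (approximate) back-to-front order. */
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glDepthMask(GL_FALSE);
    draw_blended_objects_back_to_front();

    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
}
```

As the answer notes, the back-to-front sort of the blended pass usually only needs to be approximate unless the blend operation demands exact ordering.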

gmaclachlan
  • "visibility of a fragment(and hence depth) is only determinable after fragment processing and not during Early-Z or PowerVR's HSR (hidden surface removal) etc." That's not entirely true. Either Early-Z or discard can prevent a fragment from being drawn. So it's very possible to do one and then the other. PowerVR can't for the reasons you and I stated. But traditional renderers certainly can. If they don't, it would only be because the discarding logic is bundled with the depth testing logic. That's a hardware design issue, not an algorithmic necessity. – Nicol Bolas Jan 27 '12 at 16:24
  • I missed out the case where a discard fragment is rendered 'behind' existing geometry :S - is that what you mean? Both early Z and HSR will reject fragments without fragment processing in that situation. Even then, Early-Z requires the obscuring fragments to be rendered before the obscured, discard ones or the discard shader still needs to be run for those fragments to determine depth. HSR is not dependent on submission order in this case - at the end of a frame, if discard fragments are behind opaque fragments then they don't get processed by PowerVR. – gmaclachlan Jan 30 '12 at 16:30
  • A fragment can fail to be rendered for many reasons. Depth test is one, discard is another. Early-Z simply performs the depth test first. The only reason discard would interfere with this is if the discard logic were tied into the depth test logic in the hardware. Just because something passes the depth test does not mean that it will pass everything. If the depth and discard are coupled, it is only because hardware is built that way, not because it *has* to be done that way by the algorithm. You should be able to do Early-Z tests and still later discard. – Nicol Bolas Jan 30 '12 at 16:36
  • Of course you can, but it's slower that way because depth information isn't determined until later in the pipeline. 'discard' in the shader (with the usual render state) affects the depth _write_ value of a fragment so it affects the performance of subsequent depth testing in hardware. This is what makes it different from blending. – gmaclachlan Jan 30 '12 at 16:59
  • Depth writing would happen at the same time as color writing. `discard`, like the depth test or stencil test, would affect both depth and color writing. Doing the depth *test* before the fragment shader does not *require* that the depth tested is ultimately written. Now, certain Z-*culling* techniques (Hi-Z, Hierarchial-Z, etc) do require that. So you can't use them with `discard`. But those are different from Early-Z, which is simply doing the depth test per-fragment before the fragment shader. – Nicol Bolas Jan 30 '12 at 17:03
  • There is evidence that some hardware can do depth tests without depth writes. This can be found with extensions like AMD_conservative_depth, which allows one to say how they're changing the depth value they're writing. This allows hardware to do the depth test early if it knows that you're not changing the depth in a way that will make that test invalid. That wouldn't work if the hardware wrote the depth it tested against; it still has to write the actual depth computed in the shader. So Early-Z tests are *not* linked to Early-Z writes. – Nicol Bolas Jan 30 '12 at 17:07
  • discard tri goes through pipeline, passes early-Z, Hi-Z etc., goes to fragment processing, depth is determined with colour and written out. But... if a fragment is rendered to the same x-y after this then the Early Z etc. has to wait for the discard fragment's depth to be processed and written before it can perform the depth test with up-to-date data (or potentially process obscured fragments). Hence performance hit. BTW I think I mention disabling depth write (but not tests) above. – gmaclachlan Jan 30 '12 at 17:23
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/7167/discussion-between-nicol-bolas-and-gmaclachlan) – Nicol Bolas Jan 30 '12 at 17:42
3

Also, just having an "if" statement in your fragment shader can cause a big slowdown on some hardware. (Specifically, GPUs that are heavily pipelined, or that do single instruction/multiple data, will have big performance penalties from branch statements.) So your test results might be a combination of the "if" statement and the effects that others mentioned.
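As an illustration of the point about branching (not from the answer itself): when blending is enabled, the alpha cut-off from the question's test shader can be written without any branch at all; `discard`, by contrast, always needs the conditional. The GLSL ES source is shown here as a C string literal, and the uniform/varying names are made up:

```c
/* Branch-free alpha cut-off (GLSL ES 2.0 embedded as a C string literal):
   alpha is forced to 0 below the threshold and GL_BLEND does the rest,
   instead of branching into discard. */
static const char *branch_free_fragment_shader =
    "precision mediump float;\n"
    "uniform sampler2D u_texture;\n"
    "varying vec2 v_texCoord;\n"
    "void main() {\n"
    "    vec4 c = texture2D(u_texture, v_texCoord);\n"
    "    c.a *= step(0.5, c.a); // 0.0 below the threshold, no branch\n"
    "    gl_FragColor = c;\n"
    "}\n";
```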

(For what it's worth, testing on my Galaxy Nexus showed a huge speedup when I switched to depth-sorting my semitransparent objects and rendering them back to front, instead of rendering in random order and discarding fragments in the shader.)

Luke
1

Object A is in front of Object B. Object A has a shader using 'discard'. As such, I can't do 'Early-Z' properly because I need to know which sections of Object B will be visible through Object A. This means that Object A has to pass all the way through the processing pipeline until almost the last moment (until fragment processing is performed) before I can determine if Object B is actually visible or not.

This is bad for HSR and 'Early-Z' as potentially occluded objects have to sit and wait for the depth information to be updated before they can be processed. As has been stated above, it's bad for everyone; or, to put it in a slightly friendlier way, "Friends don't let friends use discard".

ShriekBob
-1

In your test, your if statement is evaluated per pixel, which is where the performance cost comes from:

if ( gl_FragColor.a < 0.5 ){ discard; }

This would be processed once per pixel being rendered (pretty sure that's per pixel and not per texel).

If your if statement were testing a uniform or a constant, you'd most likely get a different result, since constants are only processed once at compile time and uniforms are only processed once per update.
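A rough sketch of that distinction, with GLSL ES sources as C string literals and made-up names: the first shader branches on a per-fragment value (and discards), while the second branches on a uniform that is constant over the whole draw call:

```c
/* Per-fragment branch: the condition depends on the sampled texel, so it must
   be evaluated for every fragment (and it contains discard). */
static const char *per_fragment_branch_fs =
    "precision mediump float;\n"
    "uniform sampler2D u_texture;\n"
    "varying vec2 v_texCoord;\n"
    "void main() {\n"
    "    vec4 c = texture2D(u_texture, v_texCoord);\n"
    "    if (c.a < 0.5) { discard; }\n"
    "    gl_FragColor = c;\n"
    "}\n";

/* Branch on a uniform: the condition is the same for every fragment in the
   draw call, so the compiler/GPU can often resolve it once per draw. */
static const char *uniform_branch_fs =
    "precision mediump float;\n"
    "uniform sampler2D u_texture;\n"
    "uniform bool u_grayscale;\n"
    "varying vec2 v_texCoord;\n"
    "void main() {\n"
    "    vec4 c = texture2D(u_texture, v_texCoord);\n"
    "    if (u_grayscale) {\n"
    "        c.rgb = vec3(dot(c.rgb, vec3(0.299, 0.587, 0.114)));\n"
    "    }\n"
    "    gl_FragColor = c;\n"
    "}\n";
```

Whether a driver actually hoists the uniform branch out of the per-fragment path is implementation-dependent, so treat this as a tendency rather than a guarantee.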

Cyber Axe