3

I'm making a 2D game that involves drawing huge numbers of overlapping quads to the screen. What goes in front of what doesn't really matter.

If I draw each of my quads with increasing z values starting from 0 and have glDepthFunc(GL_LESS) set, I get quite a nice speed boost, as you would expect. This avoids shading quads that are totally or partially hidden behind other quads. So I draw the quads using something like:

const float small = 1.0f / 1000000.0f;
for (int iii = 0; iii < 100000; iii++) {
    freeSpace = bullets[iii]->draw(opengl, freeSpace, iii*small);
}

However, as I don't use the z value for actual depth, it seems like I should be able to just do:

for (int iii = 0; iii < 100000; iii++) {
    freeSpace = bullets[iii]->draw(opengl, freeSpace, 0.0f);
}

Or just hard-code a z value of 0.0f into the shader. (The third argument is the z value; it ends up being written to gl_Position in the shader unchanged.)

The strange thing is that the second method (where I set the z value to 0.0f every time) ends up with less than half the framerate of the first.

Why is this? Both methods use glDepthFunc(GL_LESS) and call

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glDrawArrays(GL_TRIANGLES, 0, 100000 * (2 * 3));

just the same. I would think that, if anything, setting z to 0.0f each time would be faster. Why is it not?

genpfault
Ellipsis
  • what value do you clear the depth buffer to? – Roger Allen Feb 19 '13 at 04:07
  • No, but it is less than 1 (which is what the depth buffer is cleared to). So the idea is that each pixel only gets written to once. I can use GL_LEQUAL, but it still has terrible performance (about the same as GL_LESS). My aim is to have the depth buffer let each pixel be drawn to only once, so as to reduce the fill-rate requirement. – Ellipsis Feb 19 '13 at 04:08
  • The depth buffer clears to the default value (I never change it). So it will clear to 1.0f I believe. – Ellipsis Feb 19 '13 at 04:33

1 Answer

3

I'm not positive, but my speculation is that the small delta in z values between primitives allows the z-cull hardware to work. This culls fragments before they ever reach the fragment shader. Besides avoiding the fragment shader work, this coarse culling runs at a faster rate than the normal z-test a fragment goes through when it reaches the depth buffer.

Roger Allen
  • Ah, thank you. That sounds possible. I'm discarding in the fragment shader, so I believe early z-cull is not possible. However, normal culling probably does take place. It is strange that it does not happen when the quads all have the same z value. How could I work around this? My current method seems convoluted. Should I be using the stencil buffer instead, or have the vertex shader generate the z values somehow? Anyway, thanks for the reply. – Ellipsis Feb 19 '13 at 04:28
  • Oh yes, if you are killing off fragments in the fragment shader, the zcull hardware cannot be used and this also can impact other z-testing performance improvements. If you can avoid doing that fragment shader kill you might find a significant perf improvement. – Roger Allen Feb 19 '13 at 04:44
  • I have to draw circles to the screen, so it seems unavoidable. I'm drawing circles textured onto quads, and either I turn blending on, which causes a slowdown, or I discard all pixels with an alpha value of 0 in the fragment shader. Is there another (better) way? Am I right in thinking that some kind of z-culling still goes on even though I'm discarding in the fragment shader? – Ellipsis Feb 19 '13 at 04:58
  • I think the basic answer is -- it's complicated, so there isn't a simple answer. You're exploring parts of the pipeline that different chips will accelerate and optimize differently. It sounds like you generally have a good idea of what should be happening in the graphics pipe. I'd suggest just trying out your ideas and exploring the design space to see what works best on your card. – Roger Allen Feb 20 '13 at 04:32