I have to optimize a stencil-based layer rendering system because it produces too many draw calls (hundreds). The scenario is as follows:
- Several dozen geometry batches are drawn. Each batch has its own draw call.
- Each batch has multiple layers (about 10). These layers must be sorted globally, across all batches. Currently, this is done with glStencilFunc by assigning a different ref value to each layer.
- Since, as far as I know, a single draw call cannot use multiple glStencilFunc configurations, this yields numberOfBatches * numberOfLayers draw calls.
- There is other geometry in the scene, so I cannot disable the depth test (depth read), but depth writes are disabled while the batches are rendered.
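To make the draw-call count concrete, here is a minimal CPU-side model of the current scheme. It assumes the layers are resolved with glStencilFunc(GL_GREATER, ref, 0xFF) and glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE), so a fragment survives only if its layer's ref is higher than the value already in the stencil buffer; that comparison mode, the scene data, and the helper names are hypothetical and only for illustration. Depth testing against the other scene geometry is not modeled.

```python
# CPU-side model of the current stencil scheme (a sketch, not GL code).
# Assumption: layers are resolved with glStencilFunc(GL_GREATER, ref, 0xFF)
# and glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE). Depth testing against the
# other scene geometry is not modeled (every fragment passes the depth test).

STENCIL_MASK = 0xFF


def stencil_greater_passes(ref, stored, mask=STENCIL_MASK):
    """GL_GREATER: the fragment passes if (ref & mask) > (stored & mask)."""
    return (ref & mask) > (stored & mask)


def render_with_stencil(batches, num_pixels):
    """batches[b][layer] is the set of pixels that layer of batch b covers.

    Returns (winning layer per pixel, number of draw calls issued).
    """
    stencil = [0] * num_pixels   # cleared to 0 = "nothing drawn yet"
    winner = [None] * num_pixels
    draw_calls = 0
    for batch in batches:
        for layer, covered in enumerate(batch):
            ref = layer + 1      # ref is per-draw state, so it can only
            draw_calls += 1      # change between draw calls: one per (batch, layer)
            for p in covered:
                if stencil_greater_passes(ref, stencil[p]):
                    stencil[p] = ref      # GL_REPLACE when stencil and depth pass
                    winner[p] = layer
    return winner, draw_calls


# Hypothetical scene: 40 batches x 10 layers, every layer covering pixel 0.
batches = [[{0} for _ in range(10)] for _ in range(40)]
winner, calls = render_with_stencil(batches, num_pixels=1)
print(calls, winner)  # 400 [9]
```

The highest layer wins regardless of which batch draws it first, which is what makes the sorting global, but the per-draw ref is exactly what multiplies the call count.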
Now my question is: do you know an approach to layer sorting without glStencilFunc, so that I only have to issue one draw call per batch?
My first idea is a two-pass approach with an FBO. In the first pass, only the batches are rendered into the FBO, and the depth value is used for layer sorting, since I can write it from the vertex or fragment shader. In the second pass, the FBO's depth attachment is sampled in the fragment shader, and only fragments whose layer matches the surviving depth value are kept. This would replace one pass with hundreds of draw calls with two passes of a few dozen draw calls each. The question is whether the savings in draw calls make up for the cost of switching render targets.
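A rough CPU-side model of the two-pass idea, under stated assumptions: the layer index is available per vertex (e.g. as an attribute), so each batch needs one draw call per pass; the layer is encoded as depth so that a higher layer gets a smaller depth, and a GL_LESS test against depth cleared to 1.0 keeps the highest layer; the depth attachment is assumed to be 24-bit. The helper names (layer_to_depth, depth_to_layer, quantize) are hypothetical. Pass 2 compares decoded layer indices instead of raw depth values so that depth-buffer quantization cannot cause spurious discards.

```python
# CPU-side model of the proposed two-pass approach (a sketch, not GL code).
# Assumptions: pass 1 renders only the batches into an FBO whose depth
# attachment is cleared to 1.0 and tested with GL_LESS; each fragment's
# depth encodes its layer (e.g. via gl_Position.z or gl_FragDepth).
# Pass 2 redraws each batch and discards fragments whose layer does not
# match the depth stored in pass 1. Scene depth testing is not modeled.

DEPTH_BITS = 24  # assumed GL_DEPTH_COMPONENT24 attachment


def layer_to_depth(layer, num_layers):
    """Higher layer -> smaller depth, so GL_LESS keeps the highest layer."""
    return 1.0 - (layer + 1) / (num_layers + 1)


def depth_to_layer(depth, num_layers):
    """Inverse of layer_to_depth; rounding absorbs depth quantization.
    A cleared pixel (depth 1.0) decodes to -1 and never matches a layer."""
    return round((1.0 - depth) * (num_layers + 1)) - 1


def quantize(depth, bits=DEPTH_BITS):
    """Model a fixed-point depth buffer."""
    scale = (1 << bits) - 1
    return round(depth * scale) / scale


def render_two_pass(batches, num_layers, num_pixels):
    """batches[b][layer] is the set of pixels that layer of batch b covers."""
    draw_calls = 0
    # Pass 1: one draw call per batch, all layers at once.
    depth_tex = [1.0] * num_pixels        # glClearDepthf(1.0)
    for batch in batches:
        draw_calls += 1
        for layer, covered in enumerate(batch):
            d = quantize(layer_to_depth(layer, num_layers))
            for p in covered:
                if d < depth_tex[p]:      # GL_LESS with depth writes on
                    depth_tex[p] = d
    # Pass 2: one draw call per batch; the fragment shader samples depth_tex.
    winner = [None] * num_pixels
    for batch in batches:
        draw_calls += 1
        for layer, covered in enumerate(batch):
            for p in covered:
                if depth_to_layer(depth_tex[p], num_layers) == layer:
                    winner[p] = layer     # fragment kept; otherwise discarded
    return winner, draw_calls


# Hypothetical scene: 40 batches x 10 layers, every layer covering pixel 0.
batches = [[{0} for _ in range(10)] for _ in range(40)]
winner, calls = render_two_pass(batches, num_layers=10, num_pixels=1)
print(calls, winner)  # 80 [9]
```

In a GLSL ES 3.00 fragment shader, the pass-2 lookup would be a texelFetch of the depth texture at ivec2(gl_FragCoord.xy) followed by the same decode and a discard. Note that every overlapping fragment of the winning layer survives pass 2 (a stencil test with GL_GREATER would keep only the first), which matters if blending is enabled.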
Or do you have a better idea?
The approach should work with OpenGL ES 3.0 (maybe also 3.1).