I'm working on a WebGL batch renderer (the question is equally valid for OpenGL), i.e. one that draws all graphics in a scene in the fewest possible drawArrays/drawElements calls (ideally one). Part of this involves selecting textures based on vertex attributes.

Therefore in my fragment shader I'm contemplating two scenarios:

1. Draw texture 0 to the screen and use attributes to determine the "frame" in which the texture lies on a sprite sheet that's in memory. The fragment shader would look something like:

precision mediump float;

uniform sampler2D u_spriteSheet;

// Represents position that's framed (texture2D takes a vec2 coordinate).
varying vec2 v_texturePosition;

void main() {
    gl_FragColor = texture2D(u_spriteSheet, v_texturePosition);
}

2. Do "if" statements in the shader to determine which uniform sampler2D to use. The fragment shader would look something like:

precision mediump float;

uniform sampler2D u_image1;
uniform sampler2D u_image2;
uniform sampler2D u_image3;
uniform sampler2D u_image4;
....
uniform sampler2D u_image32;

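// GLSL ES 1.00 has no uint and varyings cannot be integers, so the ID is a float.
varying float v_imageId;
// Represents texture position that's framed (texture2D takes a vec2 coordinate).
varying vec2 v_texturePosition;

void main() {
    // Compare against half-steps so interpolation error cannot select the wrong texture.
    if (v_imageId < 1.5) {
        gl_FragColor = texture2D(u_image1, v_texturePosition);
    }
    else if (v_imageId < 2.5) {
        gl_FragColor = texture2D(u_image2, v_texturePosition);
    }
    ...
    else if (v_imageId < 32.5) {
        gl_FragColor = texture2D(u_image32, v_texturePosition);
    }
}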
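For option 1, the "frame" has to be turned into texture coordinates on the JavaScript side before it reaches the shader as an attribute. A minimal sketch of that step, assuming a frame is described in pixels; the helper name `frameToUVs` and the `{x, y, width, height}` frame shape are hypothetical and not part of the question:

```javascript
// Hypothetical helper: converts a pixel-space frame on a sprite sheet into
// normalized (0..1) texture coordinates for the four corners of a quad.
function frameToUVs(frame, sheetWidth, sheetHeight) {
  const u0 = frame.x / sheetWidth;
  const v0 = frame.y / sheetHeight;
  const u1 = (frame.x + frame.width) / sheetWidth;
  const v1 = (frame.y + frame.height) / sheetHeight;
  // Corner order: (u0,v0), (u1,v0), (u0,v1), (u1,v1), two floats per vertex.
  return new Float32Array([u0, v0, u1, v0, u0, v1, u1, v1]);
}

// Example: a 64x64 frame at pixel (64, 0) on a 256x128 sheet.
const uvs = frameToUVs({ x: 64, y: 0, width: 64, height: 64 }, 256, 128);
```

The resulting array would be uploaded as the per-vertex attribute that feeds `v_texturePosition`.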

I understand that with option 1 I'm limited by the maximum texture size, and with option 2 I'm limited by the number of texture units available. For the sake of discussion, let's assume that these limits will never be exceeded.
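Both limits can be read from the context at runtime. A small sketch, assuming `gl` is a WebGL rendering context; the wrapper name `queryTextureLimits` is hypothetical, while `getParameter`, `MAX_TEXTURE_SIZE` and `MAX_TEXTURE_IMAGE_UNITS` are standard WebGL:

```javascript
// Hypothetical wrapper around the two standard WebGL limit queries.
function queryTextureLimits(gl) {
  return {
    // Largest width/height of a 2D texture (bounds the sprite sheet in option 1).
    maxTextureSize: gl.getParameter(gl.MAX_TEXTURE_SIZE),
    // Number of texture units usable from the fragment shader (bounds option 2).
    maxFragmentTextureUnits: gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS),
  };
}
```

In a browser this would be called as `queryTextureLimits(canvas.getContext("webgl"))`.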

I'm trying to determine the more performant approach before investing a significant amount of time in either one. Any thoughts?

genpfault
N. Taylor Mullen

1 Answer

If statements in shaders are generally slow because typical GPU hardware executes shaders as SIMD: many fragments are processed in parallel, instruction by instruction. Put simply, when the group reaches an if, all threads step through the then part, but only the threads whose condition is true actually execute it and store the result; the other threads wait (or even do the calculation but discard the result). Afterwards all threads step through the else part while the threads whose condition was true wait.

So in your solution #2 the fragment shader would, on many cards, execute all 32 branches and not just one as in solution #1. (Some cards are said to skip a branch when no thread in the group takes it, so it may be fewer than 32.)
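The cost described above can be illustrated with a toy model. This is only a sketch of the reasoning, not a GPU emulator; the function name `branchesExecuted` and the idea of passing a group of image IDs are assumptions made for illustration:

```javascript
// Toy cost model: fragments in one SIMD group advance in lockstep, so the
// group pays for every branch body that the hardware cannot skip.
function branchesExecuted(imageIds, branchCount, canSkipUnusedBranches) {
  if (!canSkipUnusedBranches) {
    // Every branch body is stepped through by the whole group.
    return branchCount;
  }
  // Only branches taken by at least one fragment in the group are run.
  return new Set(imageIds).size;
}

// Four fragments in one group that sample three different textures.
const group = [1, 1, 2, 7];
const worstCase = branchesExecuted(group, 32, false); // 32
const withSkipping = branchesExecuted(group, 32, true); // 3
```

Under this model, sorting sprites so that neighbouring fragments use the same texture reduces the cost only on hardware that can skip untaken branches.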

So I would expect solution #1 to be faster as far as the fragment shader is concerned. However, your renderer might have other bottlenecks, in which case fragment shader performance becomes irrelevant to overall performance.

Additionally, many GPUs allow fewer than 32 textures, so you probably cannot use 32 if you want to stay compatible with many devices. According to webglstats.com, 99% have 16 texture units, and since most scenes have more than 16 textures there is probably no way around implementing something like your solution #1. On the other hand, when you hit the maximum texture size you might need something like #2 as well.
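If the two approaches end up combined (several sprite sheets, each bound to its own texture unit), the draw list has to be split whenever a batch would need more units than the device offers. A minimal sketch of that split, assuming each sprite carries a `texture` key; the function name `splitIntoBatches` and the data shapes are hypothetical:

```javascript
// Splits an ordered sprite list into batches so that each batch references
// at most maxUnits distinct textures. Draw order is preserved.
function splitIntoBatches(sprites, maxUnits) {
  const batches = [];
  let current = { textures: [], sprites: [] };
  for (const sprite of sprites) {
    let unit = current.textures.indexOf(sprite.texture);
    if (unit === -1) {
      if (current.textures.length === maxUnits) {
        // No free unit left: close this batch and start a new one.
        batches.push(current);
        current = { textures: [], sprites: [] };
      }
      unit = current.textures.length;
      current.textures.push(sprite.texture);
    }
    // "unit" is the texture unit index the sprite's vertices would carry.
    current.sprites.push({ sprite, unit });
  }
  if (current.sprites.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch would then correspond to one draw call, with `textures[i]` bound to texture unit `i`.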

  • Amazing insight, you're awesome! One last question, if it comes to me having to do "if" statements would that defeat the performance gains of batch rendering? – N. Taylor Mullen Nov 15 '13 at 19:58
  • ^Helmut, is there any win by putting the samplers in an array, and using an index into it? (Rather than texture2d() in the if statements.) – david van brink Nov 15 '13 at 21:58
  • @davidvanbrink I'd love to do that. I didn't think you could push an array of textures to the graphics card though. How could I do that? – N. Taylor Mullen Nov 15 '13 at 22:18
  • I cannot really tell, because I have always used very few draw calls, often just one, and also restricted myself to two textures. – Helmut Emmelmann Nov 15 '13 at 22:23
  • My desktop GPU, however, can easily handle much more complex pixel shaders with a couple of texture accesses and ifs (I use that for shadows), so there I guess reducing the number of draw calls at the expense of a more complex fragment shader would be beneficial for many scenes. On the other hand, on my old laptop from 2006, 16 ifs and texture accesses would be too much to even compile the shader because of code size, so it depends on which devices you want to support. – Helmut Emmelmann Nov 15 '13 at 22:43
  • 1
    @davidvanbrink Very unfortunately in WebGL fragment shaders the array index expression cannot be a dynamic expression (e.g. the value of a varying) [link](http://stackoverflow.com/questions/17093477/webgl-how-to-bind-an-array-of-samplers) – Helmut Emmelmann Nov 16 '13 at 01:10
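Since the sampler cannot be picked with a dynamic index, one way to avoid writing the if chain by hand is to generate the shader source for however many texture units the device reports. A sketch under assumptions: the function name `buildFragmentShaderSource` is hypothetical, image IDs are 1-based, and the ID is passed as a float because GLSL ES 1.00 varyings cannot be integers:

```javascript
// Generates a WebGL 1 fragment shader with one branch per sampler.
function buildFragmentShaderSource(samplerCount) {
  const uniforms = [];
  const branches = [];
  for (let i = 1; i <= samplerCount; i++) {
    uniforms.push(`uniform sampler2D u_image${i};`);
    const keyword = i === 1 ? "if" : "else if";
    // Compare against i + 0.5 so small interpolation errors are harmless.
    branches.push(
      `    ${keyword} (v_imageId < ${i}.5) {\n` +
      `        gl_FragColor = texture2D(u_image${i}, v_texturePosition);\n` +
      `    }`
    );
  }
  return [
    "precision mediump float;",
    ...uniforms,
    "varying float v_imageId;",
    "varying vec2 v_texturePosition;",
    "void main() {",
    "    gl_FragColor = vec4(0.0);", // fallback when no branch matches
    ...branches,
    "}",
  ].join("\n");
}
```

The returned string would be passed to `gl.shaderSource` as usual; this does not remove the branching cost discussed in the answer, it only keeps the shader in step with the available unit count.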