In a WebGL application I've got to draw a bunch of quads (the more the better, but 1000 should be a reasonable upper bound). Each quad has some attributes, like color, position, size, perhaps a few material properties: on the order of 10 floats per quad. The shaders will do fancy stuff with these, but that's irrelevant here. Each vertex could be described as (position + size×(±1,±1,0)). Now I wonder how best to render all of these quads. There are essentially three options that I see:
1. Use uniforms for all the parameters, and then call `gl.drawArrays` once for each quad, with an array buffer that simply contains relative coordinates for the corners, i.e. vectors of the form (±1,±1). That would mean a triangle strip of four vertices forming two triangles.
2. Use a single `gl.drawArrays` call for all quads together. Since attributes are per vertex, not per triangle, this means replicating all the parameters for every vertex. Furthermore, since I can't run a single triangle strip through all the quads, I'd have to duplicate vertices, so I'd essentially have 6 vertices per quad and might as well use distinct triangles instead of triangle strips. That means about 6×(10+2)=72 floats per quad, with a lot of redundancy in there.
3. Like 2, but use `gl.drawElements` to avoid duplicating the vertices shared by the two triangles of each quad. So I'd end up with 4×(10+2)=48 floats as attributes and 6 ints for indices per quad.
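For illustration, option 3's buffer setup could be sketched roughly like this (the function name `buildQuadBuffers` and the exact interleaved layout are my own assumptions, just one way to lay it out):

```javascript
// Build interleaved attribute data and an index buffer for N quads (option 3).
// Each quad contributes 4 vertices × 12 floats (10 per-quad parameters,
// replicated, plus 2 relative corner coordinates) and 6 indices.
function buildQuadBuffers(quads) {
  const FLOATS_PER_VERTEX = 12;
  const attribs = new Float32Array(quads.length * 4 * FLOATS_PER_VERTEX);
  const indices = new Uint16Array(quads.length * 6);
  const corners = [[-1, -1], [1, -1], [-1, 1], [1, 1]]; // the (±1,±1) offsets
  quads.forEach((quad, q) => {
    for (let v = 0; v < 4; v++) {
      const base = (q * 4 + v) * FLOATS_PER_VERTEX;
      attribs.set(quad.params, base);     // 10 quad parameters, same for all 4
      attribs.set(corners[v], base + 10); // 2 corner coordinates
    }
    // Two triangles per quad: (0,1,2) and (2,1,3) of its four vertices
    indices.set([0, 1, 2, 2, 1, 3].map(i => q * 4 + i), q * 6);
  });
  return { attribs, indices };
}
```

These two arrays would then go into an `ARRAY_BUFFER` and an `ELEMENT_ARRAY_BUFFER` respectively, and the whole batch drawn with one `gl.drawElements` call. Note that `Uint16Array` indices cap out at 65535, which is fine for 1000 quads (4000 vertices).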
I'm unsure which approach to take. None of them feels completely adequate. With 1, I get the impression that issuing a draw call for only four vertices at a time wastes performance, and I'm not sure whether quads drawn this way can be rendered in parallel. With 2 and 3, I'm worried about the high degree of data redundancy and the buffer sizes required to hold the arrays. Option 3 reduces the data volume somewhat, but might involve additional overhead due to the indirection.
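To put rough numbers on the buffer-size concern, some back-of-envelope arithmetic (assuming 4-byte floats and 2-byte indices):

```javascript
// Approximate GPU buffer sizes for 1000 quads at 10 parameters + 2 corner
// coordinates per vertex, with 4-byte floats and 2-byte indices.
const quadCount = 1000;
const bytesOption2 = quadCount * 6 * 12 * 4;           // 6 vertices × 12 floats
const bytesOption3 = quadCount * (4 * 12 * 4 + 6 * 2); // 4 vertices + 6 uint16 indices
console.log(bytesOption2); // 288000 bytes ≈ 281 KiB
console.log(bytesOption3); // 204000 bytes ≈ 199 KiB
```

So at this scale both batched variants stay in the hundreds of kilobytes; the redundancy is more a question of upload bandwidth when attributes change per frame than of absolute memory.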
I know that in terms of performance the ultimate answer is to benchmark. But I'd like to know whether there is some established best practice here, one that considers not only performance on my one development machine but a wide variety of hardware, drivers, and browsers, and also other aspects like how the memory requirements scale. That's the reason I'm asking this question while I'm still working towards implementations suitable for real-life comparisons.