In a WebGL application I've got to draw a bunch of quads (the more the better, but 1000 should be a reasonable upper bound). Each quad has some attributes, like color, position, size, perhaps a few material properties: on the order of 10 floats per quad. The shaders will do fancy stuff with these, but that's irrelevant here. Each vertex could be described as (position + size×(±1,±1,0)). Now I wonder how best to render all of these quads. There are essentially three options that I see:
1. Use uniforms for all the parameters, and then call `gl.drawArrays` once for each quad, with an array buffer that simply contains relative coordinates for the corners, i.e. vectors of the form (±1,±1). That would mean a triangle strip of four vertices forming two triangles.
2. Use a single `gl.drawArrays` call for all quads together. Since attributes are per vertex, not per triangle, this means replicating all the parameters for every vertex. Furthermore, since I can't run a single triangle strip through all the quads, I'd have to duplicate vertices, so I'd essentially have 6 vertices per quad and might as well use distinct triangles instead of triangle strips. That means about 6×(10+2)=72 floats per quad, with a lot of redundancy in there.
3. Like 2, but use `gl.drawElements` to avoid duplicating the vertices shared by the two triangles of each quad. So I'd end up with 4×(10+2)=48 floats as attributes and 6 ints for indices per quad.
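For illustration, option 3's buffer setup could be sketched roughly like this (the function name `buildQuadBuffers` and the exact interleaved layout are my own assumptions, just one way to lay it out):

```javascript
// Build interleaved attribute data and an index buffer for N quads (option 3).
// Each quad contributes 4 vertices × 12 floats (10 per-quad parameters,
// replicated, plus 2 relative corner coordinates) and 6 indices.
function buildQuadBuffers(quads) {
  const FLOATS_PER_VERTEX = 12;
  const attribs = new Float32Array(quads.length * 4 * FLOATS_PER_VERTEX);
  const indices = new Uint16Array(quads.length * 6);
  const corners = [[-1, -1], [1, -1], [-1, 1], [1, 1]]; // the (±1,±1) offsets
  quads.forEach((quad, q) => {
    for (let v = 0; v < 4; v++) {
      const base = (q * 4 + v) * FLOATS_PER_VERTEX;
      attribs.set(quad.params, base);     // 10 quad parameters, same for all 4
      attribs.set(corners[v], base + 10); // 2 corner coordinates
    }
    // Two triangles per quad: (0,1,2) and (2,1,3) of its four vertices
    indices.set([0, 1, 2, 2, 1, 3].map(i => q * 4 + i), q * 6);
  });
  return { attribs, indices };
}
```

These two arrays would then go into an `ARRAY_BUFFER` and an `ELEMENT_ARRAY_BUFFER` respectively, and the whole batch drawn with one `gl.drawElements` call. Note that `Uint16Array` indices cap out at 65535, which is fine for 1000 quads (4000 vertices).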
I'm unsure which approach to take. None of them feels completely adequate. With 1, I get the impression that issuing a draw call for only four vertices at a time wastes performance, and I'm not sure whether quads drawn this way can be rendered in parallel. With 2 and 3, I'm worried about the high degree of data redundancy and the buffer sizes required to hold the arrays. Option 3 reduces the data volume somewhat, but might involve additional overhead due to the indirection.
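To put rough numbers on the buffer-size concern, some back-of-envelope arithmetic (assuming 4-byte floats and 2-byte indices):

```javascript
// Approximate GPU buffer sizes for 1000 quads at 10 parameters + 2 corner
// coordinates per vertex, with 4-byte floats and 2-byte indices.
const quadCount = 1000;
const bytesOption2 = quadCount * 6 * 12 * 4;           // 6 vertices × 12 floats
const bytesOption3 = quadCount * (4 * 12 * 4 + 6 * 2); // 4 vertices + 6 uint16 indices
console.log(bytesOption2); // 288000 bytes ≈ 281 KiB
console.log(bytesOption3); // 204000 bytes ≈ 199 KiB
```

So at this scale both batched variants stay in the hundreds of kilobytes; the redundancy is more a question of upload bandwidth when attributes change per frame than of absolute memory.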
I know that in terms of performance the ultimate answer is to benchmark. But I'd like to know whether there is some established best practice here, one that considers not only performance on my one development machine but a wide variety of hardware, drivers, and browsers, and also other aspects like how the memory requirements scale. That's the reason I'm asking this question while I'm still working towards implementations suitable for real-life comparisons.