OpenGL ES render performance

Question

I have a simple question concerning the render performance under OpenGL ES.

Let's assume I am rendering a simple 2D particle system with, say, 1000 particles on a mobile device like an iPhone or Samsung Galaxy S.

All particles are rendered from the same textures. Particles get scaled and rotated during their lifecycle. We are talking about OpenGL ES here.

Which approach is more practical?

1) Set up a batch of vertices and transform each particle into it (using the CPU to do the required transformation), then make a single call to glDrawArrays to draw all particles at once.

2) Draw each single particle using (pseudo!) code like this:

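glPushMatrix();                                                        /* save the current modelview matrix */
glColor4f(_act_color.r, _act_color.g, _act_color.b, _act_color.a);     /* per-particle color */
glTranslatef(_pos.x, _pos.y, 0.0f);                                    /* move to the particle position */
glRotatef(_rot, 0, 0, 1);                                              /* rotate about the z axis; angle in degrees */
glVertexPointer(2, GL_FLOAT, sizeof(vertexVT), &verBuf[0].v[0]);       /* interleaved positions in verBuf */
glTexCoordPointer(2, GL_FLOAT, sizeof(vertexVT), &verBuf[0].t[0]);     /* interleaved texture coordinates in verBuf */
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);                                 /* one draw call per particle */
glPopMatrix();                                                         /* restore the matrix for the next particle */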

Which way is better? The first way requires more CPU power, but it should behave the same on all devices. One drawback of the first way is that I get some vertex overhead, because I have to insert "degenerate" vertices between consecutive particles.

The second way does the transformation in hardware, but will all the OpenGL commands behave the same way on different platforms?

What is your opinion of each implementation? I would like to lay out the pros and cons of each way.
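To make the vertex overhead concrete, here is a minimal sketch of stitching separate quads into one triangle strip by repeating the last vertex of one quad and the first vertex of the next. The names Vec2, stitch_quads, and strip_vertex_count are hypothetical and only serve the illustration; the quads are assumed to be already transformed on the CPU and stored in strip order.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;  /* hypothetical 2D vertex position */

/* Number of strip vertices needed for n quads: 4 per quad plus
   2 degenerate vertices between each pair of neighbours. */
size_t strip_vertex_count(size_t n)
{
    return n == 0 ? 0 : 6 * n - 2;
}

/* Stitches n quads (4 strip-ordered vertices each) into one triangle strip.
   out must have room for strip_vertex_count(n) vertices.
   Returns the number of vertices written. */
size_t stitch_quads(const Vec2 *quads, size_t n, Vec2 *out)
{
    size_t k = 0;
    assert(quads != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        const Vec2 *q = &quads[4 * i];
        if (i > 0) {
            out[k] = out[k - 1];  /* repeat the last vertex of the previous quad */
            ++k;
            out[k++] = q[0];      /* repeat the first vertex of this quad */
        }
        for (size_t j = 0; j < 4; ++j)
            out[k++] = q[j];
    }
    return k;
}
```

Under these assumptions, 1000 particles need 5998 strip vertices instead of 4000, and the whole strip can go out in one glDrawArrays(GL_TRIANGLE_STRIP, ...) call. Because two vertices are added per joint, the winding order of every quad stays the same.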

possible duplicate of [How do I draw 1000+ particles (w/ unique rotation, scale, and alpha) in iPhone OpenGL ES particle system without slowing down the game?](http://stackoverflow.com/questions/7435123/how-do-i-draw-1000-particles-w-unique-rotation-scale-and-alpha-in-iphone-o) — Brad Larson, Oct 27 '11 at 16:41

score 3 · Answer 1 · answered Oct 27 '11 at 12:49

Which way is better.

Neither. OpenGL matrix manipulation happens on the CPU as well. Every matrix-matrix multiplication (which is what glRotate, glTranslate, and glScale perform) requires 64 multiplications and 48 additions for a general 4x4 product, eating away CPU cycles just the same.

What you actually should do is instancing. See this article for a detailed explanation: http://nukecode.blogspot.com/2011/07/geometry-instancig-for-iphone-wip.html

datenwolf

Thank you for the link. I will look into it. Unfortunately, I am stuck with OpenGL ES 1.1 right now, but this might change soon. – NULL Oct 27 '11 at 16:50

score 2 · Accepted Answer · answered Oct 27 '11 at 14:23

Actually, the first way will be faster, since doing a separate draw call for every individual quad is very expensive. It also means that you don't have to send a new matrix to the GPU for every quad, which saves time. And combining a translation and a rotation matrix doesn't require a full 4x4 matrix multiplication; you can take some shortcuts there.
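One possible form of that shortcut, sketched under a few assumptions: the particle only needs a uniform scale, a rotation about the z axis, and a 2D translation, so the combined transform fits into six floats and costs four multiplications per vertex. Affine2D, affine_from_particle, and affine_apply are hypothetical names, and the caller is assumed to supply the cosine and sine of the rotation angle (for example from cosf and sinf, computed once per particle).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;  /* hypothetical 2D vertex position */

/* 2D affine transform: x' = a*x + c*y + tx,  y' = b*x + d*y + ty */
typedef struct { float a, b, c, d, tx, ty; } Affine2D;

/* Builds translate * rotate * scale directly, without multiplying
   any 4x4 matrices. cos_r and sin_r are the cosine and sine of the
   particle's rotation angle. */
Affine2D affine_from_particle(float cos_r, float sin_r, float scale,
                              float tx, float ty)
{
    Affine2D m;
    m.a  =  cos_r * scale;
    m.b  =  sin_r * scale;
    m.c  = -sin_r * scale;
    m.d  =  cos_r * scale;
    m.tx = tx;
    m.ty = ty;
    return m;
}

/* Transforms one local-space corner into world space. */
Vec2 affine_apply(const Affine2D *m, Vec2 p)
{
    Vec2 r;
    assert(m != NULL);
    r.x = m->a * p.x + m->c * p.y + m->tx;
    r.y = m->b * p.x + m->d * p.y + m->ty;
    return r;
}
```

Per particle this is one affine_from_particle call followed by four affine_apply calls, one for each corner of the quad.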

If you're going to do it this way, just create a single VBO (using GL_DYNAMIC_DRAW because the data will change every frame), into which you can copy the computed vertex data. And if you can live without the rotation, you could look into point sprites for doing the particles.
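As a rough sketch of what "copying the computed vertex data" could look like, the block below fills a CPU-side array that would then be handed to the VBO. It is an assumption-laden illustration, not a definitive implementation: VertexVTC and append_quad are hypothetical names, each particle is written as two independent triangles (for GL_TRIANGLES), and the color is stored per vertex because glColor4f cannot change in the middle of a single draw call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex layout. */
typedef struct {
    float v[2];          /* position */
    float t[2];          /* texture coordinate */
    unsigned char c[4];  /* RGBA color, repeated for every vertex */
} VertexVTC;

/* Appends one particle as two triangles (6 vertices).
   corners and uvs each hold 4 (x, y) pairs in strip order, so the
   triangles are 0-1-2 and 2-1-3. Returns the new vertex count. */
size_t append_quad(VertexVTC *batch, size_t count,
                   const float *corners, const float *uvs,
                   const unsigned char *rgba)
{
    static const int order[6] = { 0, 1, 2, 2, 1, 3 };
    assert(batch != NULL && corners != NULL && uvs != NULL && rgba != NULL);
    for (int i = 0; i < 6; ++i) {
        VertexVTC *dst = &batch[count + i];
        int k = order[i];
        dst->v[0] = corners[2 * k];
        dst->v[1] = corners[2 * k + 1];
        dst->t[0] = uvs[2 * k];
        dst->t[1] = uvs[2 * k + 1];
        for (int j = 0; j < 4; ++j)
            dst->c[j] = rgba[j];
    }
    return count + 6;
}

/* Per frame, with a VBO bound to GL_ARRAY_BUFFER (not compiled here):
 *   glBufferData(GL_ARRAY_BUFFER, count * sizeof(VertexVTC), batch, GL_DYNAMIC_DRAW);
 *   glVertexPointer(2, GL_FLOAT, sizeof(VertexVTC), (const GLvoid *)offsetof(VertexVTC, v));
 *   glTexCoordPointer(2, GL_FLOAT, sizeof(VertexVTC), (const GLvoid *)offsetof(VertexVTC, t));
 *   glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(VertexVTC), (const GLvoid *)offsetof(VertexVTC, c));
 *   glDrawArrays(GL_TRIANGLES, 0, count);
 */
```

The vertex, texture coordinate, and color arrays would also need to be enabled with glEnableClientState before drawing. Six vertices per particle costs about the same as a stitched strip, and switching to glDrawElements with an index buffer would bring it down to four unique vertices per particle.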

I will definitely have a look at some point sprite sample code. Here is a related question on Stack Overflow: http://stackoverflow.com/questions/4180401/how-to-specify-point-sprite-texture-coordinates-in-opengl-es-1-1 – NULL, Oct 27 '11 at 16:57
