OpenGL - poor performance and incorrect results while updating texture in a loop using FBOs

Question

First of all:
Windows XP SP3, 2GB RAM, Intel core 2 Duo 2.33 GHz, nVidia 9600GT 1GB RAM. OpenGL 3.3 fully updated.

Short description of what I am doing:
Ideally I need to put ONE single pixel in a GL texture (A) using glTexSubImage2D every frame.
Then, modify the texture inside a shader-FBO-quadfacingcamera setup and replace the original image with the resulting FBO.

Of course, I don't want a FBO Feedback Loop, so instead I put the modified version inside a temporary texture and do the update separately with glCopyTexSubImage2D.

The sequence is now:

1) Put one pixel in a GL texture (A) using glTexSubImage2D every frame (with width=height=1).
2) This modified version A is to be used/modified inside a shader-FBO-quad setup to be rendered into a different texture (B).
3) The resulting texture B is to be overwritten over A using glCopyTexSubImage2D.
4) Repeat...

By repeating this loop I want to achieve a slow fading effect by multiplying the color values in the shader by 0.99 every frame.

2 things are badly wrong:
1) with a fading factor of 0.99 applied every frame, the fading stops at RGB 48,48,48, leaving a trail of greyish pixels that never fully fade out.
2) the program runs at 100 FPS. Very bad. Because if I comment out the glCopyTexSubImage2D the program goes at 1000 FPS!!

I also get 1000 FPS by commenting out just glTexSubImage2D and leaving glCopyTexSubImage2D in place. This shows that glTexSubImage2D and glCopyTexSubImage2D are NOT the bottleneck by themselves (I tried replacing glCopyTexSubImage2D with a secondary FBO to do the copying, with the same results).

Observation: the bottleneck shows when both those commands are working!

Hard mode: no PBOs pls.

Link with source and exe:
http://www.mediafire.com/?ymu4v042a1aaha3
(CodeBlocks and SDL used)
FPS counts are written into stdout.txt

I am asking for a workaround for the two problems described above.
Expected results: full fade-out effect to plain black at 800-1000 FPS.

FYI, many implementations of glTexSubImage2D are SLOW. It breaks the graphics pipeline to modify data that late in the process. I would strongly suggest using a different method to achieve your fadeout. — Michael Dorgan, Jun 07 '13 at 17:49
"*shader-FBO-quad setup*" What does that mean? Also, this question is lacking information. For example, you say you modify the texture with `glTexSubImage2D`. OK, but *how* do you do that? Because there are fast ways and slow ways, depending on the image format of the image. You talk about fading factors, but you fail to specify *how* you apply this. We need more information. — Nicol Bolas, Jun 07 '13 at 18:05
Nicol, please check the download link. I didn't want to make the question huge with copy-pasted code. The fade is applied as: texture2D(... , ...).rgb*fadefactor in the pixel shader. — user2464424, Jun 07 '13 at 18:12

score 0 · Answer 1 · answered Jun 07 '13 at 19:59

To problem 1:

You are running into precision (quantization) issues here. I assume you are using an 8-bit UNORM framebuffer format, so anything you write to it is rounded to the nearest of 256 discrete levels. Think about it: 48*0.99 = 47.52, which ends up as 48 again, so it will not get any darker than that. Using a real floating-point format would be a solution, but it is likely to greatly decrease overall performance...

The fade-out operation you chose is simply not the best choice. It might be better to add a linear term to guarantee that you decrease the value by at least 1/255 per frame.
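
To make the stall concrete, here is a minimal CPU-side sketch in C++ that mimics one color channel of an 8-bit target. It is not the asker's code: the helper names (`fade_multiply`, `fade_with_min_step`, `run_until_stable`) are hypothetical, and round-to-nearest is an assumption about how the hardware converts the shader output back to 8 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: one fade pass on a single 8-bit channel, mimicking a
// shader that multiplies by `factor` and writes into an 8-bit UNORM target.
// Round-to-nearest is an assumption about the hardware conversion.
std::uint8_t fade_multiply(std::uint8_t v, double factor) {
    assert(factor >= 0.0 && factor <= 1.0);
    return static_cast<std::uint8_t>(std::lround(v * factor));
}

// Same pass, but guaranteeing a drop of at least one level per frame
// (the "linear term"), clamped at zero.
std::uint8_t fade_with_min_step(std::uint8_t v, double factor) {
    int scaled = fade_multiply(v, factor);
    int stepped = std::max(0, static_cast<int>(v) - 1);
    return static_cast<std::uint8_t>(std::min(scaled, stepped));
}

// Applies `step` repeatedly until the value stops changing or
// `max_frames` passes have run, and returns the final value.
template <typename Step>
std::uint8_t run_until_stable(std::uint8_t start, Step step, int max_frames) {
    std::uint8_t v = start;
    for (int i = 0; i < max_frames; ++i) {
        std::uint8_t next = step(v);
        if (next == v) break;  // stuck: further passes change nothing
        v = next;
    }
    return v;
}
```

With the pure multiply, a value of 48 maps back to 48, so a run starting at 255 gets stuck well above zero. With the minimum step, the same run ends at 0. In a shader, the rough equivalent would be to subtract a small constant such as 1.0/255.0 after the multiply and clamp at zero.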

To problem 2: It is hard to say what the actual bottleneck here is. As you are not using PBOs, you are limited to synchronous texture updates.

However, why do you need to do that copy operation at all? The standard approach to this kind of thing would be a texture/FBO/color buffer "ping-pong", where you just swap the "role" of the textures after each iteration. So you get the sequence:

update A
render into B (reading from A)
update B
render into A (reading from B)
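
The role swap can be sketched on the CPU, with two plain buffers standing in for textures A and B. This is only an illustration of the bookkeeping: the `PingPong` struct and its members are hypothetical, and in a real program `put_pixel` would correspond to glTexSubImage2D on the current source texture and `fade_pass` to drawing the quad into the FBO attached to the other texture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for two textures whose roles swap each frame.
struct PingPong {
    std::vector<float> buffers[2];
    int src = 0;  // index currently read from; the other one is written to

    explicit PingPong(std::size_t pixel_count) {
        buffers[0].assign(pixel_count, 0.0f);
        buffers[1].assign(pixel_count, 0.0f);
    }

    std::vector<float>& source() { return buffers[src]; }
    std::vector<float>& target() { return buffers[1 - src]; }

    // "update": write one pixel into the current source.
    void put_pixel(std::size_t index, float value) {
        assert(index < source().size());
        source()[index] = value;
    }

    // "render": read the source, write the faded result into the target,
    // then swap roles so that no copy back is needed.
    void fade_pass(float factor) {
        const std::vector<float>& in = source();
        std::vector<float>& out = target();
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = in[i] * factor;
        }
        src = 1 - src;
    }
};
```

Each frame calls `put_pixel` and then `fade_pass`. The freshly written buffer becomes the source for the next frame, which is what removes the need for the copy in step 3 of the question.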

For problem 1 the strange thing is that if you put GL_FLOAT everywhere and use 0.0-1.0 floats as pixel colors, the problem still seems to persist... But I can easily deal with that anyway. Problem 2. Why do I need the copy operation? Why not just do ping-pong? Because, well, I AM doing it. A goes to B, then I take B and copy it as-is to A. Guess what? Even if you use "proper" ping-pong with 2 FBOs you still get the speed drop! I used glCopyTexSubImage2D because it is somewhat faster than a double FBO switch. — user2464424, Jun 07 '13 at 20:40
@user2464424: You should be aware that your exponential decrease function will reach zero only after an infinite number of iterations (assuming unlimited precision). — derhass, Jun 08 '13 at 00:08
I am and I'm perfectly fine with it. Problem 1 is not the big deal here; I must focus on problem 2 for now. — user2464424, Jun 08 '13 at 08:54

score 0 · Accepted Answer · answered Jun 11 '13 at 21:10

Problem 2: splatting arbitrary pixels into a texture as fast as possible.
Since probably the absolute fastest way to dynamically upload data from main memory to the GPU is through Vertex Arrays or VBOs, the solution to problem 2 becomes trivial:
1) create Vertex Array and Color Array
(or interleave coordinates and colors, performance/bandwidth may vary);
2) Z component =0. We want points to lie on the floor;
3) camera pointing downwards with orthographic projection
(being sure to match exactly the screen size with coordinate ranges);
4) render to texture with FBO using GL_POINTS w/ glPointSize=1 and GL_POINT_SMOOTH disabled.
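
The data side of these steps can be sketched without any GL calls. The block below is an assumption-laden illustration, not the code from the linked project: `PointVertex`, `add_pixel`, `ortho_to_ndc` and `ndc_to_pixel` are hypothetical names, and the projection is assumed to span [0, width] by [0, height], as glOrtho(0, w, 0, h, -1, 1) would set up. It shows why aiming each point at the pixel center (x + 0.5, y + 0.5) makes a size-1 point land on exactly one texel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved layout: x, y, z position followed by r, g, b color.
struct PointVertex {
    float x, y, z;
    float r, g, b;
};

// Appends one point aimed at the center of pixel (px, py), with z = 0.
void add_pixel(std::vector<PointVertex>& points, int px, int py,
               float r, float g, float b) {
    points.push_back(PointVertex{px + 0.5f, py + 0.5f, 0.0f, r, g, b});
}

// Maps a coordinate through an orthographic projection spanning [0, size]
// to normalized device coordinates in [-1, 1].
float ortho_to_ndc(float coord, int size) {
    assert(size > 0);
    return 2.0f * coord / static_cast<float>(size) - 1.0f;
}

// Maps NDC back to the index of the pixel that contains it, assuming a
// viewport of the same size as the projection range.
int ndc_to_pixel(float ndc, int size) {
    return static_cast<int>((ndc + 1.0f) * 0.5f * static_cast<float>(size));
}
```

In the fixed-function pipeline, such an array would typically be handed over with glVertexPointer and glColorPointer using a stride of `sizeof(PointVertex)`, and drawn with glDrawArrays(GL_POINTS, 0, count) while the FBO is bound.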

Pretty standard. Now the program runs at 750 fps. Close enough. My dreams were all like "Hey mom look! I'm running glTexSubImage2D at 1000 fps!" and then meh.
Though glCopyTexSubImage2D is very fast. Would recommend.

Not sure if this is the best way to GPU-accelerate fadings but given the results one must note a strong concentration of Force with this one. Anyway the problem with the fading stopping half-way is fixed by setting a minimum constant decrement variable, so even if the exponential curve fails the fading will finish no matter what.
