I am writing a 2D game in openGL and I ran into some performance problems while rendering a couple of textures covering the whole window.
What I do is actually create a texture with the size of the screen, render my scene onto that texture using FBO and then render the texture a couple of times with a different offsets to get a kind of "shadow" going. But when I do that I get a massive performance drop while using my integrated video card.
So all in all I render 7 quads onto the whole screen(background image, 5 "shadow images" with a black "tint" and the same texture with its true colors). I am using RGBA textures which are 1024x1024 in size and fit them in a 900x700 window. I am getting 200 FPS when I am not rendering the textures and 34 FPS when I do (in both scenarios I actually create the texture and render the scene onto it). I find this quite odd because I am only rendering 7 quads essentially. A strange thing is also that when I run a CPU profiler it doesn't suggest that this is the bottleneck(I know that opengl uses a pipeline architecture and this thing can happen but most of the times it doesn't).
When I use my external video card I get consistent 200 FPS when I do the tests above. But when I disable the scene rendering onto the texture and disable the texture rendering onto the screen I get ~1000 FPS. This happens only to my external video card - when I disable the FBO using the integrated one I get the same 200 FPS. This really confuses me.
Can anyone explain what's going on and if the above numbers sound right?
Integrated video card - Intel HD Graphics 4000
External video card - NVIDIA GeForce GTX 660M
P.S. I am writing my game in C# - so I use OpenTK if that is of any help.
Edit:
First of all thanks for all of the responses - they were all very helpful in a way, but unfortunately I think there is just a little bit more to it than just "simplify/optimize your code". Let me share some of my rendering code:
//fields defined when the program is initialized
Rectangle viewport;
//Texture with the size of the viewport
Texture fboTexture;
FBO fbo;
//called every frame
public void Render()
{
//bind the texture to the fbo
GL.BindFramebuffer(FramebufferTarget.Framebuffer, fbo.handle);
GL.FramebufferTexture2D(FramebufferTarget.Framebuffer, fboTexture,
TextureTarget.Texture2D, texture.TextureID, level: 0);
//Begin rendering in Ortho 2D space
GL.MatrixMode(MatrixMode.Projection);
GL.PushMatrix();
GL.LoadIdentity();
GL.Ortho(viewport.Left, viewport.Right, viewport.Top, viewport.Bottom, -1.0, 1.0);
GL.MatrixMode(MatrixMode.Modelview);
GL.PushMatrix();
GL.LoadIdentity();
GL.PushAttrib(AttribMask.ViewportBit);
GL.Viewport(viewport);
//Render the scene - this is really simple I render some quads using shaders
RenderScene();
//Back to Perspective
GL.PopAttrib(); // pop viewport
GL.MatrixMode(MatrixMode.Projection);
GL.PopMatrix();
GL.MatrixMode(MatrixMode.Modelview);
GL.PopMatrix();
//Detach the texture
GL.FramebufferTexture2D(FramebufferTarget.Framebuffer, fboTexture, 0,
0, level: 0);
//Unbind the fbo
GL.BindFramebuffer(FramebufferTarget.Framebuffer, 0);
GL.PushMatrix();
GL.Color4(Color.Black.WithAlpha(128)); //Sets the color to (0,0,0,128) in a RGBA format
for (int i = 0; i < 5; i++)
{
GL.Translate(-1, -1, 0);
//Simple Draw method which binds the texture and draws a quad at (0;0) with
//its size
fboTexture.Draw();
}
GL.PopMatrix();
GL.Color4(Color.White);
fboTexture.Draw();
}
So I don't think there is actually anything wrong with the fbo and rendering onto the texture, because this is not causing the program to slow down on both of my cards. Previously I was initializing the fbo every frame and that might have been the reason for my Nvidia card to slow down, but now when I am pre-initializing everything I get the same FPS both with and without fbo.
I think the problem is not with the textures in general because if I disable textures and just render the untextured quads I get the same result. And still I think that my integrated card should run faster than 40 FPS when rendering only 7 quads on the screen, even if they cover the whole of it.
Can you give me some tips on how can I actually profile this and post back the result? That would be really useful.
Edit 2:
Ok I experimented a bit and managed to get much better performance. First I tried rendering the final quads with a shader - this didn't have any impact on performance as I expected.
Then I tried to run a profiler. But I far as I know SlimTune is just a CPU profiler and it didn't give me the results I wanted. Then I tried gDEBugger. It has an integration with visual studio which I later found out that it does not support .NET projects. I tried running the external version but it didn't seem to work (but maybe I just haven't played with it enough).
The thing that really did the trick was that rather than rendering the 7 quads directly to the screen I first render them on a texture, again using fbo, and then render the final texture once onto the screen. This got my fps from 40 to 120. Again this seem kind of curios to say the least. Why is rendering to a texture way faster than directly rendering to the screen? Nevertheless thanks for the help everyone - it seems that I have fixed my problem. I would really appreciate if someone come up with reasonable explanation of the situation.