I am working on an OpenGL 3D renderer in Java, using the JOGL API. At the moment I am trying to tune the performance of my deferred renderer. For this I am using the VisualVM profiler to see what's eating up my precious CPU cycles, and gDEBugger to track my OpenGL function calls. The frame rate drops below 30 FPS with only 20-30 lights present (and a very small number of meshes that actually need to be lit), and I haven't even added things like shadow mapping and normal mapping yet.
This is what it looks like (41 lights but only 20 FPS) at the moment:
In VisualVM, I have noticed that most of the CPU time is spent in the native jogamp.opengl.windows.wgl.WGLUtil.SwapBuffers method. However, I am unsure whether this is relevant, or in any way related to the running times of my shaders. The run times of the glDrawArrays calls themselves are negligible, which is consistent with the idea that they just issue commands to the GPU and don't wait for them to finish before returning. Why is most of my application's time spent in that SwapBuffers call?
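To make the question more concrete, here is a stripped-down sketch of the kind of JOGL setup I mean - this is not my actual renderer, just an illustration, and the import paths assume a newer JOGL 2.x build (older ones use javax.media.opengl instead). The swap that VisualVM attributes to WGLUtil.SwapBuffers is the one JOGL issues automatically after display() returns, and setSwapInterval(0) marks where vsync could be turned off to test whether the time in SwapBuffers is just the driver waiting for the vertical blank:

```java
import java.awt.Frame;

import com.jogamp.opengl.GL;
import com.jogamp.opengl.GLAutoDrawable;
import com.jogamp.opengl.GLCapabilities;
import com.jogamp.opengl.GLEventListener;
import com.jogamp.opengl.GLProfile;
import com.jogamp.opengl.awt.GLCanvas;
import com.jogamp.opengl.util.Animator;

// Stripped-down sketch, not my actual renderer. The buffer swap that shows up
// as WGLUtil.SwapBuffers in VisualVM is issued by JOGL itself after display()
// returns, because auto-swap is enabled on the GLAutoDrawable by default.
public class SwapSketch implements GLEventListener {

    @Override
    public void init(GLAutoDrawable drawable) {
        GL gl = drawable.getGL();
        // Turning vsync off here would show whether the time spent in
        // SwapBuffers is just the driver waiting for the vertical blank.
        gl.setSwapInterval(0);
    }

    @Override
    public void display(GLAutoDrawable drawable) {
        // ... geometry pass + light accumulation passes go here ...
        // When this method returns, JOGL calls swapBuffers() on the drawable,
        // which ends up in the native wglSwapBuffers call seen in the profiler.
    }

    @Override
    public void reshape(GLAutoDrawable drawable, int x, int y, int w, int h) { }

    @Override
    public void dispose(GLAutoDrawable drawable) { }

    public static void main(String[] args) {
        GLCanvas canvas = new GLCanvas(new GLCapabilities(GLProfile.get(GLProfile.GL3)));
        canvas.addGLEventListener(new SwapSketch());

        Frame frame = new Frame("swap sketch");
        frame.add(canvas);
        frame.setSize(800, 600);
        frame.setVisible(true);

        new Animator(canvas).start();
    }
}
```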
What's also rather strange is this: with 3-4 lights, I run at 60 FPS. When I increase that number to 20, the frame rate drops to 20 FPS. Yet with 1000-1100 lights, I still get about 10 FPS. Why is that? Since the light computations are additive, there should be far more fragment shader invocations per pixel with 1000 lights than with 20, so in theory the renderer should run much slower than it does with 20 lights. Why the huge performance hit when jumping from 3-4 lights to 20, yet hardly any further drop when going from 20 to 1000?
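For reference, by "additive" I mean a light pass along these lines - a simplified sketch rather than my actual code, where uploadLightUniforms is just a placeholder for binding each light's parameters:

```java
import com.jogamp.opengl.GL;
import com.jogamp.opengl.GL3;

// Simplified sketch of the additive light accumulation pass (not my actual code).
class LightPassSketch {

    // Placeholder: binds this light's position/color/attenuation uniforms
    // to the lighting shader.
    void uploadLightUniforms(GL3 gl, int lightIndex) { /* ... */ }

    // Each light is one full-screen draw over the G-buffer with additive
    // blending, which is why I expected the cost to scale roughly linearly
    // with the number of lights.
    void lightPass(GL3 gl, int lightCount) {
        gl.glEnable(GL.GL_BLEND);
        gl.glBlendFunc(GL.GL_ONE, GL.GL_ONE); // accumulate light contributions
        gl.glDepthMask(false);                // no depth writes during lighting

        for (int i = 0; i < lightCount; i++) {
            uploadLightUniforms(gl, i);
            // Full-screen quad: every covered fragment runs the lighting
            // shader once per light.
            gl.glDrawArrays(GL.GL_TRIANGLE_STRIP, 0, 4);
        }

        gl.glDepthMask(true);
        gl.glDisable(GL.GL_BLEND);
    }
}
```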
And now for part II of my question: as I mentioned above, I have also started using gDEBugger to profile my OpenGL calls, but unfortunately I can't find a way to determine exactly which calls take the longest to execute - I only get information about which functions are called most often. That helps when hunting down things like redundant state changes, but at the stage I'm at, I'm sure there are bigger issues slowing my renderer down.
Does anyone know of a reliable way to track the execution times of my fragment shaders, and to see exactly which computations, in which fragments, take the longest? I know that NVIDIA Nsight provides some wonderful graphics debugging capabilities, but most of them are only available for DirectX applications.
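To give an idea of the granularity I'm hoping for: wrapping a whole pass in an OpenGL timer query (ARB_timer_query, core since GL 3.3) would give a per-pass GPU time, roughly as in the untested sketch below (class and method names are placeholders, and the query target/result tokens are written out as the standard hex values) - but that still wouldn't tell me which computations inside the fragment shader are the expensive ones.

```java
import com.jogamp.opengl.GL3;

// Rough, untested sketch of per-pass GPU timing with an OpenGL timer query.
// It measures a whole pass on the GPU, but says nothing about which
// computations inside the fragment shader are expensive.
class GpuTimerSketch {
    // Standard tokens from ARB_timer_query / GL 1.5 occlusion queries.
    private static final int GL_TIME_ELAPSED = 0x88BF;
    private static final int GL_QUERY_RESULT = 0x8866;

    private final int[] queryId = new int[1];

    void init(GL3 gl) {
        gl.glGenQueries(1, queryId, 0);
    }

    // Wrap one pass (e.g. the light accumulation pass) in the query.
    void timedPass(GL3 gl, Runnable pass) {
        gl.glBeginQuery(GL_TIME_ELAPSED, queryId[0]);
        pass.run();                       // issue the draw calls for this pass
        gl.glEndQuery(GL_TIME_ELAPSED);

        // Reading the result immediately stalls the pipeline; in a real frame
        // loop the result would be fetched a frame or two later.
        long[] nanos = new long[1];
        gl.glGetQueryObjectui64v(queryId[0], GL_QUERY_RESULT, nanos, 0);
        System.out.println("pass GPU time: " + nanos[0] / 1.0e6 + " ms");
    }
}
```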