I made an in-game graphical profiler (CPU and GPU) and there is one strange behavior with the Nvidia driver that I'm not sure how to handle.
Here is a screenshot of what a normal case looks like:
What you can see here is 3 consecutive frames, GPU at the top, CPU at the bottom. Both graphs are synchronized.
The "END FRAME" bar only contains the call to SwapBuffers
. It can seem weird that it's blocking until the GPU has done all its work, but that's what the driver chooses to do sometimes when vsync is ON and that all the work (CPU and GPU) can fit in 16ms (AMD does the same). My guess is that it does it to minimize inputs lag.
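For reference, the "END FRAME" bar is just a CPU timer around the swap call, roughly like this (a minimal sketch assuming a WGL context; the actual profiler markers are omitted):

```cpp
#include <windows.h>
#include <chrono>

// What the "END FRAME" bar measures: wall-clock time spent in SwapBuffers.
double EndFrameMilliseconds(HDC hdc)
{
    auto t0 = std::chrono::steady_clock::now();
    SwapBuffers(hdc);  // with vsync on, the driver may block here until the GPU has finished
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```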
Now my problem is that it does not always do that. Depending on what happens in the frame, the graph sometimes looks like this:
What actually happens here is that the first OpenGL call blocks, instead of the call to SwapBuffers. In this particular case, the blocking call is glBufferData. It's much more visible if I add some dummy code that does just that (create a uniform buffer, load it with random values and destroy it):
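The dummy code is essentially this (a minimal sketch; the buffer size and the random fill are arbitrary, and the loader header is whatever the engine already uses):

```cpp
#include <GL/glew.h>
#include <cstdlib>
#include <vector>

// Dummy workload: create a uniform buffer, fill it with random values,
// destroy it. On its own this should be nearly free, yet glBufferData is
// where the driver chooses to block.
void DummyUniformBufferUpload()
{
    const size_t count = 1024;  // arbitrary size
    std::vector<float> data(count);
    for (float& v : data)
        v = static_cast<float>(std::rand()) / static_cast<float>(RAND_MAX);

    GLuint ubo = 0;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, data.size() * sizeof(float),
                 data.data(), GL_DYNAMIC_DRAW);  // this is the call that ends up blocking
    glBindBuffer(GL_UNIFORM_BUFFER, 0);
    glDeleteBuffers(1, &ubo);
}
```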
This is a problem because it means a bar in the graph may get very big for no apparent reason. People seeing that will probably draw the incorrect conclusion that some code is slow.
So my question is: how can I handle this case? I need a way to display meaningful CPU timings at all times.
Adding dummy code that loads a uniform buffer is not very elegant and may not work with future versions of the driver (what if the driver only blocks on draw calls instead?).
Synchronizing with glClientWaitSync does not look like a good solution either: if the frame rate drops, the driver will stop blocking so that the CPU and GPU frames can run in parallel, and I would need to detect that in order to stop calling glClientWaitSync (but I'm not sure how to do that).
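To be concrete, the glClientWaitSync approach would look roughly like this (a sketch of the idea, not code I actually have; the function names and the dedicated "SYNC" bar are hypothetical):

```cpp
#include <GL/glew.h>

static GLsync g_frameFence = nullptr;

// Called right after submitting a frame's GL commands.
void InsertFrameFence()
{
    if (g_frameFence)
        glDeleteSync(g_frameFence);
    g_frameFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}

// Called at a fixed point in the next frame, under a dedicated "SYNC"
// profiler bar, so the stall has an obvious home in the graph. The catch:
// when the frame rate drops, this forces a CPU/GPU serialization that the
// driver would otherwise have avoided.
void WaitForPreviousFrame()
{
    if (!g_frameFence)
        return;
    glClientWaitSync(g_frameFence, GL_SYNC_FLUSH_COMMANDS_BIT,
                     1000000000ull);  // timeout in nanoseconds (1 second)
}
```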
(Suggestions for a better title are welcome.)
Edit: here is what happens without vsync, when the GPU is the bottleneck:
The GPU frame takes longer than the CPU frame, so the driver decided to block the CPU during glBufferData until the GPU has caught up.
The conditions are not the same, but the problem is the same: the CPU timings are "wrong" because the driver makes some of the OpenGL functions block. That may actually be a simpler example to understand than the one with vsync on.