
In my rendering pipeline (OpenGL 3.3 core), I have the following cycle (pseudo-code):

for 1..n:
  render to texture T
  bind texture T
  render to back buffer (with texture T sampled in the fragment shader)
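
Expanded into plain GL calls, one frame looks roughly like this - a minimal sketch, where renderObjectsToShadowMap() and drawFullScreenQuad() are my own helpers (described in the comments below) and shadowMapFBO/shadowMapTexture are placeholder names for the FBO holding T and for T itself:

for (int i = 0; i < n; ++i) {
    // Pass 1: write the scene depth for light i into texture T.
    glBindFramebuffer(GL_FRAMEBUFFER, shadowMapFBO);   // T is the depth attachment
    glClear(GL_DEPTH_BUFFER_BIT);
    renderObjectsToShadowMap(lights[i]);

    // Pass 2: sample T while shading into the back buffer.
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, shadowMapTexture);    // T, read-only here
    drawFullScreenQuad(lights[i]);                     // accumulates this light's contribution

    // glFlush();   // the workaround discussed below: correct output, slightly lower FPS
}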

When n=1, everything is OK. However, when n=2, the first render is not correct.

I suspect that the render to texture T starts writing to T before the previous render to the back buffer has finished sampling T.

After putting glFlush() at the end of the cycle, the rendering is correct; however, the FPS drops a little. Everywhere on the Internet I keep finding "If you need to use glFlush(), you are probably doing something wrong".

Did I identify the problem correctly? Is glFlush() the correct solution in this case? Would using a different texture for each iteration (I know n) be a better solution?

It happens on a GTX 580 but not on an ATI Mobility Radeon 3470.

My context - details:

I have 2 lights and a g-buffer (FBO). In each iteration I do deferred shading with one light and its shadow map (another FBO with texture T, shared by all lights), where n is the number of lights. In the render to the back buffer I accumulate the light.
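
The shadow-map FBO is set up roughly like this - a simplified sketch, where the 1024x1024 size and the handle names are placeholders and the depth-compare sampler parameters are omitted:

GLuint shadowMapTexture, shadowMapFBO;

glGenTextures(1, &shadowMapTexture);
glBindTexture(GL_TEXTURE_2D, shadowMapTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, 1024, 1024, 0,
             GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenFramebuffers(1, &shadowMapFBO);
glBindFramebuffer(GL_FRAMEBUFFER, shadowMapFBO);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                       GL_TEXTURE_2D, shadowMapTexture, 0);
glDrawBuffer(GL_NONE);   // depth-only pass: no color attachment is written
glReadBuffer(GL_NONE);
glBindFramebuffer(GL_FRAMEBUFFER, 0);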

In the following image, I did not accumulate the light. Instead I rendered the first iteration to the left viewport and the second to the right one to demonstrate the problem.

[Image: separated shading passes]

Lukas Z.
  • I'm not sure I follow. What is your motivation for using the same shadow map texture for all lights? You should consider using enough separate shadow map textures for every light that is allowed to cast a shadow in your scene. You can amortize updates this way (e.g. update shadow maps for distant lights less frequently, or update half of the maps per frame), and you can do things like timing how long shadow maps take to build in an entire frame using a timer query, if you dedicate a separate stage to building shadow maps before lighting instead of during it. – Andon M. Coleman Aug 13 '15 at 22:46
  • Well, I just did not realize this asynchronicity can even happen, so using another texture seemed like a waste of memory to me. In my renderer I create shadow maps which I then assign to lights - so multiple lights can share a shadow map, but they don't have to (so I can have shadow maps with different resolutions, frequencies of updates, etc.). In my current setting I have only one shadow map, which is shared by two lights. I could create another shadow map, but it would have exactly the same properties. – Lukas Z. Aug 13 '15 at 23:15
  • I have slightly edited the original question to make it clear. Maybe the "recycling" still misses the point (it is not that much memory). Also, if I use another texture for each light but bind it to the same texture unit for the shading pass, can't I run into a similar issue? I.e., binding a new texture to a texture unit which is still in use? – Lukas Z. Aug 13 '15 at 23:34
  • Okay, I still don't understand how you are reading and writing to the shadow map in the same pass, though? Those other things aside, it doesn't make sense to me. You write the shadow map on the 2nd line of this block of code, and then presumably drawFullScreenQuad() is where you do the lighting and _read_ the shadow map. At what point are you ever reading and writing in the same operation? If you've got glitches, this is not the source unless your driver is screwed up. Traditional texture sampling is guaranteed coherent. – Andon M. Coleman Aug 14 '15 at 00:05
  • Thank you for your effort! I came to this conclusion because there are no glitches when I disable the shadow mapping or use only one light (so there is only one iteration of the for-cycle). If I call glFlush() after the drawFullScreenQuad() (where the shadow map is read), the glitches disappear. I believe that before the deferred shading in drawFullScreenQuad() is finished on the GPU side, the CPU gets to renderObjectsToShadowMap() and starts overwriting the shadow map. I may be wrong, though. The glitches always appear in the same ~1/15 of the screen and are different in every frame (even in a static scene). – Lukas Z. Aug 14 '15 at 08:53
  • It happens on 1 of my 2 computers. – Lukas Z. Aug 14 '15 at 09:03
  • Can you show an overview of your FBO setup in the question? From the current overview, I cannot see any read + write scenario. But if you're using the same FBO for all of these operations with the shadow map attached for depth, that could be an issue. – Andon M. Coleman Aug 14 '15 at 10:07
  • OK, I added the details about the FBOs to the original question. There are two passes in the cycle - shadow map creation (writing to texture _x_) and shading the g-buffer (reading the g-buffer textures as well as _x_). If everything was synchronized, there would be no read/write collision; however, it seems to me that when it comes to the second call of `renderObjectsToShadowMap()`, the `drawFullScreenQuad()` has not finished yet. I don't know if that is possible, but I cannot find any other explanation. The glitches are random. – Lukas Z. Aug 14 '15 at 10:59
  • You are right, the artifacts displayed here are indicative of a feedback loop on NV hardware. This problem pops up a lot and, for some reason, always looks like that on NV hardware (AMD, when it encounters this problem, does not produce a regular pattern)... what happened to the code you linked to yesterday? I was going to take a closer look at it when I had some free time. – Andon M. Coleman Aug 15 '15 at 19:44
  • The code is here: http://pastebin.com/edit.php?i=fPQ1M7z7 However, my implementation is a bit complex and the code is spread over many classes. There is a lot of abstraction. This is from the "abstract renderer" part - there is not a single OpenGL call in it (except for the glFlush(), commented out), so I do not expect it to be more useful than the pseudo-code in the question. In particular, I do not expect anybody to want to read all my code. (...) In the docs I found, there is an implicit synchronization mentioned when reading from a texture that was previously attached to a framebuffer, though nothing about the other direction. – Lukas Z. Aug 15 '15 at 19:58
  • I am close to answering the question myself (if nobody else comes up with the explanation). Where I am stuck now is why glFlush() is enough (with no need for glFinish(), which would make more sense to me). – Lukas Z. Aug 15 '15 at 20:38
  • Flushing the pipeline immediately after each iteration would prevent the driver from re-ordering commands that it thinks have no complex interaction. GL requires memory coherence in the traditional pipeline (that is partly what implicit synchronization does), and drivers spend a lot of their time looking for clever opportunities to execute commands in parallel rather than serially. If the driver gets that wrong, these are the kinds of results you start to see. An appropriate memory barrier would probably have the same effect as `glFlush (...)` if this is the problem. – Andon M. Coleman Aug 15 '15 at 21:05
  • Ultimately, I think re-using the same depth texture over and over is the most likely reason the driver's confused here. I've suspected this from the very start. But if you say that changing the render pattern to work the way I discussed in my first comment doesn't change anything, then I think we can rule that out. – Andon M. Coleman Aug 15 '15 at 21:10
  • Thank you! Now I have everything answered. If you post an answer with all the info here in the comments, I will definitely accept it. Honestly, I did not try using different textures yet, but I do believe it would solve the problem and would be more efficient than this texture recycling with glFlush() (a sketch of that variant follows below). I just really wanted to understand what is happening here. I did not know OpenGL is allowed to reorder commands. Thank you! – Lukas Z. Aug 15 '15 at 21:51
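
For reference, the variant suggested in the comments - one shadow-map depth texture per light, re-attached to the same FBO - would look roughly like this. This is only a sketch under the same assumptions as the snippets above (the helpers and handle names are placeholders, and the per-light textures are created the same way as shadowMapTexture):

std::vector<GLuint> shadowMaps(n);   // one depth texture per light, created like shadowMapTexture above

for (int i = 0; i < n; ++i) {
    // Write the shadow map for light i into its own texture.
    glBindFramebuffer(GL_FRAMEBUFFER, shadowMapFBO);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_2D, shadowMaps[i], 0);   // swap the attachment per light
    glClear(GL_DEPTH_BUFFER_BIT);
    renderObjectsToShadowMap(lights[i]);

    // Shade into the back buffer, sampling a texture that no pending pass writes.
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, shadowMaps[i]);
    drawFullScreenQuad(lights[i]);   // no glFlush() needed in this variant
}

If the barrier route mentioned above were taken instead, on OpenGL 3.3 that would presumably mean the NV_texture_barrier extension (glTextureBarrierNV()); the equivalent core call, glTextureBarrier(), only arrived with OpenGL 4.5.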

0 Answers