I'm afraid I have to say that the Vulkan Tutorial is wrong. In its current state, it cannot guarantee that there are no memory hazards when only a single depth buffer is used. However, only a very small change would be needed to make a single depth buffer sufficient.
Let's analyze the relevant steps of the code that are performed within drawFrame.
We have two different queues, presentQueue and graphicsQueue, and MAX_FRAMES_IN_FLIGHT concurrent frames. I refer to the "in flight index" as cf. It stands for currentFrame, which is advanced every frame with currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT. I use sem1 and sem2 for the two arrays of semaphores and fence for the array of fences.
The relevant steps in pseudocode are the following:
    vkWaitForFences(..., fence[cf], ...);
    vkAcquireNextImageKHR(..., /* signal when done: */ sem1[cf], ...);
    vkResetFences(..., fence[cf]);
    vkQueueSubmit(graphicsQueue, ...
        /* wait for: */ sem1[cf], /* wait stage: */ COLOR_ATTACHMENT_OUTPUT, ...
            vkCmdBeginRenderPass(cb[cf], ...);
                Subpass Dependency between EXTERNAL -> 0:
                    srcStages = COLOR_ATTACHMENT_OUTPUT,
                    srcAccess = 0,
                    dstStages = COLOR_ATTACHMENT_OUTPUT,
                    dstAccess = COLOR_ATTACHMENT_WRITE
                ...
                vkCmdDrawIndexed(cb[cf], ...);
                (Implicit!) Subpass Dependency between 0 -> EXTERNAL:
                    srcStages = ALL_COMMANDS,
                    srcAccess = COLOR_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_WRITE,
                    dstStages = BOTTOM_OF_PIPE,
                    dstAccess = 0
            vkCmdEndRenderPass(cb[cf]);
        /* signal when done: */ sem2[cf], ...
        /* signal when done: */ fence[cf]
    );
    vkQueuePresentKHR(presentQueue, ... /* wait for: */ sem2[cf], ...);
All draw calls are performed on a single queue, the graphicsQueue. We must therefore check whether commands on the graphicsQueue could theoretically overlap.
Let us consider the events that happen on the graphicsQueue, in chronological order, for the first two frames:
    img[0] -> sem1[0] signal -> t|...|ef|fs|lf|co|b -> sem2[0] signal, fence[0] signal
    img[1] -> sem1[1] signal -> t|...|ef|fs|lf|co|b -> sem2[1] signal, fence[1] signal
where t|...|ef|fs|lf|co|b stands for the pipeline stages a draw call passes through. The "..." stands for the stages in between, such as VERTEX_SHADER:
    t  = TOP_OF_PIPE
    ef = EARLY_FRAGMENT_TESTS
    fs = FRAGMENT_SHADER
    lf = LATE_FRAGMENT_TESTS
    co = COLOR_ATTACHMENT_OUTPUT
    b  = BOTTOM_OF_PIPE
There might be an implicit dependency between sem2[i] signal -> present and sem1[i+1]. However, it only applies when the swap chain provides only one image, or always provides the same image. In the general case, this cannot be assumed. That means nothing delays the subsequent frame once the first frame has been handed over to present.
The fences do not help either. After frame i signals fence[i], the next frame waits on fence[(i + 1) % MAX_FRAMES_IN_FLIGHT]. That fence was last signaled by frame i + 1 - MAX_FRAMES_IN_FLIGHT, not by frame i. So in the general case, the fences do not prevent subsequent frames from progressing.
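The point about the fences can be checked with a small sketch. It assumes MAX_FRAMES_IN_FLIGHT = 2, as in the tutorial. slot_of_frame and frame_waited_on are hypothetical helpers, not tutorial code. The sketch replays currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT and searches backwards for the earlier frame that last used the same fence slot. That frame is the one vkWaitForFences(..., fence[cf], ...) actually waits for.

```c
#include <assert.h>

/* Assumption: two frames in flight, as in the tutorial. */
#define MAX_FRAMES_IN_FLIGHT 2

/* Slot (cf) used by the n-th frame, replaying the tutorial's update rule
   starting from currentFrame = 0. */
static int slot_of_frame(int n) {
    assert(n >= 0);
    int currentFrame = 0;
    for (int i = 0; i < n; ++i) {
        currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
    }
    return currentFrame;
}

/* Returns the most recent earlier frame that used the same slot, i.e. the
   frame whose fence signal frame n waits for, or -1 if there is none. */
static int frame_waited_on(int n) {
    for (int k = n - 1; k >= 0; --k) {
        if (slot_of_frame(k) == slot_of_frame(n)) {
            return k;
        }
    }
    return -1;
}
```

With two frames in flight, frame_waited_on(n) is n - 2 for n >= 2 and -1 for n < 2. Frame n therefore never waits for frame n - 1, which is exactly the frame it would have to wait for to share a single depth buffer safely.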
In other words, the second frame starts rendering concurrently with the first frame. As far as I can tell, nothing prevents it from accessing the depth buffer concurrently.
The Fix:
If we want to use only a single depth buffer, we can fix the tutorial's code. What we want to achieve is that the ef and lf stages of a frame wait for the previous draw call's ef and lf stages to complete before proceeding. In other words, we want to create the following scenario:
    img[0] -> sem1[0] signal -> t|...|ef|fs|lf|co|b -> sem2[0] signal, fence[0] signal
    img[1] -> sem1[1] signal -> t|...|________|ef|fs|lf|co|b -> sem2[1] signal, fence[1] signal
where _ indicates a wait operation.
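The two timelines above can be reproduced with a toy scheduling model. This is a hypothetical sketch, not Vulkan code. The Stage enum, FrameTimeline, schedule and depth_overlaps are made up for illustration. Each stage takes a fixed duration, and a frame starts as soon as it is submitted, so nothing else delays it. The only synchronization modeled is the desired wait: EARLY_FT of the next frame may not start before the previous frame has finished LATE_FT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: the stages of one draw call, in order.
   VERTEX stands in for the "..." between t and ef. */
enum Stage { TOP, VERTEX, EARLY_FT, FRAG_SHADER, LATE_FT, COLOR_OUT, BOTTOM, STAGE_COUNT };

/* Start and end time of every stage of one frame's draw call. */
typedef struct {
    int start[STAGE_COUNT];
    int end[STAGE_COUNT];
} FrameTimeline;

/* Schedules a frame whose commands can start at submit_time. Each stage starts
   when the frame's previous stage ends. If barrier is true and prev is given,
   EARLY_FT also waits until prev has finished LATE_FT (the "____" wait). */
static FrameTimeline schedule(int submit_time, const int duration[STAGE_COUNT],
                              const FrameTimeline *prev, bool barrier) {
    FrameTimeline f;
    int t = submit_time;
    for (int s = 0; s < STAGE_COUNT; ++s) {
        assert(duration[s] >= 0);
        if (s == EARLY_FT && barrier && prev != NULL && prev->end[LATE_FT] > t) {
            t = prev->end[LATE_FT];
        }
        f.start[s] = t;
        t += duration[s];
        f.end[s] = t;
    }
    return f;
}

/* In this model, the depth buffer is in use from the start of EARLY_FT to the
   end of LATE_FT. Returns true if these half-open intervals of two frames intersect. */
static bool depth_overlaps(const FrameTimeline *a, const FrameTimeline *b) {
    return a->start[EARLY_FT] < b->end[LATE_FT] &&
           b->start[EARLY_FT] < a->end[LATE_FT];
}
```

Suppose every stage takes one time unit and the second frame is submitted one unit after the first. Without the barrier, depth_overlaps reports an overlap. With the barrier, the second frame's EARLY_FT starts exactly when the first frame's LATE_FT ends, and there is no overlap.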
To achieve this, we have to add a barrier that prevents subsequent frames from performing the EARLY_FRAGMENT_TESTS and LATE_FRAGMENT_TESTS stages at the same time. The draw calls are performed on only one queue, so only the commands in the graphicsQueue require a barrier. The "barrier" can be established with a subpass dependency:
    vkWaitForFences(..., fence[cf], ...);
    vkAcquireNextImageKHR(..., /* signal when done: */ sem1[cf], ...);
    vkResetFences(..., fence[cf]);
    vkQueueSubmit(graphicsQueue, ...
        /* wait for: */ sem1[cf], /* wait stage: */ EARLY_FRAGMENT_TESTS, ...
            vkCmdBeginRenderPass(cb[cf], ...);
                Subpass Dependency between EXTERNAL -> 0:
                    srcStages = EARLY_FRAGMENT_TESTS|LATE_FRAGMENT_TESTS,
                    srcAccess = DEPTH_STENCIL_ATTACHMENT_WRITE,
                    dstStages = EARLY_FRAGMENT_TESTS|LATE_FRAGMENT_TESTS,
                    dstAccess = DEPTH_STENCIL_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_READ
                ...
                vkCmdDrawIndexed(cb[cf], ...);
                (Implicit!) Subpass Dependency between 0 -> EXTERNAL:
                    srcStages = ALL_COMMANDS,
                    srcAccess = COLOR_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_WRITE,
                    dstStages = BOTTOM_OF_PIPE,
                    dstAccess = 0
            vkCmdEndRenderPass(cb[cf]);
        /* signal when done: */ sem2[cf], ...
        /* signal when done: */ fence[cf]
    );
    vkQueuePresentKHR(presentQueue, ... /* wait for: */ sem2[cf], ...);
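This should establish a proper barrier on the graphicsQueue between the draw calls of different frames. It is an EXTERNAL -> 0 subpass dependency, so its first synchronization scope includes all commands that occur earlier in submission order than vkCmdBeginRenderPass. On the same queue, that includes the previous frame's draw calls, so the current frame's depth tests are synchronized with those of the previous frame.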
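Update: The wait stage for sem1[cf] also has to be changed from COLOR_ATTACHMENT_OUTPUT to EARLY_FRAGMENT_TESTS. Layout transitions happen at vkCmdBeginRenderPass time, after the first synchronization scope of the dependency (srcStages and srcAccess) and before its second synchronization scope (dstStages and dstAccess). The swapchain image must therefore already be available at that point, so that the layout transition happens at the right time.
The semaphore wait only blocks the stages in its wait-stage mask and logically later stages. Only when the wait stage is included in srcStages does the semaphore wait form a dependency chain with the subpass dependency, and thus with the layout transition.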