I have experience with D3D11 and want to learn D3D12. I am reading the official D3D12 multithreading example and don't understand why the shadow map (written in the first pass through a DSV, read in the second pass through an SRV) is created per frame (actually only 2 copies, since each FrameResource is reused every 2 frames).

The code that creates the shadow map resource is here, in the FrameResource class, instances of which are created here.

There is actually another resource that is created per frame: the constant buffer. I think I understand that one. Because it is written by the CPU (D3D11 dynamic usage) and must remain unchanged until the GPU finishes using it, there need to be 2 copies. However, I don't understand why the shadow map needs the same treatment, because it is only modified by the GPU (D3D11 default usage), and there are fence commands separating reads from writes of that texture anyway. As long as the GPU respects the fence, a single texture should be enough for the GPU to work correctly. Where am I wrong?
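To make the constant-buffer reasoning concrete, here is a minimal sketch of a per-frame slot ring guarded by a fence value. It is a hypothetical CPU-only model, not D3D12 code: `FrameSlot`, `FrameRing`, `kFrameCount` and the `cpuWaits` counter are illustrative names, and `gpuCompleted` stands in for what `ID3D12Fence::GetCompletedValue` would report. The CPU may only overwrite a slot once the GPU has passed the fence value signaled after the last frame that read it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: two CPU-written constant-buffer slots reused round-robin.
constexpr int kFrameCount = 2;

struct FrameSlot {
    uint64_t fenceValue = 0;  // value signaled after the GPU work that reads this slot
    float constants[4] = {};  // stand-in for the CPU-written constant data
};

struct FrameRing {
    std::array<FrameSlot, kFrameCount> slots{};
    uint64_t nextFenceValue = 1;
    uint64_t gpuCompleted = 0;  // analog of ID3D12Fence::GetCompletedValue()
    int cpuWaits = 0;           // number of times the CPU had to block

    // Returns the slot the CPU may overwrite for frame `frame`.
    FrameSlot& BeginFrame(uint64_t frame) {
        FrameSlot& slot = slots[frame % kFrameCount];
        if (gpuCompleted < slot.fenceValue) {
            // Real code would call SetEventOnCompletion and wait on the event;
            // the model just counts the stall and fast-forwards the GPU.
            ++cpuWaits;
            gpuCompleted = slot.fenceValue;
        }
        return slot;
    }

    // Records the fence value signaled after this frame's command lists.
    void EndFrame(FrameSlot& slot) { slot.fenceValue = nextFenceValue++; }
};
```

If the GPU never catches up on its own, every reuse of a slot after the first two frames forces a CPU wait; if it keeps up, the CPU never blocks. That is why CPU-written data needs one copy per frame in flight.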

Thanks in advance.

EDIT

According to the comment below, the "fence" I mentioned above should more accurately be called "resource barrier".

The key issue is that you don't want to stall the GPU for best performance. Double-buffering is a minimal requirement, but typically triple-buffering is better for smoothing out frame-to-frame rendering spikes, etc.

FWIW, the default behavior of DXGI Present is to stall only after you have submitted THREE frames of work, not two.

Of course, there's a trade-off between triple-buffering and input responsiveness, but if you are maintaining 60 Hz or better then it's likely not noticeable.

With all that said, you typically don't need to double-buffer depth/stencil buffers for rendering. However, if you want the initial write of the depth buffer to overlap with the reads from the previous frame's depth-buffer passes, then you would want distinct buffers per frame for both performance and correctness.
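The overlap argument can be illustrated with a toy timing model. This is a hypothetical, idealized sketch, not how any real GPU schedules work: each frame has a shadow-write pass W(n) and a shadow-read pass R(n); R(n) must wait for W(n), and W(n) must wait for R(n - bufferCount), the last pass that read the buffer it is about to overwrite. `FinishTime` and its cost parameters are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Idealized model: W(n) runs after W(n-1); R(n) runs after W(n) and R(n-1).
// With `bufferCount` shadow maps, W(n) overwrites buffer n % bufferCount and
// so must also wait for R(n - bufferCount) (write-after-read hazard).
double FinishTime(int frames, int bufferCount, double writeCost, double readCost) {
    std::vector<double> writeEnd(frames), readEnd(frames);
    for (int n = 0; n < frames; ++n) {
        double writeStart = (n > 0) ? writeEnd[n - 1] : 0.0;
        if (n >= bufferCount)
            writeStart = std::max(writeStart, readEnd[n - bufferCount]);
        writeEnd[n] = writeStart + writeCost;

        double readStart = writeEnd[n];  // read-after-write on this frame's buffer
        if (n > 0)
            readStart = std::max(readStart, readEnd[n - 1]);
        readEnd[n] = readStart + readCost;
    }
    return frames > 0 ? readEnd[frames - 1] : 0.0;
}
```

With unit costs and four frames, one buffer finishes at t = 8 (every write waits for the previous read), while two buffers finish at t = 5 because each write overlaps the previous frame's read. A third buffer gives no further gain in this model, since pass ordering becomes the bottleneck.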

The 'writes' are all complete before the 'reads' in DX12 because of the 'Resource Barrier' injected into the command list:

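```cpp
void FrameResource::SwapBarriers()
{
    // Transition the shadow map from writeable (depth target) to readable (pixel-shader SRV).
    // Store the barrier in a local: taking the address of a temporary is non-standard C++.
    const auto toSrv = CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(),
        D3D12_RESOURCE_STATE_DEPTH_WRITE, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE);
    m_commandLists[CommandListMid]->ResourceBarrier(1, &toSrv);
}

void FrameResource::Finish()
{
    // Transition the shadow map back to a depth target for the next shadow pass that uses it.
    const auto toDepth = CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(),
        D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_DEPTH_WRITE);
    m_commandLists[CommandListPost]->ResourceBarrier(1, &toDepth);
}
```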
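To see why these two barriers are enough for a single shadow map on one queue, here is a hypothetical state-tracking sketch. It is not the D3D12 API: the `State` values only mirror `D3D12_RESOURCE_STATE_DEPTH_WRITE` and `D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE`, and `Command`, `Validate` and `RecordFrame` are illustrative names. It replays commands in submission order and checks that every access happens in the required state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the resource-state rules for one shadow map.
enum class State { DepthWrite, PixelShaderResource };
enum class Op { WriteDepth, ReadSrv, Transition };

struct Command {
    Op op;
    State before = State::DepthWrite;  // used only by Transition
    State after = State::DepthWrite;   // used only by Transition
};

// Returns false on the first access in the wrong state, or on a barrier whose
// "before" state does not match the tracked state.
bool Validate(const std::vector<Command>& commands, State initial) {
    State current = initial;
    for (const Command& c : commands) {
        switch (c.op) {
        case Op::WriteDepth:
            if (current != State::DepthWrite) return false;
            break;
        case Op::ReadSrv:
            if (current != State::PixelShaderResource) return false;
            break;
        case Op::Transition:
            if (current != c.before) return false;
            current = c.after;
            break;
        }
    }
    return true;
}

// One frame in the order implied by the code above: shadow pass,
// SwapBarriers(), scene pass, Finish().
std::vector<Command> RecordFrame() {
    return {
        {Op::WriteDepth},
        {Op::Transition, State::DepthWrite, State::PixelShaderResource},
        {Op::ReadSrv},
        {Op::Transition, State::PixelShaderResource, State::DepthWrite},
    };
}
```

Appending any number of recorded frames validates with one texture, whereas dropping the `Finish()` transition makes the next shadow write invalid. Correctness alone does not require per-frame shadow maps; the extra copy only buys overlap.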

Note that this sample is a port/rewrite of the older legacy DirectX SDK sample MultithreadedRendering11, so having two shadow buffers instead of just one may simply be an artifact of convenience.

Chuck Walbourn
  • Thanks for the answer. What you've mentioned seems to be related to the swap chain or the final output, not to what I asked. Please point out if I'm wrong. I think the swap chain matters because it needs to interact with the OS compositor, over which we have no control, so the result must be queued. For the GBuffer, however, reading and writing are submitted to the same queue, and I don't see a difference whether it uses double buffering or not, because in each frame the second (reading) pass has to wait (stall) for the first (writing) pass anyway. – petarlobster Jun 18 '21 at 22:03
  • The resource barrier takes care of the transition from read to write, so a fence isn't needed there. That sample is a port/rewrite of an old [Direct3D 11 legacy DirectX SDK sample](https://github.com/walbourn/directx-sdk-samples/tree/master/MultithreadedRendering11), so it may be purely an artifact of convenience rather than strictly required. It's common to have two sets of 'per-frame' dynamic resources to keep things simple with DX12. – Chuck Walbourn Jun 18 '21 at 22:53
  • Thanks for that explanation! I would mark this as the correct answer, but could you please also clarify this in your answer? And regarding fence vs. barrier, I think I used the wrong term. I was thinking of a purely GPU-side synchronization mechanism rather than a CPU-GPU one. – petarlobster Jun 18 '21 at 23:48