
I'm experimenting a bit with the new features in DirectX12. So far I really like some of the changes, for example, the pipeline states. At the same time, some other changes are a bit confusing, for example, the descriptor heaps.

Let's start with some quick background so you can better understand what I'm asking.

In DirectX11, we created separate shader objects and then had to bind each of them individually at runtime when setting up our draw call. Here's a pseudo-example:

deviceContext->VSSetShader(...);
deviceContext->HSSetShader(...);
deviceContext->DSSetShader(...);
deviceContext->PSSetShader(...);

In DirectX12, they've made this so much smarter, because now we can configure the pipeline state during initialization instead, and then set all of the above with a single API call:

commandList->SetPipelineState(...);
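
For reference, here's a minimal sketch of what that initialization step might look like, using the CD3DX12 helpers from d3dx12.h; the root signature, shader blobs and device are assumed to already exist, and things like the input layout and error handling are omitted:

// Build the pipeline state once, during initialization.
D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};
psoDesc.pRootSignature = rootSignature;
psoDesc.VS = { vsBlob->GetBufferPointer(), vsBlob->GetBufferSize() };
psoDesc.PS = { psBlob->GetBufferPointer(), psBlob->GetBufferSize() };
psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT);
psoDesc.DepthStencilState = CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT);
psoDesc.DepthStencilState.DepthEnable = FALSE;  // no depth buffer in this sketch
psoDesc.SampleMask = UINT_MAX;
psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
psoDesc.NumRenderTargets = 1;
psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
psoDesc.SampleDesc.Count = 1;

ID3D12PipelineState* pipelineState = nullptr;
device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState));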

Very simple, elegant and fast. And on top of that, very logical. Now let's take a look at the descriptor heaps instead. I kind of expected them to follow the same elegant pattern, and this is basically what my question is about.

In DirectX11, we created separate descriptor objects (views) and then had to bind each of them individually, for each shader stage, at runtime when setting up our draw call. Once again, a pseudo-example:

deviceContext->PSSetConstantBuffers(0, n, ...);
deviceContext->PSSetShaderResources(0, n, ...);
deviceContext->PSSetSamplers(0, n, ...);

In DirectX12, they've implemented something called descriptor heaps. Basically, they're chunks of memory that contain all of the descriptors that we want to bind, and we can also set them up during initialization. So far, this looks just as elegant as the pipeline state, since we can set everything with a single API call:

commandList->SetDescriptorHeaps(n, ...);
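
For completeness, a minimal sketch of that setup; the descriptor count of 256 is an arbitrary example, and error handling is omitted:

// Create a shader-visible CBV/SRV/UAV heap once, during initialization.
D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
heapDesc.NumDescriptors = 256;
heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

ID3D12DescriptorHeap* srvHeap = nullptr;
device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&srvHeap));

// Later, on the command list (at most one CBV/SRV/UAV heap and one
// sampler heap can be bound at a time):
ID3D12DescriptorHeap* heaps[] = { srvHeap };
commandList->SetDescriptorHeaps(_countof(heaps), heaps);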

Or can we? This is where the confusion arises, because after a search I found this question that states:

Swapping descriptor heaps is a costly operation you want to avoid at all cost.

Meanwhile, the MSDN documentation for SetDescriptorHeaps doesn't say anything about this method being particularly expensive.

Considering how elegantly they've designed the pipeline state, I was kind of expecting to be able to do this:

commandList->SetPipelineState(...);
commandList->SetDescriptorHeaps(n, ...);
commandList->DrawInstanced(...);

commandList->SetPipelineState(...);
commandList->SetDescriptorHeaps(n, ...);
commandList->DrawInstanced(...);

commandList->SetPipelineState(...);
commandList->SetDescriptorHeaps(n, ...);
commandList->DrawInstanced(...);

But if SetDescriptorHeaps is actually that expensive, this will probably result in very bad performance. Or will it? As said, I can't find any statement on MSDN that this is actually a bad idea.

So my questions are:

  • If the above is considered bad practice, how should SetDescriptorHeaps be used?
  • If this is an Nvidia-only performance problem, how come they don't fix their drivers?

Basically, what I want to do is have two descriptor heaps (CBV/SRV/UAV + sampler) for each pipeline state. And judging from how cheap it is to change the pipeline state, it would be logical for changing the descriptor heap to be equally cheap. The pipeline state and the descriptor heap are quite closely related, i.e. changing the pipeline state will most likely require a different set of descriptors.

I'm aware of the strategy of using one massive descriptor heap for each type of descriptor. But that approach feels overly complicated considering all the work required to keep track of each individual descriptor's index. And on top of that, the descriptors in a descriptor table need to be contiguous in the heap.
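
To illustrate the kind of bookkeeping I mean, here's a rough sketch of a minimal index allocator for such a massive heap; all the names here are made up for illustration, and it only does naive linear allocation with no recycling:

#include <cassert>
#include <d3d12.h>

// Hypothetical helper that tracks which slots of one big heap are in use.
struct DescriptorAllocator
{
    ID3D12DescriptorHeap* heap = nullptr;
    UINT descriptorSize = 0;  // from device->GetDescriptorHandleIncrementSize(...)
    UINT capacity = 0;
    UINT nextFree = 0;        // naive bump allocation, nothing is recycled

    // Reserves `count` slots in one block, since the descriptors of a
    // descriptor table must be contiguous, and returns the first index.
    UINT Allocate(UINT count)
    {
        assert(nextFree + count <= capacity);
        UINT first = nextFree;
        nextFree += count;
        return first;
    }

    // Where to write a descriptor (CPU side).
    D3D12_CPU_DESCRIPTOR_HANDLE CpuHandle(UINT index) const
    {
        D3D12_CPU_DESCRIPTOR_HANDLE h = heap->GetCPUDescriptorHandleForHeapStart();
        h.ptr += SIZE_T(index) * descriptorSize;
        return h;
    }

    // What to pass to SetGraphicsRootDescriptorTable (GPU side).
    D3D12_GPU_DESCRIPTOR_HANDLE GpuHandle(UINT index) const
    {
        D3D12_GPU_DESCRIPTOR_HANDLE h = heap->GetGPUDescriptorHandleForHeapStart();
        h.ptr += UINT64(index) * descriptorSize;
        return h;
    }
};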

  • ``SetPipelineState`` and ``SetRootSignature`` are quite cheap operations, but ``SetDescriptorHeaps`` depends on the specific hardware as to how expensive it will be. As with everything performance-related, trying it and measuring it across a range of vendor hardware is really the only way to know. – Chuck Walbourn Mar 31 '19 at 17:44
  • Note that in [DirectX Tool Kit for DX12](https://github.com/Microsoft/DirectXTK12) I tend to use single-entry descriptor tables to avoid the need to ensure all the resources I use are contiguous. See the guidance from [Intel](https://software.intel.com/en-us/articles/performance-considerations-for-resource-binding-in-microsoft-directx-12) and [nVidia](https://developer.nvidia.com/dx12-dos-and-donts), and this [presentation](https://developer.nvidia.com/sites/default/files/akamai/gameworks/blog/GDC16/GDC16_gthomas_adunn_Practical_DX12.pdf) for general advice. – Chuck Walbourn Mar 31 '19 at 17:46

2 Answers


Descriptor heaps are independent of pipelines; they don't have to be bound per draw/dispatch. You can also just have one big descriptor heap and bind that instead. The root signature then has to point to the correct offsets within this descriptor heap, via its descriptor tables. This means you could have all your unique textures in one heap and point your root signature at the correct descriptors. You could also suballocate your current heaps out of one giant heap.
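
A rough sketch of that pattern (the heap and index names are illustrative, and root parameter 0 is assumed to be a descriptor table):

// Bind the one big shader-visible heap once per command list...
ID3D12DescriptorHeap* heaps[] = { bigSrvHeap };
commandList->SetDescriptorHeaps(_countof(heaps), heaps);

const UINT increment = device->GetDescriptorHandleIncrementSize(
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

// ...then, per draw, just repoint the descriptor table at a different
// offset inside that same heap instead of switching heaps.
D3D12_GPU_DESCRIPTOR_HANDLE table = bigSrvHeap->GetGPUDescriptorHandleForHeapStart();
table.ptr += UINT64(firstDescriptorIndex) * increment;

commandList->SetGraphicsRootDescriptorTable(0, table);
commandList->DrawInstanced(vertexCount, 1, 0, 0);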

Niels
  • Thank you for your reply. I feel that I'm aware of the _different ways_ of using it; my question was aimed more towards which is _the correct_ way of using it. For example, I recall reading that `SetDescriptorHeaps(...)` was slow on Nvidia cards; is that still the case? If not, can I change descriptor heaps each time I change the pipeline state, or, if so, should I use a ring buffer and copy all descriptors into it (sounds like a lot of ms spent on management and overhead)? – fighting_falcon93 Nov 28 '19 at 11:14
  • It depends on your use case honestly (in small cases the overhead of descriptor management is not worth it); however, in big projects it is generally a good idea to have one or multiple big descriptor heaps (one allocation has a limit of 1M descriptors IIRC). Then you can use SetxxxRootDescriptorTable with an offset into one of those. The allocation of the GPU descriptors can be done using a freelist approach. I store the unique descriptor sets that exist and delete them if they aren't used for N frames (so one CPU->GPU update on create). However, a ringbuffer could also be used. – Niels Nov 29 '19 at 13:07

The MSDN documentation has since addressed the performance hit of switching heaps:

On some hardware, this can be an expensive operation, requiring a GPU stall to flush all work that depends on the currently bound descriptor heap.

Source: Descriptor Heaps Overview - Switching Heaps

The reason this may happen is that, for some hardware, switching between hardware descriptor heaps during execution requires a GPU wait for idle (to ensure that GPU references to the previous descriptor heap are finished).

To avoid being impacted by this possible wait for idle on the descriptor heap switch, applications can take advantage of breaks in rendering that would cause the GPU to idle for other reasons as the time to do descriptor heap switches, since a wait for idle is happening anyway.

Source: Shader Visible Descriptor Heaps - Overview
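
In practice, that guidance boils down to binding your heaps as rarely as possible. A hedged sketch of the pattern, with illustrative names:

// Bind both heap types once, at the start of the frame's command list,
// and keep them bound while every draw for the frame is recorded.
ID3D12DescriptorHeap* frameHeaps[] = { frameSrvUavHeap, frameSamplerHeap };
commandList->SetDescriptorHeaps(_countof(frameHeaps), frameHeaps);

// ... record the whole frame using descriptor-table offsets only ...

// If a heap switch is truly unavoidable, schedule it at a point where
// the GPU would have to idle anyway (e.g. between frames or between
// passes that already end in a full flush), as the docs suggest.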

Xeon-J