
I'm writing my own graphics framework for DirectX 12 (I've already written several DirectX 11 frameworks for personal game engines), and I'm trying to replicate the resource-binding approach used in the recent Hitman game.

I'm confused about the best way to handle per-object resource binding for the SRV/CBV/UAV heap. I've watched several GDC presentations, and they all seem to gloss over this.

Only one SRV/CBV/UAV heap can be bound at a time, and switching the bound heap in the middle of a command list can hurt performance on some hardware by forcing a flush. Given that, what is the best way to update the heap with new descriptors? It seems to me that each command list would:

  1. Get hold of an SRV/CBV/UAV heap for itself.
  2. For each object in a subset of objects, create descriptors on the heap pointing to per-object data that was placed into a separate upload heap.
  3. Afterwards, another command list takes this filled descriptor heap, binds it, and issues draw calls interleaved with SetGraphicsRootDescriptorTable calls to step through the descriptor heap.
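Step 3 boils down to pointer arithmetic on descriptor handles. The sketch below is a platform-independent model, assuming each object owns a fixed number of consecutive descriptors. `GpuDescriptorHandle` and `TableForObject` are hypothetical names (the struct mirrors the layout of `D3D12_GPU_DESCRIPTOR_HANDLE`, which wraps a single `UINT64 ptr`). In a real renderer the increment would come from `ID3D12Device::GetDescriptorHandleIncrementSize`, the base from `ID3D12DescriptorHeap::GetGPUDescriptorHandleForHeapStart`, and the result would be passed to `SetGraphicsRootDescriptorTable` before each draw.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for D3D12_GPU_DESCRIPTOR_HANDLE, which is just { UINT64 ptr; }.
struct GpuDescriptorHandle {
    std::uint64_t ptr;
};

// Hypothetical helper: start of the descriptor table for object `objectIndex`
// when every object owns `descriptorsPerObject` consecutive descriptors
// beginning at `heapStart`. `incrementSize` is hardware dependent and would
// come from GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV).
GpuDescriptorHandle TableForObject(GpuDescriptorHandle heapStart,
                                   std::uint32_t incrementSize,
                                   std::uint32_t descriptorsPerObject,
                                   std::uint32_t objectIndex) {
    const std::uint64_t offset = static_cast<std::uint64_t>(objectIndex) *
                                 descriptorsPerObject * incrementSize;
    return GpuDescriptorHandle{heapStart.ptr + offset};
}
```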

That being said, several sources online (including another SO post) suggest using one large SRV/CBV/UAV heap and copying into it from CPU-visible heaps. I'm assuming they're not using the asynchronous CopyDescriptors, but rather CopyBufferRegion. I tried using CopyBufferRegion to update data per object, but that seems slow to me, with so many transitions between D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER and D3D12_RESOURCE_STATE_COPY_DEST. Am I misunderstanding something? Any clarity would be appreciated.

cehnehdeh
    Be sure to take a look at both [MiniEngine](https://github.com/Microsoft/DirectX-Graphics-Samples/tree/master/MiniEngine) and [DirectX Tool Kit for DirectX 12](https://github.com/Microsoft/DirectXTK12) for some example code for approaches to dealing with this problem. – Chuck Walbourn Jan 15 '17 at 07:48


CopyDescriptors is not asynchronous: it is a CPU operation that completes immediately. For volatile descriptors, the copy can happen any time before the command list is executed (even after the command that uses them has been recorded). For static descriptors (root signature 1.1), the descriptors must already be in place when the descriptor table that uses them is set.
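To make "immediate" concrete, here is a toy model rather than D3D12 code: a descriptor heap is treated as a flat byte array and a descriptor copy as a `memcpy`. `ToyDescriptorHeap`, `ToyCopyDescriptors`, and the 32-byte descriptor size are assumptions for illustration only; real descriptor sizes are hardware dependent, and the real calls are `ID3D12Device::CopyDescriptors` or `CopyDescriptorsSimple`, typically copying from a non-shader-visible staging heap into the shader-visible one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed descriptor size, for illustration only. Real sizes come from
// ID3D12Device::GetDescriptorHandleIncrementSize and vary between GPUs.
constexpr std::size_t kDescriptorSize = 32;

// Toy stand-in for a descriptor heap: N opaque descriptors in a flat array.
template <std::size_t N>
struct ToyDescriptorHeap {
    std::array<std::uint8_t, N * kDescriptorSize> bytes{};

    std::uint8_t* Slot(std::size_t i) { return bytes.data() + i * kDescriptorSize; }
    const std::uint8_t* Slot(std::size_t i) const { return bytes.data() + i * kDescriptorSize; }
};

// Models a descriptor copy as a plain CPU memcpy. No command queue, fence, or
// deferred work is involved, so the destination slots already hold the copied
// descriptors by the time this function returns.
template <std::size_t DstN, std::size_t SrcN>
void ToyCopyDescriptors(ToyDescriptorHeap<DstN>& dst, std::size_t dstIndex,
                        const ToyDescriptorHeap<SrcN>& src, std::size_t srcIndex,
                        std::size_t count) {
    assert(dstIndex + count <= DstN && srcIndex + count <= SrcN);
    std::memcpy(dst.Slot(dstIndex), src.Slot(srcIndex), count * kDescriptorSize);
}
```

By contrast, CopyBufferRegion is a command-list method: it is only recorded, and the copy happens on the GPU when the command list executes.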

The usual approach is to have one large descriptor heap, reserve a portion of it for static descriptors, and use the rest as a ring buffer: for each draw or dispatch, allocate a descriptor-table range on demand, copy the needed descriptors into it, and point the operation at that table.
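A minimal sketch of that layout, assuming one fence value is signaled per frame. `DescriptorRing` and its methods are hypothetical, and it only hands out heap indices: to use an index you would add `index * GetDescriptorHandleIncrementSize(...)` to the heap's CPU start handle as the `CopyDescriptorsSimple` destination, and to its GPU start handle for `SetGraphicsRootDescriptorTable`. Descriptor tables must be contiguous, so a request that does not fit before the end of the ring skips the remaining slots and wraps.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical allocator over a shader-visible CBV/SRV/UAV heap. Slots
// [0, staticCount) are left alone for long-lived descriptors; slots
// [staticCount, heapSize) form a ring for per-draw descriptor tables.
class DescriptorRing {
public:
    DescriptorRing(std::uint32_t heapSize, std::uint32_t staticCount)
        : begin_(staticCount), end_(heapSize), head_(staticCount) {
        assert(staticCount < heapSize);
    }

    // Reserves `count` contiguous slots and returns the first heap index, or
    // nullopt if the GPU may still be using too much of the ring (wait on the
    // fence, call Retire, and try again).
    std::optional<std::uint32_t> Allocate(std::uint32_t count) {
        assert(count > 0);
        const std::uint32_t capacity = end_ - begin_;
        // A table cannot straddle the end of the ring, so skip the tail slots.
        const bool wrap = head_ + count > end_;
        const std::uint32_t padding = wrap ? end_ - head_ : 0;
        if (used_ + padding + count > capacity) return std::nullopt;
        if (wrap) head_ = begin_;
        const std::uint32_t first = head_;
        head_ += count;
        used_ += padding + count;
        frameUsed_ += padding + count;
        return first;
    }

    // Call after submitting a frame, with the value passed to Signal() for it.
    void EndFrame(std::uint64_t fenceValue) {
        frames_.push_back({fenceValue, frameUsed_});
        frameUsed_ = 0;
    }

    // Call with the fence's completed value; frees every finished frame's slots.
    void Retire(std::uint64_t completedFence) {
        while (!frames_.empty() && frames_.front().fence <= completedFence) {
            used_ -= frames_.front().slots;
            frames_.pop_front();
        }
    }

private:
    struct FrameMarker {
        std::uint64_t fence;
        std::uint32_t slots;  // slots consumed by the frame, padding included
    };
    std::uint32_t begin_, end_, head_;
    std::uint32_t used_ = 0;       // slots the GPU may still reference
    std::uint32_t frameUsed_ = 0;  // slots consumed since the last EndFrame
    std::deque<FrameMarker> frames_;
};
```

Retiring whole frames keeps the bookkeeping to one marker per frame; tracking a fence per allocation would release slots sooner at the cost of more state.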

CopyBufferRegion has nothing to do with this. Writing to a mapped buffer is also an immediate operation, so you likewise treat a big chunk of upload memory as a ring buffer for your per-object constant buffers and cycle through it. The only catch is that you must not overwrite memory or descriptors while the GPU may still be using them, so you need fences to prevent that.
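A simple variant of that upload ring, sketched under stated assumptions: the buffer is split into one slice per frame in flight, a slice is reused only after the fence signaled for its previous frame has completed, and every allocation is rounded up to the 256-byte constant-buffer alignment D3D12 requires. `UploadRing` is hypothetical; the `std::vector` stands in for the pointer returned by `ID3D12Resource::Map`, and `gpuBase` for `GetGPUVirtualAddress()`. The returned address could be passed to `SetGraphicsRootConstantBufferView` or used to create a CBV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// D3D12 constant buffer views must start on 256-byte boundaries
// (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT).
constexpr std::size_t kCbAlignment = 256;

constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical upload allocator: one slice per frame in flight, reused only
// once the GPU has passed the fence value recorded for that slice.
class UploadRing {
public:
    UploadRing(std::size_t sliceSize, std::size_t framesInFlight, std::uint64_t gpuBase)
        : sliceSize_(AlignUp(sliceSize, kCbAlignment)),
          frames_(framesInFlight),
          mapped_(sliceSize_ * framesInFlight),
          sliceFence_(framesInFlight, std::uint64_t{0}),
          gpuBase_(gpuBase) {}

    // Returns false if the GPU may still be reading this frame's slice; the
    // caller should wait on its fence and try again.
    bool BeginFrame(std::uint64_t frameIndex, std::uint64_t completedFence) {
        const std::size_t slice = frameIndex % frames_;
        if (completedFence < sliceFence_[slice]) return false;
        slice_ = slice;
        offset_ = 0;
        return true;
    }

    // Copies one object's constants into the slice (an immediate CPU write)
    // and returns the GPU virtual address to bind, or nullopt if the slice is full.
    std::optional<std::uint64_t> Push(const void* data, std::size_t size) {
        const std::size_t aligned = AlignUp(size, kCbAlignment);
        if (offset_ + aligned > sliceSize_) return std::nullopt;
        const std::size_t at = slice_ * sliceSize_ + offset_;
        std::memcpy(mapped_.data() + at, data, size);
        offset_ += aligned;
        return gpuBase_ + at;
    }

    // Records the fence value signaled after this frame's command lists.
    void EndFrame(std::uint64_t fenceValue) { sliceFence_[slice_] = fenceValue; }

private:
    std::size_t sliceSize_;
    std::size_t frames_;
    std::vector<std::uint8_t> mapped_;       // stand-in for the Map() pointer
    std::vector<std::uint64_t> sliceFence_;  // last fence value per slice
    std::uint64_t gpuBase_;                  // stand-in for GetGPUVirtualAddress()
    std::size_t slice_ = 0;
    std::size_t offset_ = 0;
};
```

Per-frame slices trade some memory for simplicity: each slice must be sized for the worst-case frame, whereas a true byte-granular ring shares the space across frames.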

galop1n
  • So if I use `CopyDescriptors`, when does it update the heap on the GPU side? When I call `SetGraphicsRootDescriptorTable`? – cehnehdeh Jan 18 '17 at 19:03
  • No, CopyDescriptors is immediate: once it returns, the copy is done. The only tricky part is root signature 1.0 versus 1.1. With 1.0, or with volatile tables, the descriptors are not assumed to be ready when you set the table on the CPU, only by the time the command list is executed. With 1.1 static tables, the descriptors are assumed to be ready when you set the table for a draw, and they must not change after that. – galop1n Jan 18 '17 at 19:11
  • So if I'm rendering multiple objects, each with unique cbuffer data, I'd call `CopyDescriptors` to update the descriptor heap at a different offset for each object? – cehnehdeh Jan 18 '17 at 19:15
  • Yes, exactly. You walk the heap as one big ring buffer, allocating tables on demand for your render operations, since the descriptors have to survive until the GPU is done with them. – galop1n Jan 18 '17 at 19:17
  • That CPU stall on `CopyDescriptors` seems like it would be insanely expensive, opening/closing the pipe to the GPU once per object. Is there something I'm missing? – cehnehdeh Jan 18 '17 at 19:18
  • There is no stall, because there is no synchronization here. It is not a GPU copy; it is merely a CPU memcpy from a CPU address range to a GPU address range. You cannot do faster than that. – galop1n Jan 18 '17 at 19:20
  • So then after I do this CPU-copy, when will the GPU-side heap memory be updated? Sorry if this is a dumb question! – cehnehdeh Jan 18 '17 at 19:22
  • What is unclear about the word "immediate"? After the call to CopyDescriptors returns, the copy is done. It is IMMEDIATE! – galop1n Jan 18 '17 at 19:24