
I am having a hard time grasping the concept of multithreaded rendering in DX12.

According to MSDN, one must write draw commands into direct command lists (preferably using bundles) and then submit those lists to a command queue. It also says that one can have more than one command queue for direct command lists, but it is unclear to me what the purpose of doing so is.

I get the full benefit of multithreading by building command lists on parallel threads, don't I? If so, why would I want to have more than one command queue associated with the device?

I suspect that mismanaging command queues could lead to serious performance problems later in the development of a rendering library.

Diligent Key Presser
  • [msdn](https://msdn.microsoft.com/en-us/library/windows/desktop/dn899124%28v=vs.85%29.aspx) mentions that queues can be executed in parallel, so in theory there's a benefit to using multiple queues. But as far as I know there's no GPU that supports this in general (though AMD can execute graphics and compute queues in parallel). – nikitablack Aug 04 '16 at 07:18

1 Answer


The main benefit of DirectX 12 is that command execution is almost entirely asynchronous with respect to the CPU: when you call ID3D12CommandQueue::ExecuteCommandLists, it kicks off the work in the command lists you pass in and returns without waiting for the GPU to finish. This brings up another point, however. A common misconception is that rendering itself is somehow multithreaded now, and that is simply not true: all of the work is still executed on the GPU. What is done on several threads is command list recording, for which you create an ID3D12GraphicsCommandList (with its own ID3D12CommandAllocator) for each thread that needs one.

An example:

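// DrawObject and RecordDraw are hypothetical application types/helpers.
DrawObject DrawObjects[10];
ID3D12CommandQueue* GCommandQueue = ...;
ID3D12Fence* GFence = ...;
UINT64 GFenceValue = 0;

// Each thread records into its own command list, backed by its own
// ID3D12CommandAllocator (an allocator must not be used by two threads at once).
ID3D12GraphicsCommandList* clForThread1 = ...;
ID3D12GraphicsCommandList* clForThread2 = ...;

void RenderThread1()
{
     for (int i = 0; i < 5; i++)
         RecordDraw(clForThread1, DrawObjects[i]); // set state, then DrawIndexedInstanced
     clForThread1->Close(); // a list must be closed before it can be executed
}

void RenderThread2()
{
     for (int i = 5; i < 10; i++)
         RecordDraw(clForThread2, DrawObjects[i]);
     clForThread2->Close();
}

// Called from one thread once both recording threads have finished (e.g. joined).
void ExecuteCommands()
{
     ID3D12CommandList* cl[2] = { clForThread1, clForThread2 };
     GCommandQueue->ExecuteCommandLists(2, cl);
     GCommandQueue->Signal(GFence, ++GFenceValue); // the CPU can later wait on GFence for this value
}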

This example is a very rough sketch, but it shows the general idea: you can record the objects of your scene on different threads, spreading the CPU cost of recording commands across cores.
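Since real D3D12 code needs a device and the Windows SDK, here is a CPU-only model instead, a minimal sketch under that assumption rather than D3D12 code. `FakeCommandList`, `FakeQueue` and `RecordInParallelAndSubmit` are hypothetical names, not part of any API. Each thread records into its own list (as each real thread would own its own command list and allocator), the threads are joined, and one thread submits the lists in a fixed order, so the order the "GPU" sees does not depend on which thread finished recording first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for ID3D12GraphicsCommandList: it only stores
// the names of recorded draws. No D3D12 calls are made anywhere here.
struct FakeCommandList {
    std::vector<std::string> commands;
    bool closed = false;
    void RecordDraw(int objectIndex) {
        assert(!closed && "cannot record into a closed list");
        commands.push_back("draw " + std::to_string(objectIndex));
    }
    void Close() { closed = true; }
};

// Hypothetical stand-in for ID3D12CommandQueue: executes the lists
// one after another, in exactly the order they were submitted.
struct FakeQueue {
    std::vector<std::string> executed;
    void ExecuteCommandLists(const std::vector<FakeCommandList*>& lists) {
        for (const FakeCommandList* cl : lists) {
            assert(cl->closed); // only closed lists may be executed
            executed.insert(executed.end(), cl->commands.begin(), cl->commands.end());
        }
    }
};

// Splits objectCount draws over threadCount threads, each owning one list,
// then submits all lists from the calling thread in list order.
std::vector<std::string> RecordInParallelAndSubmit(int objectCount, int threadCount) {
    std::vector<FakeCommandList> lists(threadCount);
    std::vector<std::thread> workers;
    const int perThread = (objectCount + threadCount - 1) / threadCount;
    for (int t = 0; t < threadCount; ++t) {
        // Each thread writes only to lists[t], so no locking is needed.
        workers.emplace_back([&lists, t, perThread, objectCount] {
            const int begin = t * perThread;
            const int end = std::min(objectCount, begin + perThread);
            for (int i = begin; i < end; ++i) lists[t].RecordDraw(i);
            lists[t].Close();
        });
    }
    for (std::thread& w : workers) w.join(); // recording must finish before submission

    std::vector<FakeCommandList*> submitOrder;
    for (FakeCommandList& cl : lists) submitOrder.push_back(&cl);
    FakeQueue queue;
    queue.ExecuteCommandLists(submitOrder);
    return queue.executed;
}
```

For example, `RecordInParallelAndSubmit(10, 2)` mirrors the two-thread example above and yields "draw 0" through "draw 9" in order.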

Another useful consequence of this setup is that you can kick off one rendering task on the GPU and start recording the next one while the first is still executing.

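An example:

void Render()
{
     ID3D12GraphicsCommandList* cl = ...;
     DrawObjectsInTheScene(cl, ...); // hypothetical helper that records the scene's draws
     cl->Close();
     ID3D12CommandList* sceneLists[] = { cl };
     CommandQueue->ExecuteCommandLists(1, sceneLists); // returns immediately; the GPU starts rendering the scene

     // While the GPU renders the scene, the CPU records the post-processing pass.
     // (Descriptor heaps, render target, viewport and resource barriers omitted.)
     ID3D12GraphicsCommandList* cl2 = ...;
     cl2->SetGraphicsRootSignature(bloomRootSignature);     // hypothetical, created elsewhere
     cl2->SetPipelineState(bloomPipelineState);             // hypothetical, created elsewhere
     cl2->SetGraphicsRootDescriptorTable(0, sceneColorSrv); // hypothetical GPU handle to the scene color
     cl2->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP);
     cl2->DrawInstanced(4, 1, 0, 0);                        // full-screen quad, vertices generated in the shader
     cl2->Close();
     ID3D12CommandList* postLists[] = { cl2 };
     CommandQueue->ExecuteCommandLists(1, postLists);       // same queue, so it runs after the scene
}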

The advantage here over DirectX 11 or OpenGL is that in those APIs the driver decides when buffered commands actually reach the GPU: it may keep recording and recording, and possibly not submit anything until Present() is called, which can force the CPU to wait and incurs overhead. In DirectX 12 the application chooses when each batch is submitted.
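The submit-then-keep-recording pattern relies on a fence to tell the CPU when the GPU has caught up. The sketch below models that on the CPU only, assuming a worker thread can play the GPU. `FakeQueueWithFence`, `Submit`, `WaitForFence` and `RenderFrame` are hypothetical names; they stand in loosely for `ID3D12CommandQueue::ExecuteCommandLists`, `ID3D12CommandQueue::Signal` and `ID3D12Fence::SetEventOnCompletion`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical model (not D3D12 code): one queue plus its fence. A worker
// thread plays the GPU and runs submitted batches in submission order; after
// each batch it advances the completed fence value.
class FakeQueueWithFence {
public:
    FakeQueueWithFence() : gpu_([this] { GpuLoop(); }) {}
    ~FakeQueueWithFence() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_all();
        gpu_.join(); // the GPU thread drains remaining batches before exiting
    }
    // Stands in for ExecuteCommandLists + Signal: returns at once with the
    // fence value that will be reached when this batch has finished.
    std::uint64_t Submit(std::function<void()> batch) {
        std::uint64_t value;
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_.push_back(std::move(batch));
            value = ++lastSignaled_;
        }
        cv_.notify_all();
        return value;
    }
    // Stands in for SetEventOnCompletion + WaitForSingleObject on the CPU.
    void WaitForFence(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }
private:
    void GpuLoop() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [&] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) return; // stop requested and nothing left
            std::function<void()> batch = std::move(pending_.front());
            pending_.pop_front();
            lock.unlock();
            batch(); // "GPU" work runs while the CPU is free to keep recording
            lock.lock();
            ++completed_;
            cv_.notify_all();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> pending_;
    std::uint64_t lastSignaled_ = 0;
    std::uint64_t completed_ = 0;
    bool stop_ = false;
    std::thread gpu_; // declared last so it starts after the members above exist
};

// Hypothetical frame: submit the scene, record post-processing on the CPU
// while the scene batch runs, then submit it and wait on the fence.
std::vector<std::string> RenderFrame() {
    std::vector<std::string> gpuLog; // only touched by the GPU thread until the wait
    FakeQueueWithFence queue;
    queue.Submit([&gpuLog] { gpuLog.push_back("scene"); });
    std::vector<std::string> postCommands = {"bloom"}; // recorded meanwhile on the CPU
    std::uint64_t frameDone = queue.Submit([&gpuLog, postCommands] {
        for (const std::string& c : postCommands) gpuLog.push_back(c);
    });
    queue.WaitForFence(frameDone);
    return gpuLog;
}
```

`RenderFrame()` returns `{"scene", "bloom"}`: both batches go through the same queue, so they complete in submission order even though the CPU recorded the second one while the first was running.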

Alex Kiecker
  • Thank you! But the question still is: what is the purpose of having more than just one command queue? – Diligent Key Presser May 02 '18 at 13:59
  • You want a command queue specialized for each task. A command queue created with D3D12_COMMAND_QUEUE_TYPE_COPY handles copy operations: UpdateSubresources(..), copying from an upload heap, etc. A direct command queue is generally just for your main rendering tasks, and a compute queue is for your compute tasks. While you can have multiple command queues of the same type, it will just make synchronization and resource management a pain. – Alex Kiecker May 02 '18 at 21:00
  • So it is just there to simplify concurrent job submission (through some kind of internal synchronization), and having multiple command queues vs. one command queue for all kinds of jobs does not make any difference in other respects, such as resource management and execution speed? – Diligent Key Presser May 03 '18 at 08:19
  • Well, the main point of them is that with a compute queue and a direct queue you can potentially run tasks on the GPU asynchronously, overlapping each other. Roughly, the queue types expose different parts of the hardware: the compute queue feeds the shader cores (what NVIDIA calls CUDA cores) without the graphics pipeline, the direct queue accepts everything including the rasterization work, the copy queue exposes the GPU's copy engines, etc. Whether work on different queues actually overlaps depends on the GPU and driver. – Alex Kiecker May 04 '18 at 05:05
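The comments above suggest one queue per task type, with synchronization as the price. A CPU-only sketch of that price, assuming one copy queue and one direct queue that share a fence: `FakeFence`, `FakeAsyncQueue` and `UploadThenDraw` are hypothetical names, not D3D12 types. As with `ID3D12CommandQueue::Signal` and `ID3D12CommandQueue::Wait`, the signal and the wait are enqueued on the queues instead of blocking the CPU, so only the direct queue stalls until the copy queue has finished the upload.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical model of a fence: a value that only moves forward.
class FakeFence {
public:
    void Signal(std::uint64_t v) {
        { std::lock_guard<std::mutex> lock(m_); value_ = v; }
        cv_.notify_all();
    }
    void WaitFor(std::uint64_t v) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return value_ >= v; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t value_ = 0;
};

// Hypothetical model of a command queue: one worker thread runs every
// enqueued operation in order. Signal and Wait are enqueued like work,
// so calling them never blocks the CPU thread that issues them.
class FakeAsyncQueue {
public:
    FakeAsyncQueue() : worker_([this] { Run(); }) {}
    ~FakeAsyncQueue() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_all();
        worker_.join(); // drains all pending operations first
    }
    void Execute(std::function<void()> work) {
        { std::lock_guard<std::mutex> lock(m_); ops_.push_back(std::move(work)); }
        cv_.notify_all();
    }
    void Signal(FakeFence& f, std::uint64_t v) { Execute([&f, v] { f.Signal(v); }); }
    void Wait(FakeFence& f, std::uint64_t v) { Execute([&f, v] { f.WaitFor(v); }); }
private:
    void Run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [&] { return stop_ || !ops_.empty(); });
            if (ops_.empty()) return; // stop requested and nothing left
            std::function<void()> op = std::move(ops_.front());
            ops_.pop_front();
            lock.unlock();
            op(); // may block this queue only, e.g. on a fence wait
            lock.lock();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> ops_;
    bool stop_ = false;
    std::thread worker_; // declared last so it starts after the members above exist
};

// Hypothetical frame: the copy queue uploads a texture; the direct queue
// must not draw with it until the copy queue has signaled the fence.
std::vector<std::string> UploadThenDraw() {
    std::mutex logMutex;
    std::vector<std::string> log;
    auto append = [&](const char* s) {
        std::lock_guard<std::mutex> lock(logMutex);
        log.push_back(s);
    };
    FakeFence uploadFence;
    {
        FakeAsyncQueue copyQueue;
        FakeAsyncQueue directQueue;
        directQueue.Wait(uploadFence, 1); // enqueued first, but stalls only the direct queue
        directQueue.Execute([&] { append("draw with texture"); });
        copyQueue.Execute([&] { append("copy texture from upload heap"); });
        copyQueue.Signal(uploadFence, 1);
    } // destructors drain both queues and join their threads
    return log;
}
```

Even though the direct queue's work is enqueued before the copy, `UploadThenDraw()` always returns the copy first and the draw second; without the `Wait`, the order would be a race between the two queues, which is the synchronization burden the comments mention.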