I've written a multithreaded command list recording program in Direct3D 12, but the time saved by multithreaded recording is lost while waiting for fence completion and presenting. In other words, whether recording is single-threaded or multithreaded, the total time for one iteration of the render loop is the same, because of the fence wait and the present.
I measured the time taken by each line or function of my code, and the multithreaded recording itself is fine: single-threaded recording takes 0.0015 seconds, while multithreaded recording takes 0.0003 seconds per thread (8 threads in total), so the overall recording time is indeed shorter.
I found that the major cause was the fence value not yet being completed, which meant waiting 0.008 seconds. So I changed the number of swap chain buffers and fences from 2 to 8 so that more frames can be prepared ahead, since what I need to know is not when the previous frame has completely finished.
After that, the wait for fence completion dropped to 0.0000003 seconds. However, IDXGISwapChain::Present now takes 0.009 seconds per call. I thought IDXGISwapChain::Present was asynchronous.
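A minimal sketch of how per-call timings like these can be taken, assuming std::chrono::steady_clock as the clock. TimeSeconds is a hypothetical helper name used only for illustration, not something taken from the program itself.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
template <typename Work>
double TimeSeconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto end = std::chrono::steady_clock::now();

    // steady_clock is monotonic, so the end time is never before the start.
    assert(end >= start);

    return std::chrono::duration<double>(end - start).count();
}
```

Wrapping the recording, the fence wait, and the Present call in separate TimeSeconds calls gives one number per stage, which is how the figures above are broken down.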
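To illustrate the reasoning behind going from 2 to 8 buffers, here is a small CPU-only model of the per-buffer fence bookkeeping. It makes no Direct3D calls. CountStalls and its variables are hypothetical names, and the model assumes the worst case in which the GPU makes progress only when the CPU actually waits on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: counts how often the CPU would have to wait on the
// fence when cycling through `bufferCount` back buffers for `frames` frames.
// Assumption: the GPU completes work only when the CPU blocks on it.
std::size_t CountStalls(std::size_t bufferCount, std::size_t frames)
{
    assert(bufferCount > 0);

    // Fence value that marks the last frame submitted for each buffer.
    std::vector<std::uint64_t> fenceValueForSlot(bufferCount, 0);
    std::uint64_t completedValue = 0; // highest value the GPU has signaled
    std::uint64_t nextValue = 0;      // last value handed out by the CPU
    std::size_t stalls = 0;

    for (std::size_t frame = 0; frame < frames; ++frame)
    {
        const std::size_t slot = frame % bufferCount;

        // The buffer can be reused only after its previous frame finished.
        if (completedValue < fenceValueForSlot[slot])
        {
            ++stalls;
            // The wait ends once the GPU reaches this buffer's fence value.
            completedValue = fenceValueForSlot[slot];
        }

        // Submit a new frame for this buffer and remember its fence value.
        fenceValueForSlot[slot] = ++nextValue;
    }
    return stalls;
}
```

In this model, CountStalls(2, 8) returns 6 while CountStalls(8, 8) returns 0: with 8 buffers the CPU can queue 8 frames before it has to wait on a fence for the first time.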
Why is this happening and how can it be solved?
I've tried DXGI_PRESENT_DO_NOT_WAIT as the present flag, but then the wait time goes back to 0.008 seconds.
For more information, here's my swap chain desc (from when I had 2 back buffers):
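DXGI_SWAP_CHAIN_DESC1 SwapChainDesc1
{
    800,                              // Width
    600,                              // Height
    DXGI_FORMAT_R8G8B8A8_UNORM,       // Format
    0,                                // Stereo (FALSE)
    { 1, 0 },                         // SampleDesc: Count = 1, Quality = 0
    DXGI_USAGE_RENDER_TARGET_OUTPUT,  // BufferUsage
    2,                                // BufferCount
    DXGI_SCALING_NONE,                // Scaling
    DXGI_SWAP_EFFECT_FLIP_DISCARD,    // SwapEffect
    DXGI_ALPHA_MODE_UNSPECIFIED,      // AlphaMode
    0                                 // Flags
};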
Also, each command list calls DrawInstanced 2000 times. This is just to put load on the recording so that I can compare single-threaded and multithreaded recording times. I didn't think 2000 calls was a lot, because in a real application it would be called thousands or tens of thousands of times.
I also looked at the D3D12Multithreading sample, and it also calls Draw hundreds of times, even though those are indexed instanced draws.