Vulkan: How to record command buffers in separate thread?

Question

I don't properly understand how to parallelize work on separate threads in Vulkan.

In order to begin issuing vkCmd*s, you need to begin a render pass. The call to begin render pass needs a reference to a framebuffer. However, vkAcquireNextImageKHR() is not guaranteed to return image indexes in a round robin way. So, in a triple-buffering setup, if the current image index is 0, I can't just bind framebuffer 1 and start issuing draw calls for the next frame, because the next call to vkAcquireNextImageKHR() might return image index 2.

What is a proper way to record commands without having to specify the framebuffer to use ahead of time?

score 7 · Accepted Answer · answered Feb 05 '18 at 15:59

You have one or more render passes that you want to execute per-frame. And each one has one or more subpasses, into which you want to pour work. So your main rendering thread will generate one or more secondary command buffers for those subpasses, and it will pass that sequence of secondary CBs off to the submission thread.

The submission thread will create the primary CB that gets rendered. It begins/ends render passes, and into each subpass, it executes the secondary CB(s) created on the rendering thread for that particular subpass.

So each thread is creating its own command buffers. The submission thread is the one that deals with the VkFramebuffer object, since it begins the render passes. It also is the one that acquires the swapchain images and so forth. The render thread is the one making the secondary CBs that do all of the real work.
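As a rough illustration of that split, here is a minimal sketch that uses plain C++ stand-ins instead of real Vulkan handles. The types `SecondaryCB` and `PrimaryCB` and the functions `recordFrame`, `buildPrimary` and `runFrame` are hypothetical names invented for this example. The point is that the recorded secondary work never mentions a framebuffer. In real Vulkan this is possible because a secondary command buffer recorded with `VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT` may leave the `framebuffer` member of `VkCommandBufferInheritanceInfo` as `VK_NULL_HANDLE`, and the primary CB runs it with `vkCmdExecuteCommands`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a recorded secondary command buffer.
// Note that it stores no framebuffer and no swapchain image index.
struct SecondaryCB {
    uint32_t subpass;
    std::vector<std::string> commands;
};

// Hypothetical stand-in for the primary command buffer.
struct PrimaryCB {
    uint32_t framebufferIndex;
    std::vector<std::string> commands;
};

// Render thread: records the draw work for each subpass.
std::vector<SecondaryCB> recordFrame(uint32_t subpassCount) {
    std::vector<SecondaryCB> out;
    for (uint32_t s = 0; s < subpassCount; ++s) {
        out.push_back({s, {"bind pipeline", "draw"}});
    }
    return out;
}

// Submission thread: only here is the acquired image index known.
PrimaryCB buildPrimary(uint32_t acquiredImage,
                       const std::vector<SecondaryCB>& secondaries) {
    assert(!secondaries.empty());
    PrimaryCB primary{acquiredImage, {}};
    primary.commands.push_back("begin render pass, framebuffer " +
                               std::to_string(acquiredImage));
    for (std::size_t i = 0; i < secondaries.size(); ++i) {
        if (i > 0) primary.commands.push_back("next subpass");
        primary.commands.push_back("execute secondary for subpass " +
                                   std::to_string(secondaries[i].subpass));
    }
    primary.commands.push_back("end render pass");
    return primary;
}

// std::async stands in for the render thread; the caller plays the
// role of the submission thread.
PrimaryCB runFrame(uint32_t subpassCount, uint32_t acquiredImage) {
    std::future<std::vector<SecondaryCB>> recorded =
        std::async(std::launch::async, recordFrame, subpassCount);
    return buildPrimary(acquiredImage, recorded.get());
}
```

Calling `runFrame(2, 2)` records two subpasses' worth of work with no knowledge of the target, then wraps it in a primary CB that targets image 2.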

Yes, you'll still be doing some CB building on the submission thread, but it ought to be pretty minimalistic overall. This also serves to abstract away the details of the render targets from your rendering thread, so that code dealing with the swapchain can be localized to the submission thread. This gives you more flexibility.

For example, if you want to triple buffer, and the swapchain doesn't actually allow that, then your submission thread can create its own extra images, then copy from its internal images into the real swapchain. The rendering thread's code does not have to be disturbed at all to allow this.

In your answer, you discussed building command buffers in some thread(s), but submitting them in a separate thread. Does that imply that submitting the command buffers is heavy enough to warrant its own thread, or do you simply mean in whatever thread you happen to make the submission? — Dess, Feb 06 '18 at 02:16
@Dess: `vkQueueSubmit` is a heavy-weight call. You probably don't need a thread dedicated solely to that operation, but nor should you assume that it's fairly trivial. That's why it is important to submit as much stuff as is reasonable at one time. Also, since the swapchain operations are *also* queue operations (acquire and present), it's best to localize them to a specific thread. — Nicol Bolas, Feb 06 '18 at 02:59

score 5 · Answer 2 · answered Feb 05 '18 at 05:59

You can use multiple threads to generate draw commands for the same renderpass using secondary command buffers. You can also generate work for different renderpasses in the same frame in parallel: only the very last pass (usually a postprocess pass) depends on the specific swapchain image; all your shadow passes, gbuffer/shading/lighting passes, and all but the last postprocess pass don't. It's not required, but it's often a good idea to not even call vkAcquireNextImageKHR until you're ready to start generating the final renderpass, after you've already generated many of the prior passes.

Ekzuzy · Answer 3 · 2018-02-05T07:47:59.487

First, to be clear:

"In order to begin issuing vkCmd*s, you need to begin a render pass."

That is not necessarily true. In command buffers You can record many different commands, all of which begin with vkCmd. Only some of these commands need to be recorded inside a render pass: the ones that are connected with drawing. There are also some commands which cannot be called inside a render pass (for example, dispatching compute shaders). But this is just a side note to sort things out.

Next, regarding the triple buffering You mentioned: in Vulkan, the way images are displayed depends on the supported present mode. Different hardware vendors, or even different driver versions, may offer different present modes, so on one piece of hardware You may get the present mode that is most similar to triple buffering (MAILBOX), but on another You may not. The present mode also affects how the presentation engine lets You acquire images from a swapchain and how it then displays them on screen. But as You noted, You cannot depend on the order of returned images, so You shouldn't design Your application as if it behaved the same way on all platforms.

But to answer Your question - the easiest, naive way is to call vkAcquireNextImageKHR() at the beginning of a frame, record command buffers that use the image returned by it, submit those command buffers and present the image. You can create framebuffers on demand, just before You need to use them inside a command buffer: You create a framebuffer that uses the appropriate image (the one associated with the index returned by the vkAcquireNextImageKHR() function) and, once the submitted command buffers have stopped using it, You destroy it. Such behavior is presented in the Vulkan Cookbook.

A more appropriate way would be to prepare framebuffers for all available swapchain images and pick the appropriate framebuffer during a frame. But You need to remember to recreate them when You recreate the swapchain.
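A minimal sketch of that bookkeeping, using stand-in types so it compiles without the Vulkan SDK. `Framebuffer` and `FramebufferSet` are hypothetical names for this example, and the generation counter is only an illustrative way to show that every framebuffer is rebuilt together with the swapchain.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a VkFramebuffer tied to one swapchain image.
struct Framebuffer {
    uint32_t imageIndex;
    uint32_t swapchainGeneration;
};

// Keeps one framebuffer per swapchain image and rebuilds all of them
// whenever the swapchain is recreated (for example after a resize).
class FramebufferSet {
public:
    void recreate(uint32_t imageCount) {
        ++generation_;
        // Real code would call vkDestroyFramebuffer on the old ones here,
        // after making sure no pending command buffer still uses them.
        framebuffers_.clear();
        for (uint32_t i = 0; i < imageCount; ++i) {
            // Real code would call vkCreateFramebuffer with the image view
            // of swapchain image i.
            framebuffers_.push_back({i, generation_});
        }
    }

    // Look up the framebuffer for whatever index the acquire returned.
    const Framebuffer& forAcquiredImage(uint32_t imageIndex) const {
        assert(imageIndex < framebuffers_.size());
        return framebuffers_[imageIndex];
    }

private:
    std::vector<Framebuffer> framebuffers_;
    uint32_t generation_ = 0;
};
```

Because the lookup is by index, it does not matter whether the indices come back in round-robin order or not.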

More advanced scenarios would postpone acquiring a swapchain image until it is really needed. The vkAcquireNextImageKHR() function call may block Your application (waiting until an image is available), so it should be called as late as possible when You prepare a frame. That's why You should first record the command buffers that don't need to reference swapchain images (for example, those that render geometry into a G-buffer in deferred shading algorithms). After that, when You want to display an image on screen (for example, in some postprocessing technique), You just take the approach described above: acquire an image, prepare the appropriate command buffer(s) and present the image.
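The ordering can be sketched as follows. This is only a model of the frame's control flow: `buildFrame` is a hypothetical function, the pass names are examples, and the `acquireImage` callback stands in for a real vkAcquireNextImageKHR() call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Returns the sequence of steps for one frame. Everything that does not
// touch the swapchain is recorded before the (possibly blocking) acquire.
std::vector<std::string> buildFrame(
        const std::function<uint32_t()>& acquireImage) {
    assert(acquireImage && "an acquire callback is required");
    std::vector<std::string> steps;

    // Passes that render into the application's own images.
    steps.push_back("record shadow pass");
    steps.push_back("record g-buffer pass");
    steps.push_back("record lighting pass");

    // Only now is the swapchain image needed.
    const uint32_t imageIndex = acquireImage();
    steps.push_back("acquire image " + std::to_string(imageIndex));
    steps.push_back("record final pass into framebuffer " +
                    std::to_string(imageIndex));
    steps.push_back("submit and present image " +
                    std::to_string(imageIndex));
    return steps;
}
```

With this shape, any time spent waiting inside the acquire overlaps with nothing the CPU could still have been recording for this frame.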

You can also pre-record command buffers that reference particular swapchain images. If You know that the source of Your images will always be the same (like the mentioned G-buffer), You can have a set of command buffers that always perform some postprocess/copy-like operations from this data to the swapchain images - one command buffer per swapchain image. Then, during the frame, once all of Your data is ready, You acquire an image, check which pre-recorded command buffer is appropriate and submit the one associated with the acquired image.
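A small sketch of that selection step, again with stand-in types. `PreRecordedCB`, `preRecordPresentCopies` and `selectFor` are hypothetical names, and the description string merely stands in for the commands a real command buffer would contain.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a command buffer recorded once, up front,
// that copies or postprocesses fixed source data into one swapchain image.
struct PreRecordedCB {
    uint32_t targetImage;
    std::string description;
};

// Record one command buffer per swapchain image at startup
// (and again whenever the swapchain is recreated).
std::vector<PreRecordedCB> preRecordPresentCopies(uint32_t imageCount) {
    std::vector<PreRecordedCB> cbs;
    for (uint32_t i = 0; i < imageCount; ++i) {
        cbs.push_back({i, "postprocess G-buffer into swapchain image " +
                              std::to_string(i)});
    }
    return cbs;
}

// Per frame: pick the command buffer matching the acquired index.
const PreRecordedCB& selectFor(const std::vector<PreRecordedCB>& cbs,
                               uint32_t acquiredIndex) {
    assert(acquiredIndex < cbs.size());
    return cbs[acquiredIndex];
}
```

The acquired indices can arrive as 0, 2, 1, 2 or in any other order, and the lookup still returns the command buffer that targets the right image.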

There are multiple ways to achieve what You want, and all of them depend on many factors: performance, platform, the specific goal You want to achieve, the type of operations You perform in Your application, the synchronization mechanisms You implemented and many other things. You need to figure out what suits You best. But in the end, You need to reference a swapchain image in command buffers if You want to display an image on screen. I'd suggest starting with the easiest option first and then, when You get used to it, You can improve Your implementation for higher performance, flexibility, easier code maintenance etc.

This is a very good answer. Thank you for the details. Btw, I got your book, and I've been going through it. It's excellent. I actually misplaced my Kindle some time last week, so I was running around on the weekend going "Where's my cookbook!", which is why I didn't have the recipe for this particular question. I hope you'll consider writing another book, for the intermediate to advanced level. — Dess, Feb 06 '18 at 02:25
@Dess Thanks!! I hope both my answer and my book will help You ;-). As for another book - well, I must say that writing the book was very exhausting. Fun and inspiring, but still exhausting. But after almost a year I'm starting to think about it from time to time. So maybe I will ;-). But for now I want to write further, more advanced parts of the "API without Secrets: Introduction to Vulkan" tutorial. — Ekzuzy, Feb 06 '18 at 07:57

score 2 · Answer 4 · answered Feb 05 '18 at 09:45

You can call vkAcquireNextImageKHR in any thread, as long as you make sure that access to the swapchain, semaphore and fence you pass to it is synchronized.

There is nothing else restricting you from calling it in any thread, including the recording thread.
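One simple way to provide that synchronization is to route every acquire through a mutex. The sketch below models this with a hypothetical `SharedSwapchain` class. It does not call Vulkan, and its rotating index is only a placeholder for whatever a real vkAcquireNextImageKHR would return.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical wrapper that serializes access to a swapchain so that any
// thread, including a recording thread, may acquire from it.
class SharedSwapchain {
public:
    explicit SharedSwapchain(uint32_t imageCount) : imageCount_(imageCount) {
        assert(imageCount > 0);
    }

    // Safe to call from any thread. In real code the vkAcquireNextImageKHR
    // call would sit inside this lock, and the semaphore and fence passed
    // to it must likewise not be used by another thread at the same time.
    uint32_t acquireNextImage() {
        std::lock_guard<std::mutex> lock(mutex_);
        const uint32_t index = next_;
        // Placeholder rotation only: a real swapchain may return indices
        // in any order.
        next_ = (next_ + 1) % imageCount_;
        return index;
    }

private:
    std::mutex mutex_;
    uint32_t imageCount_;
    uint32_t next_ = 0;
};
```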

You are also allowed to have multiple images acquired at a time, assuming the swapchain was created with enough images. In other words, acquiring the next image before you present the current one is allowed.
