
Assume a vertex buffer in device-local memory and a staging buffer that is host-visible and host-coherent. Also assume a desktop system with a discrete GPU (so device and host memory are separate), and correct inter-frame synchronization.

I see two general possible ways of updating a vertex buffer:

  1. Map + memcpy + unmap into the staging buffer, then record a transient (single-command) command buffer containing a vkCmdCopyBuffer, submit it to the graphics queue, wait for the queue to idle, and free the transient command buffer. After that, submit the regular frame draw commands to the graphics queue as usual. This is the code used on https://vulkan-tutorial.com (for example, this .cpp file).

  2. Similar to the above, but use an additional semaphore that is signaled by the staging-copy submit and waited on by the regular frame draw submit, thus skipping the wait-for-idle call.

#2 sort of makes sense to me, and I've repeatedly read not to do any "wait-for-idle" operations in Vulkan because they synchronize the CPU with the GPU, but I've never seen that approach used in any tutorial or example online. What do the pros usually do if the vertex buffer has to be updated relatively often?

Blindy
    "*Map + memcpy + unmap into the staging buffer*" No, only ever unmap coherent memory when you're about to delete it. There is zero point in mapping memory more than once. – Nicol Bolas Jun 03 '20 at 21:23
  • Hm, I actually did that (keep it mapped) for buffers that I update literally every frame, but it makes sense to keep all staging buffers mapped as well. – Blindy Jun 03 '20 at 23:16

1 Answer


First, if you allocated coherent memory, then you almost certainly did so in order to access it from the CPU. Which requires mapping it. Vulkan is not OpenGL; there is no requirement that memory be unmapped before it can be used (and OpenGL doesn't even have that requirement anymore).

Unmapping memory should only ever be done when you are about to delete the memory allocation itself.
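A minimal sketch of that persistent-mapping pattern (the function names here are hypothetical, error checking is omitted, and the device/memory handles are assumed to come from your existing setup):

```cpp
#include <vulkan/vulkan.h>
#include <cstring>

// Map the staging allocation once, right after creating it, and keep the
// pointer for the buffer's whole lifetime.
void* mapStagingOnce(VkDevice device, VkDeviceMemory stagingMemory, VkDeviceSize size) {
    void* mapped = nullptr;
    vkMapMemory(device, stagingMemory, 0, size, 0, &mapped);
    return mapped;
}

// Per-frame update: just memcpy into the persistently mapped pointer.
// No vkFlushMappedMemoryRanges is needed because the memory is HOST_COHERENT.
void updateStaging(void* mappedStaging, const void* vertices, size_t bytes) {
    std::memcpy(mappedStaging, vertices, bytes);
}

// Only at teardown, immediately before freeing the allocation:
void destroyStaging(VkDevice device, VkDeviceMemory stagingMemory) {
    vkUnmapMemory(device, stagingMemory);
    vkFreeMemory(device, stagingMemory, nullptr);
}
```

After the one-time map, every frame is just a memcpy; vkMapMemory and vkUnmapMemory never appear on the per-frame path.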

Second, if you think of an idea that involves having the CPU wait for a queue or device to idle before proceeding, then you have come up with a bad idea and should use a different one. The only time you should wait for a device to idle is when you want to destroy the device.

Tutorial code should not be trusted to give best practices. It is often intended to be simple, to make it easy to understand a concept. Simple Vulkan code often gets in the way of performance (and if you don't care about performance, you shouldn't be using Vulkan).

In any case, there is no "most generally correct way" to do most things in Vulkan. There are lots of definitely incorrect ways, but no "generally do this" advice. Vulkan is a low-level, explicit API, and the result of that is that you need to apply Vulkan's tools to your specific circumstances. And maybe profile on different hardware.

For example, if you're generating completely new vertex data every frame, it may be better to see if the implementation can read vertex data directly from coherent memory, so that there's no need for a staging buffer at all. Yes, the reads may be slower, but the overall process may be faster than a transfer followed by a read.

Then again, it may not. It may be faster on some hardware, and slower on others. And some hardware may not allow you to use coherent memory for any buffer that has the vertex input usage at all. And even if it's allowed, you may be able to do other work during the transfer, and thus the GPU spends minimal time waiting before reading the transferred data. And some hardware has a small pool of device-local memory which you can directly write to from the CPU; this memory is meant for these kinds of streaming applications.
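As one possible illustration of that last point, memory-type selection could prefer the small DEVICE_LOCAL + HOST_VISIBLE pool when it exists and fall back to ordinary host-visible, coherent memory otherwise. This is only a sketch with a made-up helper name; whether such a type exists, and whether vertex reads from it are actually faster, is exactly the hardware-dependent question described above:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Hypothetical helper: pick a memory type for a CPU-written vertex buffer.
uint32_t chooseStreamingMemoryType(VkPhysicalDevice physicalDevice,
                                   uint32_t allowedTypeBits /* from vkGetBufferMemoryRequirements */) {
    VkPhysicalDeviceMemoryProperties props{};
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &props);

    // First choice: the directly-writable device-local pool, if the hardware exposes one.
    const VkMemoryPropertyFlags preferred =
        VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
        VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
    // Fallback: plain host-visible, coherent memory.
    const VkMemoryPropertyFlags fallback =
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
        VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;

    const VkMemoryPropertyFlags candidates[] = { preferred, fallback };
    for (VkMemoryPropertyFlags wanted : candidates) {
        for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
            const bool allowed = (allowedTypeBits & (1u << i)) != 0;
            const bool matches = (props.memoryTypes[i].propertyFlags & wanted) == wanted;
            if (allowed && matches) {
                return i;
            }
        }
    }
    return UINT32_MAX; // caller decides how to handle "no suitable type"
}
```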

If you are going to do staging, however, then your choices are primarily about which queue you submit the transfer operation on (assuming the hardware has multiple queues). This mostly comes down to how much latency you're willing to endure.

For example, if you're streaming data for a large terrain system, then it's probably OK if it takes a frame or two for the vertex data to be usable on the GPU. In that case, you should look for an alternative, transfer-only queue on which to perform the copy from the staging buffer to the primary memory. If you do, then you'll need to make sure that later commands which use the eventual results synchronize with that queue, which will need to be done via a semaphore.
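A rough sketch of that transfer-queue route, assuming the transfer and graphics command buffers are already recorded and a semaphore (`copyDone`, a made-up name) has been created. Queue-family ownership transfer is deliberately left out, but it is required when the two queues belong to different families (unless the buffer uses VK_SHARING_MODE_CONCURRENT):

```cpp
#include <vulkan/vulkan.h>

void submitWithTransferQueue(VkQueue transferQueue, VkQueue graphicsQueue,
                             VkCommandBuffer transferCmd, VkCommandBuffer graphicsCmd,
                             VkSemaphore copyDone, VkFence frameFence) {
    // 1. Copy on the transfer queue; signal copyDone when it finishes.
    VkSubmitInfo copySubmit{};
    copySubmit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    copySubmit.commandBufferCount = 1;
    copySubmit.pCommandBuffers = &transferCmd;
    copySubmit.signalSemaphoreCount = 1;
    copySubmit.pSignalSemaphores = &copyDone;
    vkQueueSubmit(transferQueue, 1, &copySubmit, VK_NULL_HANDLE);

    // 2. Rendering on the graphics queue waits for the copy at the stage
    //    where the vertex data is first consumed.
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
    VkSubmitInfo drawSubmit{};
    drawSubmit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    drawSubmit.waitSemaphoreCount = 1;
    drawSubmit.pWaitSemaphores = &copyDone;
    drawSubmit.pWaitDstStageMask = &waitStage;
    drawSubmit.commandBufferCount = 1;
    drawSubmit.pCommandBuffers = &graphicsCmd;
    vkQueueSubmit(graphicsQueue, 1, &drawSubmit, frameFence);
}
```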

If you're in a low-latency scenario where the data being transferred needs to be used this frame, then it may be better to submit both to the same queue. You could use an event to synchronize them rather than a semaphore. But you should also endeavor to put some kind of unrelated work between the transfer and the rendering operation, so that you can take advantage of some degree of parallelism in operations.
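For the same-queue case, the simplest variant is a plain pipeline barrier (rather than the event mentioned above): record the copy and the barrier into the same command buffer that later issues the draws. A hedged sketch, with placeholder handles and function name:

```cpp
#include <vulkan/vulkan.h>

// cmd is assumed to already be in the recording state; handles and sizes
// come from existing setup.
void recordCopyThenBarrier(VkCommandBuffer cmd,
                           VkBuffer stagingBuffer, VkBuffer vertexBuffer,
                           VkDeviceSize size) {
    VkBufferCopy region{};
    region.srcOffset = 0;
    region.dstOffset = 0;
    region.size = size;
    vkCmdCopyBuffer(cmd, stagingBuffer, vertexBuffer, 1, &region);

    // Make the transfer write visible to the vertex attribute reads that the
    // later draw commands in this command buffer will perform.
    VkBufferMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.buffer = vertexBuffer;
    barrier.offset = 0;
    barrier.size = size;

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_TRANSFER_BIT,      // after the copy
                         VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,  // before vertex fetch
                         0,
                         0, nullptr,   // no global memory barriers
                         1, &barrier,  // one buffer barrier
                         0, nullptr);  // no image barriers
    // ...then record the unrelated work (if any) and the draw commands.
}
```

An event (`vkCmdSetEvent` right after the copy, `vkCmdWaitEvents` just before the draws) expresses the same dependency, but lets the unrelated work recorded in between overlap with the transfer instead of sitting behind the barrier.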

Nicol Bolas
    "Tutorial code should not be trusted to give best practices" -- while I'm with you there, I have to learn these somewhere, right? I have the documentation from LunarG that describes each individual function and various tutorials that help put these in perspective. Anyway thank you, I got it, drop the idle wait call and use synchronization between the buffer update submit and render command submit. I'll look into events, I haven't heard of them yet, the submit call only uses semaphores and fences. – Blindy Jun 03 '20 at 23:24
  • @Blindy: "*use synchronization between the buffer update submit and render command submit*" I said nothing about "submit". You should never submit more than once to the same queue in a single frame. `vkQueueSubmit` is not a fast function, so you should submit as much work as possible on every call. – Nicol Bolas Jun 03 '20 at 23:28
  • You mentioned semaphores to synchronize gpu operations (the copy buffer operation and the draw calls), how else do you synchronize them if you don’t submit? Only submit has the wait for semaphore and signal semaphore fields. – Blindy Jun 04 '20 at 06:54
  • @Blindy: "*Only submit has the wait for semaphore and signal semaphore fields.*" Submit itself does not; batches *within* submit can signal/wait on semaphores. You can submit multiple batches. And the point I was trying to get across is that, if you are sending the transfer and rendering commands to the same queue, you should not "submit" them at different times. Submit them all at once, possibly in different batches. – Nicol Bolas Jun 04 '20 at 13:33
  • If I understand correctly, you're talking about `vkQueueSubmit` vs `vkQueuePresentKHR`, correct? I don't believe I've seen a distinction of submit vs batch submit in the documentation. – Blindy Jun 04 '20 at 14:07
  • @Blindy: The distinction is made in [literally the second paragraph](https://www.khronos.org/registry/vulkan/specs/1.2/html/chap4.html#devsandqueues-submission) of the Vulkan specification section on queue submission operations. – Nicol Bolas Jun 04 '20 at 14:40
  • FWIW, "never submit to a queue more than once a frame" is a bit strong. Yes, it's expensive, but it's not *that* expensive. Depending on what you're doing, you could add more inefficiency tying yourself in knots to do a single submit than you gain. Do try to minimize submits, but if you end up with a few submits to a queue per frame it's likely not the end of the world. As always with performance advice, though, measuring on your app and the hardware you care about is what matters. – Jesse Hall Jun 06 '20 at 19:00
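To illustrate the batching point from the comments above, here is a hypothetical sketch of sending the transfer and the frame's rendering to the same queue in a single `vkQueueSubmit` call, as two batches linked by a semaphore (all names are placeholders):

```cpp
#include <vulkan/vulkan.h>

void submitBothBatches(VkQueue graphicsQueue,
                       VkCommandBuffer transferCmd, VkCommandBuffer drawCmd,
                       VkSemaphore copyDone, VkFence frameFence) {
    VkSubmitInfo batches[2] = {};

    // Batch 0: the staging->vertex copy, signalling copyDone.
    batches[0].sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    batches[0].commandBufferCount = 1;
    batches[0].pCommandBuffers = &transferCmd;
    batches[0].signalSemaphoreCount = 1;
    batches[0].pSignalSemaphores = &copyDone;

    // Batch 1: the frame's draw commands, waiting on copyDone at vertex input.
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
    batches[1].sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    batches[1].waitSemaphoreCount = 1;
    batches[1].pWaitSemaphores = &copyDone;
    batches[1].pWaitDstStageMask = &waitStage;
    batches[1].commandBufferCount = 1;
    batches[1].pCommandBuffers = &drawCmd;

    // One call, two batches: a single submission to the queue for the frame.
    vkQueueSubmit(graphicsQueue, 2, batches, frameFence);
}
```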