
My question is fairly simple and may look naive, but I haven't found much discussion of it, since most of the articles and posts I've seen deal with real-time rendering.

Seeing that a GPU can have up to 8 or 16 graphics queues, I was wondering whether we can launch as many renders as the GPU has queues. I'm particularly interested in this for a fully offscreen rendering application where the renders are completely independent, apart from sharing geometry and shaders.

2 Answers


You can shove as much work down however many queues the implementation allows you to have.

Will you get any meaningful performance improvement out of doing so? That is highly unlikely.

On a CPU, if you don't use a thread, that thread goes unused. That's just how threads work.

GPUs aren't like that. Queues are not like CPU threads. They are interfaces for dispatching work to the various execution units of the GPU. As such, one of the 8 queues does not limit itself to merely 1/8th of the available hardware. It will fill up however many execution units and hardware components are available to be filled up with the work it has.

Submitting two pieces of work at the same time will therefore cause them both to contend for the same resources. The results will still take the same total amount of time to be completed because they're ultimately using the same resources.

This article (and its companions) are from 2018, but they should still be broadly applicable.

Nicol Bolas
  • Thanks, I was pessimistically thinking that using multiple queues would divide the GPU's strength. But I was unsure... I'm just wondering why so many queues are available on some devices. – Sébastien Bémelmans Jun 26 '23 at 23:02
    @SébastienBémelmans: it's more for prioritization. Maybe you're running a rendering operation, but then you need to do some compute work and get the answer ASAP. So you set the priority for the compute queue to be higher, so that it will steal time from the middle of the rendering task. The work *as a whole* takes the same amount of time, but you get your compute answer sooner. – Nicol Bolas Jun 26 '23 at 23:07
  • What I need is to render a geometry offscreen multiple times from different points of view and save the results into images. I was planning to do it in parallel: not using all the available queues, but pushing the GPU to work at 100% by submitting the different renders at once. – Sébastien Bémelmans Jun 26 '23 at 23:12

The GPU is free to do whatever it wants, however it wants, as long as it follows the constraints set by the Vulkan spec. The only real constraints in Vulkan queues are synchronization primitives. As long as everything ends up in the right order according to the semaphores, everything in between semaphores can happen in any order. This can happen within a command buffer, within a queue, within a queue family, within a device, or across devices (a device being the virtual context represented as a VkDevice, not a physical device).

Taking from NVidia's explanation of the rendering process on their GPUs: within a single Graphics Processing Cluster there is a single rasterizer, plus many cores and dispatch units to handle the shaders. Most of their GPUs have multiple GPCs, so each one can presumably be working on rendering a different triangle. In practice, things are wildly more complex than what I've described.

So can you render things in parallel? Sure, why not. Will you notice? Assuming you set up your synchronization primitives correctly, probably not.

Pragmatically speaking, this would be something you would ask your support engineer for the various GPU manufacturers you work with, and they would be able to go over how to best optimize your renderer.

vandench
  • That was my initial point of view. But can we assume that, for a GPU that has 16 graphics queues, making two independent renders using two queues at the same time will be more efficient than making the same two renders one after the other? – Sébastien Bémelmans Jun 26 '23 at 22:58
  • @SébastienBémelmans [How a VkQueue is mapped to the underlying hardware is implementation-defined. Some implementations will have multiple hardware queues and submitting work to multiple VkQueue​s will proceed independently and concurrently. Some implementations will do scheduling at a kernel driver level before submitting work to the hardware. There is no current way in Vulkan to expose the exact details how each VkQueue is mapped.](https://github.com/KhronosGroup/Vulkan-Guide/blob/main/chapters/queues.adoc) – user253751 Jun 26 '23 at 23:08
  • in other words: if the hardware can do more than one thing at the same time, then using more than one queue is how you can tell it to do that. If the hardware can't do more than one thing at the same time, you can still use more than one queue, but it won't make it go faster. – user253751 Jun 26 '23 at 23:08
  • Hmm, OK, I think I'll have to test it to see. – Sébastien Bémelmans Jun 26 '23 at 23:15