Vulkan: Creating and benefit of pipeline derivatives

Question

In Vulkan, you can use vkCreateGraphicsPipelines or vkCreateComputePipelines to create pipeline derivatives, with the basePipelineHandle or basePipelineIndex members of VkGraphicsPipelineCreateInfo/VkComputePipelineCreateInfo. The documentation states that this feature is available for performance reasons:

The goal of derivative pipelines is that they be cheaper to create using the parent as a starting point, and that it be more efficient (on either host or device) to switch/bind between children of the same parent.

This raises quite a few questions for me:

Is there a way to indicate which state is shared between parent and child pipelines, or does the implementation decide?
Is there any way to know whether the implementation is actually getting any benefit from using derived pipelines (other than profiling)?
The parent pipeline needs to be created with VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT. Is there a downside to always using this flag (eg. in case you may create a derived pipeline from this one in the future)?

score 7 · Answer 1 · edited Apr 19 '21 at 00:57

I came to this question while investigating whether pipeline derivatives provide a benefit. Here are some resources I found from vendors:

Tips and Tricks: Vulkan Dos and Don’ts, Nvidia, June 6, 2019

Don’t expect speedup from Pipeline Derivatives.

Vulkan Usage Recommendations, Samsung

Pipeline derivatives let applications express "child" pipelines as incremental state changes from a similar "parent"; on some architectures, this can reduce the cost of switching between similar states. Many mobile GPUs gain performance primarily through pipeline caches, so pipeline derivatives often provide no benefit to portable mobile applications.

Recommendations

Create pipelines early in application execution. Avoid pipeline creation at draw time.

Use a single pipeline cache for all pipeline creation.

Write the pipeline cache to a file between application runs.

Avoid pipeline derivatives.

Vulkan Best Practice for Mobile Developers - Pipeline Management, Arm Software, Jul 11, 2019

Don't

Create pipelines at draw time without a pipeline cache (introduces performance stutters).

Use pipeline derivatives as they are not supported.

Vulkan Samples, LunarG, API-Samples/pipeline_derivative/pipeline_derivative.cpp

/*
VULKAN_SAMPLE_SHORT_DESCRIPTION
This sample creates pipeline derivative and draws with it.
Pipeline derivatives should allow for faster creation of pipelines.
In this sample, we'll create the default pipeline, but then modify
it slightly and create a derivative.  The derivatve will be used to
render a simple cube.
We may later find that the pipeline is too simple to show any speedup,
or that replacing the fragment shader is too expensive, so this sample
can be updated then.
*/

It doesn't look like any vendor is actually recommending the use of pipeline derivatives, except maybe to speed up pipeline creation.

To me, pipeline derivatives look like an idea that is good in theory, on a theoretical implementation, but that doesn't amount to much in practice.

Also, if the driver is supposed to benefit from a common parent of multiple pipelines, it should be completely able to automate that ancestor detection. "Common ancestors" could be synthesized based on whichever specific common pipeline states provide the best speed-up. Why specify it explicitly through the API?

> Why specify it explicitly through the API? I think that's because Vulkan is a very thin API, and it gives control over determining common ancestors to the application. – Ravi Prakash, Jun 26 '23 at 13:08

score 5 · Accepted Answer · answered May 10 '16 at 14:39

Is there a way to indicate which state is shared between parent and child pipelines

No; the pipeline creation API provides no way to tell it what state will change. The idea being that, since the implementation can see the parent's state, and it can see what you ask of the child's state, it can tell what's different.

Also, if there were such a way, it would only represent a way for you to accidentally misinform the implementation as to what changed. Better to just let the implementation figure out the changes.

Is there any way to know whether the implementation is actually getting any benefit from using derived pipelines (other than profiling)?

No.

The parent pipeline needs to be created with VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT. Is there a downside to always using this flag (eg. in case you may create a derived pipeline from this one in the future)?

Probably. Due to #1, the implementation needs to store at least some form of the parent pipeline's state, so that it can compare it to the child pipeline's state. And it must store this state in an easily readable form, which will probably not be the same form as the GPU memory and tokens to be copied into the command stream. As such, there's a good chance that parent pipelines will allocate additional memory for such data. Though the likelihood of them being slower at binding/command execution time is low.

You can test this easily enough by passing an allocator to the pipeline creation functions. If it allocates the same amount of memory as without the flag, then it probably isn't storing anything.

answered May 10 '16 at 14:39

Nicol Bolas


Sounds like it would be a good idea for switching rendertargets (a set of pipelines, one for each image in the swapchain) – ratchet freak May 10 '16 at 22:49
@ratchetfreak: If you're doing ping-ponging, you wouldn't use a swapchain. You'd place an input attachment as the same as the output attachment, then use pipeline dependencies to switch. And most other render target changing also requires changing the nature of the rendering (rendering depth vs. g-buffers vs. lighting passes, etc). So it's rare that you'd change *only* the rendertarget in a pipeline. – Nicol Bolas May 10 '16 at 23:51
But in normal swapchain rendering you *have* to switch the rendertarget every frame as the next image you acquire will not be the one you just submitted for present. Even if most everything else stays the same (excepting some contents of the memory like the transform matrix uniforms). – ratchet freak May 11 '16 at 00:17
@ratchetfreak: But pipelines aren't bound to specific images. They're bound to renderpasses. Which can use any image attachment that fits the specified format. So there's no reason to change pipelines just because you changed swap chain images. – Nicol Bolas May 11 '16 at 00:51
It seems, if your answer is correct (which I'll assume it is, and accept it), it's actually a difficult feature to use. The only stated benefit is performance, and there's no guarantee that it will actually help performance - in fact, there's a small chance that it will be worse. – MuertoExcobito May 11 '16 at 01:43
@MuertoExcobito: No, it's actually a very simple feature to use. What's difficult is knowing whether it's actually accomplished something. That is, is changing (for example) vertex input state with sibling pipelines that differ only in input state faster than using a single pipeline and a single vertex format, or manually doing vertex fetching in the shader and deciding the format based on a push constant? These are questions that can only be answered by profiling. – Nicol Bolas May 11 '16 at 02:27
It ends up being a lot of time spent guessing where the driver vendors have optimized, and trying to target that path. It can be different per vendor (and per driver version), which might mean many implementations of essentially the same functionality. That's a whole lot of guessing and checking, which doesn't sound all that complicated, but is very time consuming. – MuertoExcobito May 11 '16 at 03:09
Pure speculation, but I think a reasonable approach might be to assume that the most expensive part of pipeline generation is optimizing the core of your shaders, so if you find yourself making multiple pipelines based on the same shader set but with different render passes/blend modes/depth states/etc then derive from the original. I'd imagine the biggest gains from this feature will come from shader patching type behaviours in the driver. – Columbo May 11 '16 at 19:25
@Columbo - I would expect your speculation is correct - creating derived pipelines with the same shaders, but different ancillary render states is probably where the intention in efficiency lies. I think a more targeted API for creating derived shader pipelines (only varying these states) would have given better hints to the end-user about how to optimize. – MuertoExcobito May 12 '16 at 17:41

Shahbaz · Answer 3 · 2016-05-15T19:24:55.373

0

I'm no expert in computer graphics, but my understanding (partly includes intuition) is the following:

Is there a way to indicate which state is shared between parent and child pipelines, or does the implementation decide?

There are certain aspects of the pipeline that are not specified at render time (and so are fixed), for example which shaders to use. My speculation is that the parent and the derived pipelines likely share this "read-only" information (or in C terms, they point to the same object). That's why creation of derived pipelines is faster.

Switching between these pipelines would also be faster because there is less need to change resources on changing pipelines, because some of the resources are shared and the same.

The parent pipeline needs to be created with VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT. Is there a downside to always using this flag (eg. in case you may create a derived pipeline from this one in the future)?

This is very likely implementation-dependent. My speculation is that, when you allow derivatives, you enable resource (e.g. shader) sharing, which means the implementation is likely going to do reference counting for these resources. That would be an unnecessary cost if the resources are not going to be shared. Also, when changing pipelines, the driver wouldn't need to check whether each resource is shared and can stay on the GPU, or is not and needs changing. If there is no sharing, all resources would be changed, and there is no overhead of checking. None of these are that much of an overhead, so either Vulkan is staying on the safe side, or there is another reason I don't know about.

edited May 15 '16 at 19:24

answered May 12 '16 at 17:23

Shahbaz


This answer makes a lot of speculations, at least some of which definitely aren't true. – MuertoExcobito May 12 '16 at 17:37
@MuertoExcobito, I would certainly be happy to know which parts aren't true. I'm learning Vulkan myself too. – Shahbaz May 12 '16 at 19:23
The threaded rendering part is definitely not correct - you can use the same pipeline object between different threads. If you think of the pipeline as "just another resource", it doesn't make sense to create one-per-thread (eg. you wouldn't create a new texture per thread that uses it). – MuertoExcobito May 12 '16 at 19:32
Also, the part about "certain aspects of the pipeline are not specified at runtime" seems dubious... the entire pipeline state _is_ specified at runtime. I'm assuming you mean that the shader bytecode used is compiled offline (although, there's actually no guarantee of that). – MuertoExcobito May 12 '16 at 19:36
@MuertoExcobito, I was under the impression that the pipeline is keeping track of the execution state, so you wouldn't be able to use the same object on two different threads in parallel. Isn't that right? – Shahbaz May 12 '16 at 19:52
@MuertoExcobito, _certain aspects of the pipeline are not specified at runtime_, the shader is one thing. Another thing is for example which stages actually exist in the pipeline. You are right, perhaps "runtime" was not the correct term, I meant "render time". So you create the pipeline first, and don't change (most of?) it from then on during all your rendering. – Shahbaz May 12 '16 at 19:55
@Shahbaz: "*I was under the impression that the pipeline is keeping track of the execution state*" The execution state of what? A pipeline simply stores state. That's all it does: it stores information about how to render a certain way. Notably, vkCmdBindPipeline says nothing about synchronizing access to pipelines across threads, only to the command buffer. – Nicol Bolas May 12 '16 at 20:32
@nicolbolas, "a pipeline simply stores state" is kind of in conflict with "stores information about how to render". The first implies that the pipeline contains parts that change during the execution of the command buffer, while the second implies that the pipeline is a read-only object. My impression was in line with your first statement, that the pipeline contains state, so, for example, it keeps track of which stage of the pipeline you should execute next. If that is true, then access to the pipeline must be synchronized between threads because they are both writing the state. – Shahbaz May 13 '16 at 03:39
If the pipeline instead doesn't need synchronization, then it cannot contain any state. In that case, the execution state is likely tracked by the command buffer, which is ok, but I just don't know for sure which is the case. – Shahbaz May 13 '16 at 03:41
@Shahbaz: I don't know what this "execution state" stuff you keep talking about is. But an object storing state has nothing to do with whether that state is *mutable* after the object is created. If you have a `const int`, it stores state. But that state cannot be *changed* after it is stored. It can be *used*, but not modified. The same goes with pipelines: you can bind them to command buffers, which causes their state to influence subsequent rendering commands, but you cannot *change* their state. – Nicol Bolas May 13 '16 at 03:46
Reading your answer again, I understand we are not talking about the same "state". You are talking about the parameters you give when creating the pipeline. I don't know why they are called state, because they are not actually changed at run time (unless dynamic) nor do they track execution (as in states of a state machine). What I'm talking about is the state machine derived from the pipeline, which is used to track what parts of the pipeline have already executed and what is left to do. So, if the state variables of the state machine are part of the pipeline, then... – Shahbaz May 13 '16 at 03:54
... access to the pipeline needs to be synchronized – Shahbaz May 13 '16 at 03:55
@Shahbaz: "*What I'm talking about is the state machine derived from the pipeline, which is used to track what parts of the pipeline have already executed and what is left to do.*" The Vulkan specification does not define any such thing. Matters of exactly how a pipeline is executed when a command buffer is sent to a queue are implementation-dependent. The Vulkan specification does state that pipeline objects are immutable and access to their contents does not require synchronization (just as access to a `const int` does not require synchronization). – Nicol Bolas May 15 '16 at 00:42
@NicolBolas, great, that clears things up. I do see that the pipeline is not specified as requiring explicit synchronization, but I can't find where it says pipeline objects are immutable. In which section have you seen such a quote? – Shahbaz May 15 '16 at 19:23
@Shahbaz: The fact that there is no Vulkan function that can modify a pipeline's data. The fact that the behavior of a pipeline is defined entirely in terms of the parameters provided to `vkCreateCompute/GraphicsPipeline`. Whether the implementation is immutable is irrelevant; if an implementation puts mutable data in one, then it is up to the implementation to make sure that the mutable data has no visible effects. – Nicol Bolas May 15 '16 at 21:07
@NicolBolas, fair enough – Shahbaz May 16 '16 at 22:27
