I came to this question investigating whether pipeline derivatives provide a benefit. Here's some resources I found from vendors:
Tips and Tricks: Vulkan Dos and Don’ts, Nvidia, June 6, 2019
Don’t expect speedup from Pipeline Derivatives.
Vulkan Usage Recommendations, Samsung
Pipeline derivatives let applications express "child" pipelines as incremental state changes from a similar "parent"; on some architectures, this can reduce the cost of switching between similar states. Many mobile GPUs gain performance primarily through pipeline caches, so pipeline derivatives often provide no benefit to portable mobile applications.
Recommendations
- Create pipelines early in application execution. Avoid pipeline creation at draw time.
- Use a single pipeline cache for all pipeline creation.
- Write the pipeline cache to a file between application runs.
- Avoid pipeline derivatives.
Vulkan Best Practice for Mobile Developers - Pipeline Management, Arm Software, Jul 11, 2019
Don't
- Create pipelines at draw time without a pipeline cache (introduces performance stutters).
- Use pipeline derivatives as they are not supported.
Vulkan Samples, LunarG, API-Samples/pipeline_derivative/pipeline_derivative.cpp
/*
VULKAN_SAMPLE_SHORT_DESCRIPTION
This sample creates pipeline derivative and draws with it.
Pipeline derivatives should allow for faster creation of pipelines.
In this sample, we'll create the default pipeline, but then modify
it slightly and create a derivative. The derivatve will be used to
render a simple cube.
We may later find that the pipeline is too simple to show any speedup,
or that replacing the fragment shader is too expensive, so this sample
can be updated then.
*/
It doesn't look like any vendor is actually recommending the use of pipeline derivatives, except maybe to speed up pipeline creation.
To me, that seems like a good idea in theory on a theoretical implementation that doesn't amount to much in practice.
Also, if the driver is supposed to benefit from a common parent of multiple pipelines, it should be completely able to automate that ancestor detection. "Common ancestors" could be synthesized based on whichever specific common pipeline states provide the best speed-up. Why specify it explicitly through the API?