I'm building an ML pipeline in Kubeflow. Is there anything out of the box that lets me configure the pipeline so that a step is not rerun if its output already exists? I've thought of ways to do this manually (checking for existing outputs while compiling the pipeline, having an initial step that returns the list of steps to run, or passing the steps to run as an input parameter), but I cannot find a native way of handling this.
My most common use case would be rerunning the model step without rerunning any of the data pre-processing, and without needing a separate "model development" pipeline that differs from the more general prod one, which includes the pre-processing step. Or perhaps I'm iterating on the evaluation phase and don't even need retraining, but I would still like to use the same pipeline. Right now, colleagues work around this by maintaining several pipelines that each start at a different step.
I'm coming at this from a map-reduce perspective, where this is trivial: the framework automatically detects which outputs are present and doesn't rebuild them by default, but makes it easy to rebuild some or all of them. Maybe this is biasing the way I work with Kubeflow?
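To make the behaviour I'm describing concrete, here is a minimal sketch in plain Python. It is not Kubeflow code and does not use any Kubeflow API; `run_pipeline`, the `Step` tuple, and the `force` argument are hypothetical names I made up for illustration. It also assumes each step writes exactly one output file, and it does not try to invalidate downstream steps when an upstream one is rebuilt.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Hypothetical step description: (name, output path, function that writes that output).
Step = Tuple[str, Path, Callable[[Path], None]]


def run_pipeline(steps: Iterable[Step], force: Iterable[str] = ()) -> List[str]:
    """Run steps in order, skipping any step whose output file already exists.

    Steps named in `force` are rerun even if their output is present.
    Returns the names of the steps that were actually executed.
    """
    forced = set(force)
    executed: List[str] = []
    for name, output, fn in steps:
        if output.exists() and name not in forced:
            # Output is already there and no rebuild was requested: skip.
            continue
        fn(output)
        executed.append(name)
    return executed
```

With something like this, calling `run_pipeline(steps)` a second time would skip every step whose output is already on disk, and `run_pipeline(steps, force=["train"])` would rerun only the (hypothetical) `train` step. That is roughly the default-skip, opt-in-rebuild behaviour I'm hoping to get from a single Kubeflow pipeline.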
Any help appreciated!