I am building a Kubeflow pipeline that has two components. Component 1 preprocesses some data and component 2 trains a model on that data. I understand I need to save the data at some `outputPath` parameter generated by Kubeflow. This works: I am able to get the `outputPath` from my first component's `.outputs`, pass it into the second as an `inputPath`, and access those files.
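A minimal local sketch of that flow, assuming KFP v1-style lightweight Python components. In the real SDK the path parameters would be annotated with `kfp.components.OutputPath(...)` and `kfp.components.InputPath(...)` and each function wrapped with `kfp.components.create_component_from_func`. Here `preprocess`, `train`, and `run_locally` are hypothetical names, and `run_locally` only imitates the pipeline system, which on a cluster typically stores the file as an artifact and copies it into the next component's pod rather than sharing one path.

```python
import os
import tempfile


def preprocess(raw_rows: list, output_data_path: str) -> None:
    """Hypothetical component 1: write preprocessed rows to a path chosen by the pipeline system."""
    # The parent directory of a generated path may not exist yet.
    os.makedirs(os.path.dirname(output_data_path), exist_ok=True)
    with open(output_data_path, "w") as f:
        for row in raw_rows:
            f.write(row.strip().lower() + "\n")


def train(input_data_path: str) -> int:
    """Hypothetical component 2: read the upstream file; counting rows stands in for training."""
    with open(input_data_path) as f:
        return sum(1 for _ in f)


def run_locally(raw_rows: list) -> int:
    """Hypothetical stand-in for the pipeline system: it generates the output path,
    runs component 1, then passes the same file to component 2 as its input path."""
    with tempfile.TemporaryDirectory() as workdir:
        # Kubeflow chooses the real path; this layout is made up for the sketch.
        generated_path = os.path.join(workdir, "preprocess", "output_data", "data")
        preprocess(raw_rows, generated_path)
        # On a cluster, this is roughly where the file would be copied between pods.
        return train(generated_path)


if __name__ == "__main__":
    print(run_locally(["  Alpha ", "BETA", "gamma"]))
```

The components never pick their own file locations; they only read or write whatever path they are handed, which is what lets the pipeline system decide where the data lives.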
Volumes themselves are also pretty straightforward. I understand I can create a pipeline volume and mount it to each component. However, this means using hard-coded paths in each component to refer to where the volume is mounted. (I suppose I could pass the mount path in as a parameter, but that still doesn't make use of the `outputPath` functionality.) Component 1 will produce quite a bit of data that is passed to component 2, and after testing, this data will also be passed to two or three additional downstream components. So I am starting to suspect `outputPath` is not the best option, since Kubeflow actually has to copy this data from component to component.
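The volume-based alternative described above can be sketched as plain Python functions that take the mount point as a parameter (the hypothetical `data_dir`) instead of hard-coding it. In KFP v1 the volume would typically be created with `dsl.VolumeOp` and attached to each step with `.add_pvolumes({mount_path: volume})`; here a temporary directory stands in for the mount so the sketch runs locally. All function names are hypothetical.

```python
import os
import tempfile


def preprocess_to_volume(raw_rows: list, data_dir: str) -> str:
    """Hypothetical component 1: write into the shared volume mounted at data_dir."""
    out_dir = os.path.join(data_dir, "preprocessed")
    os.makedirs(out_dir, exist_ok=True)
    out_file = os.path.join(out_dir, "data.txt")
    with open(out_file, "w") as f:
        f.writelines(row.strip().lower() + "\n" for row in raw_rows)
    # Return a path relative to the mount, so downstream steps only need to agree on data_dir.
    return os.path.relpath(out_file, data_dir)


def train_from_volume(data_dir: str, rel_path: str) -> int:
    """Hypothetical component 2: read the file in place on the same volume; nothing is copied."""
    with open(os.path.join(data_dir, rel_path)) as f:
        # Counting rows stands in for actual training.
        return sum(1 for _ in f)


if __name__ == "__main__":
    # A temporary directory stands in for the mounted volume.
    with tempfile.TemporaryDirectory() as mount:
        rel = preprocess_to_volume(["  Alpha ", "BETA", "gamma"], mount)
        print(rel, train_from_volume(mount, rel))
```

The trade-off this illustrates: the data stays in place on the volume and only a small relative-path string moves between steps, so nothing large is copied. On the other hand, the pipeline system no longer tracks the data itself as an artifact, and step ordering may need to be made explicit (for example with `.after()`) when no value is passed between steps.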
So, my question: how do we get `outputPath` and the mounted volume to work together? The older Kubeflow v0.5 docs mention that the `outputPath` could point to a mounted volume, but this bit of text has been removed from the latest Kubeflow v1.3 docs. Both passages are included below.
Kubeflow v0.5 docs:
Output paths are filled in by the pipeline system. The `outputPath` placeholder is replaced by a path. (The path can point to a mounted output volume, for example.) The parent directories of the path may or may not exist. Your program must handle both cases without error.
Kubeflow v1.3 docs:
The `{outputPath: <Output name>}` placeholder is replaced by a (generated) local file path where the component program is supposed to write the output data. The parent directories of the path may or may not exist. Your program must handle both cases without error.
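A minimal sketch of honoring that contract in a Python component, assuming the generated path arrives as a plain string argument. `write_output` is a hypothetical helper, not part of the KFP SDK.

```python
import os


def write_output(path: str, data: str) -> None:
    """Write to a system-generated output path whose parent directories may or may not exist."""
    parent = os.path.dirname(path)
    # os.makedirs("") raises, so only create a parent when the path actually has one.
    if parent:
        # exist_ok=True covers the case where the directory is already there.
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as f:
        f.write(data)
```

The same call works whether the generated path's directories were pre-created by the system or not, and it also tolerates a bare file name.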
So, in the latest versions of Kubeflow, can the `outputPath` still point to a mounted volume? If so, how do I set that up? I have not found any examples that do this specifically.
And more generally, what would be the best approach here? Do I even need to use an `outputPath`?
Thanks in advance for the support.
Zach