I want to make changes to my Python scripts locally, push them to git, and then run the Kubeflow pipeline on Google Cloud. How can I pull the latest commit from git before running the files in the cloud?
As I can see, this product has its own support channel: https://kubeflow.slack.com/messages/kubeflow-pipelines/ You could also ask your question there and you may get a better answer. – Jose Luis Delgadillo Dec 15 '20 at 00:07
BTW, the best place to post issues is on GitHub: https://github.com/kubeflow/pipelines/issues – Ark-kun Jan 04 '21 at 00:05
1 Answer
There are many ways to accomplish this. Think about how you would do it without Kubeflow Pipelines: imagine you have a shell script that calls a Python script and you can run it in the cloud. How would you perform the synchronization?
Some suggestions:
To ensure reproducibility, it's best that the components themselves are immutable. There are multiple ways to achieve that. For example, you can push a new component.yaml with each script revision.
Then there is the issue of pipeline reproducibility and component versioning. Ideally, the pipeline should strictly link to particular component versions (by hash digest or commit hash). In that case, switching the pipeline to a new component version means updating the pipeline to point to the new version. Alternatively (more convenient, but less reproducible), you can point to component versions using branches, which are mutable; this way the Python pipeline will pull the new versions every time it is compiled. (The same applies to graph components that can represent a pipeline.) Note, however, that the compiled pipeline is static, with the components inlined into it, so updating the pipeline requires recompilation.
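As a rough illustration, here is a minimal sketch of the two referencing styles using the kfp v1 SDK. The repository URL, paths, and the train component are hypothetical placeholders, not part of the original answer:

    import kfp.components as comp

    # Reproducible: pin the component to an immutable commit SHA
    # (hypothetical repo and path; substitute your own).
    train_op = comp.load_component_from_url(
        'https://raw.githubusercontent.com/your-org/your-repo/'
        '<commit-sha>/components/train/component.yaml')

    # Convenient but mutable: reference a branch; the latest
    # component.yaml is pulled every time the pipeline is compiled.
    train_op = comp.load_component_from_url(
        'https://raw.githubusercontent.com/your-org/your-repo/'
        'master/components/train/component.yaml')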
So:
- Push the updated component code and updated component.yaml files.
- Switch the pipeline code to the new component versions (or use branch-based references).
- Submit the pipeline for execution (see the sketch after this list).

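A minimal sketch of the compile-and-submit step, again assuming the kfp v1 SDK; the pipeline name, endpoint host, and the pinned component are hypothetical:

    import kfp
    import kfp.components as comp
    import kfp.dsl as dsl

    # Hypothetical component pinned to a commit, as in the sketch above.
    train_op = comp.load_component_from_url(
        'https://raw.githubusercontent.com/your-org/your-repo/'
        '<commit-sha>/components/train/component.yaml')

    @dsl.pipeline(name='my-pipeline')
    def my_pipeline():
        train_op()

    # Compilation inlines the pinned components into a static package,
    # so picking up new component versions requires recompiling.
    kfp.compiler.Compiler().compile(my_pipeline, 'my_pipeline.yaml')

    # Submit the compiled package to your KFP endpoint on Google Cloud.
    client = kfp.Client(host='https://<your-kfp-endpoint>')
    client.create_run_from_pipeline_package('my_pipeline.yaml', arguments={})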