Kubeflow Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/Apache MXNet/XGBoost/MPI jobs on Kubernetes.
Questions tagged [kubeflow]
433 questions
0
votes
2 answers
How to upgrade an existing kubeflow pipeline?
In the pipeline UI I use the upload pipeline button to upload new pipelines.
Since the pipeline name must be unique, the only way to update a pipeline seems to be to delete the old one and then upload a new one.
Is there a better way, maybe to manage a version…

user3599803
- 6,435
- 17
- 69
- 130
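Newer Kubeflow Pipelines releases support pipeline versions, so re-uploading does not require deleting the original. A minimal sketch with the kfp SDK, assuming a compiled pipeline.yaml and an existing pipeline named my-pipeline (both placeholders):

```python
# Sketch: upload a new version of an existing pipeline instead of deleting and
# re-uploading it. Host, file name and pipeline name are placeholders.
import kfp

client = kfp.Client(host="http://localhost:8080")

pipeline_id = client.get_pipeline_id("my-pipeline")   # look up the existing pipeline
client.upload_pipeline_version(
    pipeline_package_path="pipeline.yaml",            # newly compiled package
    pipeline_version_name="v2",
    pipeline_id=pipeline_id,
)
```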
0
votes
0 answers
How to attach a GCP secret to a Kubernetes service account?
How can I use a secret object created from a Google Cloud JSON key file with a service account? I have MiniKF on a VM and Kubeflow installed. I am trying to make a container using a Jupyter notebook in the MiniKF Kubernetes cluster. The notebook has access to…

Pankaj Kumar
- 3,139
- 1
- 15
- 9
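One common pattern (a sketch, not taken from the question) is to create a Kubernetes secret from the JSON key with the Kubernetes Python client and then reference it from the namespace's service account; the namespace, secret and service-account names below are placeholders:

```python
# Sketch: turn a GCP service-account JSON key into a Kubernetes secret that the
# notebook pod can mount (and point GOOGLE_APPLICATION_CREDENTIALS at).
# Namespace, secret and service-account names are placeholders.
import base64
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

with open("key.json", "rb") as f:
    key_b64 = base64.b64encode(f.read()).decode()

v1.create_namespaced_secret(
    namespace="kubeflow-user",
    body=client.V1Secret(
        metadata=client.V1ObjectMeta(name="user-gcp-sa"),
        data={"key.json": key_b64},
    ),
)

# Optionally list the secret on the namespace's service account so tooling that
# inspects the service account can discover it.
v1.patch_namespaced_service_account(
    name="default-editor",
    namespace="kubeflow-user",
    body={"secrets": [{"name": "user-gcp-sa"}]},
)
```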
0
votes
1 answer
Pulling images from a private Google Container Registry with Kubeflow on Minikube
We are having trouble giving a container within a pipeline uploaded to Kubeflow access to a private custom Docker image stored in Google Container Registry. We are running Kubeflow on top of a Kubernetes cluster running on Minikube. Can someone help…

Federico K
- 31
- 4
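A documented pattern in the kfp v1 DSL is to attach an image pull secret to the whole pipeline. A minimal sketch, assuming a docker-registry secret named gcr-secret already exists in the pipeline namespace (secret name and image are placeholders):

```python
# Sketch: attach an image pull secret to every pod created by the pipeline.
# Assumes a docker-registry secret named "gcr-secret" already exists in the
# namespace (e.g. created with `kubectl create secret docker-registry ...`).
import kfp.dsl as dsl
from kubernetes import client as k8s_client

@dsl.pipeline(name="private-image-pipeline")
def pipeline():
    dsl.get_pipeline_conf().set_image_pull_secrets(
        [k8s_client.V1ObjectReference(name="gcr-secret")]
    )
    dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/my-private-image:latest",  # placeholder image
    )
```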
0
votes
1 answer
Does Kubeflow help to run ML in a distributed manner?
I have been going through the Kubeflow documentation for a couple of days; can anyone help me answer the questions below?
Does Kubeflow help run any ML algorithm in a distributed manner?
What's the difference between Kubeflow and Spark ML?

pratik rudra
- 137
- 1
- 3
- 10
0
votes
1 answer
GCP Kubernetes nodes with GPUs get preempted too soon
I've got a Kubeflow k8s cluster with a custom GPU-powered preemptible node pool at us-central1-a:
I run a Kubeflow notebook server on these GPU nodes.
For some mysterious reason the nodes get a compute.instances.preempted message very soon after starting (5-10…

orkenstein
- 2,810
- 3
- 24
- 45
0
votes
1 answer
Integrating MLflow with Kubeflow
I am trying to integrate an MLflow server with my Kubeflow cluster on GCP. To do this I create an MLflow deployment and expose it using a LoadBalancer.
The machine learning code is deployed as a pod on the Kubeflow cluster. The MLflow server…

user3401257
- 81
- 6
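For the tracking side, training code running in a Kubeflow pod usually just needs the tracking URI of the exposed MLflow service. A minimal sketch, assuming a service named mlflow-service in an mlflow namespace (names and port are placeholders):

```python
# Sketch: point training code running on the Kubeflow cluster at the exposed
# MLflow tracking server. Service name, namespace and port are placeholders.
import mlflow

mlflow.set_tracking_uri("http://mlflow-service.mlflow.svc.cluster.local:5000")
mlflow.set_experiment("kubeflow-training")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_accuracy", 0.93)
```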
0
votes
1 answer
Is there a Python module/function that could set the number of CPUs for a dsl.ContainerOp (Kubeflow Pipelines)?
I've built a Jupyter notebook that deploys another Jupyter notebook into the Kubeflow Pipelines service as a component of the pipeline. I want to know if there is a way to specify the number of CPUs and the memory for the ContainerOp that deploys the…

Ateev
- 1
- 1
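In the kfp v1 DSL, resource requests and limits can be set directly on the op; a minimal sketch with placeholder image and values:

```python
# Sketch: setting CPU and memory for a step in the kfp v1 DSL.
# The image and resource values are placeholders.
import kfp.dsl as dsl

@dsl.pipeline(name="resource-demo")
def pipeline():
    op = dsl.ContainerOp(
        name="deploy-notebook",
        image="gcr.io/my-project/notebook-runner:latest",
    )
    # These map onto the resources section of the step's pod spec.
    op.set_cpu_request("2")
    op.set_cpu_limit("4")
    op.set_memory_request("4G")
    op.set_memory_limit("8G")
```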
0
votes
3 answers
How to access model microservice deployed behind Istio and Dex?
I built a deployment pipeline to serve ML models using Kubeflow (v0.6) and Seldon Core, but now that the models are deployed I can't figure out how to get past the auth layer and consume the services.
My Kubernetes instance is on bare metal and the setup is…

Fábio Beranizo
- 31
- 3
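A common workaround (a sketch under the assumption that Seldon is exposed through the Kubeflow Istio gateway) is to reuse the authservice_session cookie from an authenticated browser session when calling the prediction endpoint; host, namespace, deployment name and cookie value are placeholders:

```python
# Sketch: call a Seldon prediction endpoint behind the Kubeflow Istio gateway by
# reusing the authservice_session cookie from an authenticated browser session.
# Host, namespace, deployment name and cookie value are placeholders.
import requests

HOST = "http://<istio-ingressgateway-ip>"
COOKIES = {"authservice_session": "<value copied from the browser>"}

resp = requests.post(
    f"{HOST}/seldon/<namespace>/<seldon-deployment>/api/v1.0/predictions",
    cookies=COOKIES,
    json={"data": {"ndarray": [[1.0, 2.0, 3.0]]}},
)
print(resp.status_code, resp.json())
```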
0
votes
1 answer
Failed to marshal the object to TFJob; the spec is invalid: failed to marshal the object to TFJob
I am rather new to both Kubernetes and TensorFlow and am trying to run the basic Kubeflow distributed TensorFlow example from this link (https://github.com/learnk8s/distributed-tensorflow-on-k8s). I am currently running a local bare-metal Kubernetes cluster…

Ali Tariq
- 63
- 1
- 8
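This error usually means the submitted spec does not match the TFJob apiVersion the installed operator expects. For reference, a sketch of submitting a kubeflow.org/v1 TFJob through the Kubernetes CustomObjects API; the version, namespace, image and replica counts are assumptions that must match your operator:

```python
# Sketch: submit a minimal TFJob through the Kubernetes CustomObjects API.
# The apiVersion and field names must match the operator that is installed;
# namespace, image and replica counts below are placeholders.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "mnist-demo", "namespace": "kubeflow"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 2,
                "restartPolicy": "Never",
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "tensorflow",
                            "image": "gcr.io/my-project/mnist:latest",
                        }]
                    }
                },
            }
        }
    },
}

api.create_namespaced_custom_object(
    group="kubeflow.org", version="v1",
    namespace="kubeflow", plural="tfjobs", body=tfjob,
)
```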
0
votes
1 answer
How to include hyperparameter tuning in a TFX pipeline?
A TFX pipeline is a really good tool for quick end-to-end model development. However, I'd also like to include hyperparameter tuning before the final model training and evaluation.
My question is whether there is a best practice for including tuning in…

Szilárd Kálosi
- 1
- 2
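Later TFX releases added a Tuner component that can sit between Transform and Trainer; whether it is available depends on the TFX version. A sketch under the assumption that transform and schema_gen are components defined earlier in the pipeline and that tuner_module.py / trainer_module.py are hypothetical module files:

```python
# Sketch: wiring a Tuner into a TFX pipeline. `transform` and `schema_gen` are
# assumed to be components defined earlier; module files are hypothetical.
from tfx.components import Trainer, Tuner
from tfx.proto import trainer_pb2

tuner = Tuner(
    module_file="tuner_module.py",   # defines tuner_fn()
    examples=transform.outputs["transformed_examples"],
    schema=schema_gen.outputs["schema"],
    train_args=trainer_pb2.TrainArgs(num_steps=1000),
    eval_args=trainer_pb2.EvalArgs(num_steps=100),
)

trainer = Trainer(
    module_file="trainer_module.py",
    examples=transform.outputs["transformed_examples"],
    schema=schema_gen.outputs["schema"],
    hyperparameters=tuner.outputs["best_hyperparameters"],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=1000),
)
```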
0
votes
1 answer
How can I retrieve the result of a trained model with Kubeflow Fairing?
I am using Kubeflow Fairing to train a TensorFlow model on Kubernetes. The training succeeds, but now I want to serve a prediction endpoint.
How can I retrieve the saved TensorFlow session (weights, biases, etc.) from the training step so that I can…

dippynark
- 2,743
- 20
- 58
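Fairing itself does not hand the in-memory session back after the job finishes; the usual pattern (a sketch, with a placeholder bucket path) is to export the model to shared storage from inside the training code and point the prediction endpoint at that path:

```python
# Sketch: export the trained model to shared storage from inside the training
# code, then serve from that path. The bucket path is a placeholder.
import tensorflow as tf

def train_and_export() -> str:
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    # ... model.fit(...) on real data ...

    export_path = "gs://my-bucket/models/my-model/1"
    model.save(export_path)   # SavedModel readable by TF Serving / KFServing
    return export_path
```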
0
votes
1 answer
How do I specify/increase CPU usage for a TFJob served on Kubeflow?
I have a GKE setup running Kubeflow on the latest versions with Kustomize. The master TFJob pulls a Docker image of the full model and runs it. I'm running into a simple issue where I wish to increase the amount of CPU available but can't seem to do…
0
votes
2 answers
Kubeflow Pipelines - Storing (passing) a tf.data.Dataset
I am playing around with Kubeflow Pipelines. What I want to achieve is to have one step (a Python function) where I create an iterator (generator), from which I want to create a tf.data.Dataset.
Connections between Kubeflow steps are only allowed to have…

Josef Korbel
- 1,168
- 1
- 9
- 32
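Since pipeline steps only exchange small values and file references, a common workaround (sketch below; the bucket path is a placeholder) is to materialize the generator as TFRecords in one step and rebuild the tf.data.Dataset from the path in the next:

```python
# Sketch: write the generated examples to TFRecords in one step and rebuild the
# tf.data.Dataset from the path in the next. The path is a placeholder.
import tensorflow as tf

def produce_step(output_path: str = "gs://my-bucket/data/train.tfrecord") -> str:
    def generator():
        for i in range(100):
            yield i

    with tf.io.TFRecordWriter(output_path) as writer:
        for value in generator():
            example = tf.train.Example(features=tf.train.Features(feature={
                "x": tf.train.Feature(int64_list=tf.train.Int64List(value=[value])),
            }))
            writer.write(example.SerializeToString())
    return output_path  # only this string is passed between steps

def consume_step(input_path: str) -> tf.data.Dataset:
    return tf.data.TFRecordDataset(input_path)  # parse features as needed
```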
0
votes
1 answer
Distributed TensorFlow in Kubeflow - NotFoundError
I followed the tutorial for building Kubeflow on GCP.
At the last step, after deploying the code and training with CPU:
kustomize build . | kubectl apply -f -
the distributed TensorFlow job encounters this…

Jim
- 1,550
- 3
- 20
- 34
0
votes
1 answer
How to set a local model repository - TensorRT Inference Server with MinIO
Hi, I want to set up the Kubeflow NVIDIA TensorRT Inference Server with the model repository located in MinIO.
I don't know how to change gs://inference-server-model-store/tf_model_store to connect to MinIO.
ks init my-inference-server
cd my-inference-server
ks registry…