Questions tagged [kubeflow]

Kubeflow Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/Apache MXNet/XGBoost/MPI jobs on Kubernetes.

GitHub: https://github.com/kubeflow/training-operator

433 questions
0
votes
2 answers

How to upgrade an existing Kubeflow pipeline?

In the pipeline UI I use the upload pipeline button to upload new pipelines. Since the pipeline name is unique, the only way to update is to delete the old pipeline and then upload a new one. Is there a better way, maybe to manage a version…
user3599803
  • 6,435
  • 17
  • 69
  • 130
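Recent Kubeflow Pipelines releases support pipeline versions, so a common answer to the question above is to upload the new compiled package as a version of the existing pipeline rather than deleting it. A minimal sketch with the kfp SDK, assuming a reachable KFP endpoint and hypothetical pipeline and package names:

```python
# Minimal sketch: add a new version to an existing pipeline instead of
# deleting and re-uploading it. Host, pipeline name and package path are
# placeholders; requires a kfp SDK release that supports pipeline versions.
import kfp

client = kfp.Client(host="http://localhost:8080")      # your KFP endpoint

pipeline_id = client.get_pipeline_id("my-pipeline")     # existing pipeline
client.upload_pipeline_version(
    pipeline_package_path="pipeline_v2.yaml",           # newly compiled package
    pipeline_version_name="v2",
    pipeline_id=pipeline_id,
)
```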
0
votes
0 answers

How to attach a GCP secret to a Kubernetes service account?

How can I use a secret object created from a Google Cloud JSON key file in a service account? I have MiniKF on the VM and Kubeflow installed. I am trying to make a container using a Jupyter notebook in the MiniKF Kubernetes cluster. The notebook has access to…
Pankaj Kumar
  • 3,139
  • 1
  • 15
  • 9
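A common approach to the question above is to store the GCP JSON key in a Kubernetes secret, mount it into the notebook pod, and point GOOGLE_APPLICATION_CREDENTIALS at the mounted file. A rough sketch with the Python Kubernetes client, where the secret name, namespace and key file are hypothetical:

```python
# Sketch: create a Kubernetes secret holding a GCP service-account key so
# a notebook pod can mount it. Namespace, secret name and key file are
# placeholders; assumes kubeconfig (or in-cluster) access to the cluster.
import base64
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

with open("gcp-sa-key.json", "rb") as f:
    key_b64 = base64.b64encode(f.read()).decode()

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="gcp-sa-key", namespace="kubeflow"),
    type="Opaque",
    data={"key.json": key_b64},
)
client.CoreV1Api().create_namespaced_secret(namespace="kubeflow", body=secret)

# In the notebook pod spec, mount the secret (e.g. at /etc/gcp) and set
# GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/key.json so Google client
# libraries pick up the credentials automatically.
```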
0
votes
1 answer

Pulling images from a private Google Container Registry with Kubeflow on Minikube

We are having trouble giving a container within a pipeline uploaded to Kubeflow access to a private custom Docker image stored in Google Container Registry. We are running Kubeflow on top of a Kubernetes cluster running on Minikube. Can someone help…
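For private GCR images the usual fix is to create a docker-registry secret from the GCR JSON key and attach it as an imagePullSecret to the service account the pipeline pods run under. A sketch with the Python Kubernetes client, assuming the secret already exists and using a hypothetical service-account name:

```python
# Sketch: attach an existing docker-registry secret ("gcr-json-key",
# created e.g. with `kubectl create secret docker-registry ...`) to the
# service account used by pipeline pods so they can pull from private GCR.
# Service-account and namespace names are assumptions.
from kubernetes import client, config

config.load_kube_config()

client.CoreV1Api().patch_namespaced_service_account(
    name="pipeline-runner",                      # SA used by pipeline pods
    namespace="kubeflow",
    body={"imagePullSecrets": [{"name": "gcr-json-key"}]},
)
```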
0
votes
1 answer

Does Kubeflow help to run ML in a distributed manner?

I have been going through the Kubeflow documentation for a couple of days; can anyone help me answer the questions below? Does Kubeflow help to run any ML algorithm in a distributed manner? What's the difference between Kubeflow and Spark ML?
pratik rudra
  • 137
  • 1
  • 3
  • 10
0
votes
1 answer

GCP Kubernetes nodes with GPUs get preempted too soon

I've got a Kubeflow Kubernetes cluster with a custom GPU-powered preemptible node pool in us-central1-a. I run a Kubeflow notebook server on these GPU nodes. For some mysterious reason the nodes get a compute.instances.preempted message very soon after start (5-10…
0
votes
1 answer

Integrating MLflow with Kubeflow

I am trying to integrate an MLflow server with my Kubeflow cluster on GCP. To do this I create an MLflow deployment and expose it using a LoadBalancer. The machine learning code is deployed as a pod on the Kubeflow cluster. The MLflow server…
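For the MLflow integration above, the training pods mainly need the tracking URI of the exposed server; parameters and metrics can then be logged from inside the Kubeflow cluster. A small sketch with the MLflow client, where the address, port and experiment name are placeholders:

```python
# Sketch: log from a training pod on Kubeflow to an MLflow server exposed
# via a LoadBalancer. Address, port and experiment name are placeholders.
import mlflow

mlflow.set_tracking_uri("http://<mlflow-loadbalancer-ip>:5000")
mlflow.set_experiment("kubeflow-training")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_accuracy", 0.93)
    # Logging artifacts additionally requires an artifact store (e.g. a
    # GCS bucket) reachable from both the pod and the MLflow server.
```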
0
votes
1 answer

Is there a Python module/function that can set the number of CPUs for a dsl.ContainerOp (Kubeflow Pipelines)?

I've built a Jupyter notebook that deploys a Jupyter notebook into the Kubeflow Pipelines service as a component of the pipeline. I want to know if there is a way to specify the number of CPUs and the memory for the ContainerOp that deploys the…
Ateev
  • 1
  • 1
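The kfp DSL exposes resource setters on the op's container for exactly this. A short sketch assuming the v1-style dsl.ContainerOp API, with a placeholder image and placeholder values:

```python
# Sketch: set CPU and memory requests/limits on a pipeline step.
# Image name and resource values are placeholders; assumes the kfp v1 DSL.
import kfp
from kfp import dsl

@dsl.pipeline(name="resource-demo")
def pipeline():
    op = dsl.ContainerOp(name="train", image="gcr.io/my-project/train:latest")
    op.container.set_cpu_request("2")
    op.container.set_cpu_limit("4")
    op.container.set_memory_request("4G")
    op.container.set_memory_limit("8G")

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(pipeline, "resource_demo.yaml")
```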
0
votes
3 answers

How to access model microservice deployed behind Istio and Dex?

I built a deployment pipeline to serve ML models using Kubeflow (v0.6) and Seldon Core, but now that the models are deployed I can't figure out how to get past the auth layer and consume the services. My Kubernetes instance is on bare metal and the setup is…
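When the cluster sits behind Istio with Dex, requests to the Seldon endpoint have to carry an authenticated session. One hedged approach, matching the OIDC authservice used by newer Kubeflow releases, is to reuse the session cookie from a logged-in browser; the host, namespace and deployment names below are placeholders:

```python
# Sketch: call a Seldon Core model behind the Kubeflow Istio gateway by
# reusing an authservice session cookie obtained from a logged-in browser.
# The cookie name matches the Kubeflow OIDC authservice; URL parts are
# placeholders for your own gateway, namespace and SeldonDeployment.
import requests

HOST = "http://<istio-ingressgateway>"
COOKIE = "<authservice_session cookie value>"

resp = requests.post(
    f"{HOST}/seldon/<namespace>/<seldon-deployment>/api/v1.0/predictions",
    cookies={"authservice_session": COOKIE},
    json={"data": {"ndarray": [[1.0, 2.0, 3.0, 4.0]]}},
)
print(resp.status_code, resp.json())
```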
0
votes
1 answer

Failed to marshal the object to TFJob; the spec is invalid: failed to marshal the object to TFJob

I am rather new to both Kubernetes and TensorFlow and am trying to run the basic Kubeflow distributed TensorFlow example from this link (https://github.com/learnk8s/distributed-tensorflow-on-k8s). I am currently running a local bare-metal Kubernetes cluster…
Ali Tariq
  • 63
  • 1
  • 8
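The marshalling error usually means the submitted manifest does not match the TFJob CRD installed by the operator (wrong apiVersion or replica-spec layout). Below is a minimal TFJob shape submitted with the Kubernetes Python client; the apiVersion, namespace and image are assumptions that must match your installation (older Kubeflow releases use kubeflow.org/v1beta2 rather than v1):

```python
# Sketch: a minimal TFJob manifest submitted through the CustomObjectsApi.
# apiVersion, namespace and image are placeholders; a mismatch between the
# manifest shape and the installed CRD is a common cause of the
# "failed to marshal the object to TFJob" error.
from kubernetes import client, config

config.load_kube_config()

tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "mnist-distributed", "namespace": "kubeflow"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 2,
                "restartPolicy": "OnFailure",
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "tensorflow",  # name required by the operator
                            "image": "gcr.io/my-project/mnist:latest",
                        }]
                    }
                },
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="kubeflow",
    plural="tfjobs", body=tfjob,
)
```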
0
votes
1 answer

How to include hyperparameter tuning in a TFX pipeline?

A TFX pipeline is a really good tool for quick end-to-end model development. However, I'd also like to include hyperparameter tuning before final model training and evaluation. My question is whether there exists a best practice to include tuning in…
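Later TFX releases ship a Tuner component that can run ahead of the Trainer and hand over the best hyperparameters. The outline below is a rough sketch only: `transform` and `schema_gen` stand for the already-constructed upstream components of your pipeline, "model.py" (defining tuner_fn/run_fn) is a placeholder module file, and the exact arguments vary across TFX versions:

```python
# Rough sketch: wire a Tuner ahead of the Trainer in an existing TFX
# pipeline and reuse its best hyperparameters instead of re-tuning each run.
from tfx.components import Trainer, Tuner
from tfx.proto import trainer_pb2


def add_tuning(transform, schema_gen):
    tuner = Tuner(
        module_file="model.py",
        examples=transform.outputs["transformed_examples"],
        schema=schema_gen.outputs["schema"],
        transform_graph=transform.outputs["transform_graph"],
        train_args=trainer_pb2.TrainArgs(num_steps=100),
        eval_args=trainer_pb2.EvalArgs(num_steps=50),
    )
    trainer = Trainer(
        module_file="model.py",
        examples=transform.outputs["transformed_examples"],
        schema=schema_gen.outputs["schema"],
        transform_graph=transform.outputs["transform_graph"],
        hyperparameters=tuner.outputs["best_hyperparameters"],
        train_args=trainer_pb2.TrainArgs(num_steps=1000),
        eval_args=trainer_pb2.EvalArgs(num_steps=100),
    )
    return tuner, trainer
```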
0
votes
1 answer

How can I retrieve the result of a trained model with Kubeflow Fairing?

I am using Kubeflow Fairing to train a TensorFlow model on Kubernetes. The training succeeds, but now I want to serve a prediction endpoint. How can I retrieve the saved TensorFlow session (weights, biases, etc.) from the training step so that I can…
dippynark
  • 2,743
  • 20
  • 58
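One workable pattern for the Fairing question above is not to recover the live session at all, but to export the model to durable storage at the end of the training step and reload it in the serving step. A small TensorFlow 2.x sketch using a local placeholder path; in a Fairing job this would typically point at a GCS bucket or mounted volume:

```python
# Sketch: persist the trained model at the end of training and reload it
# in a separate serving process, instead of recovering the in-memory
# session. The export path is a placeholder.
import tensorflow as tf

EXPORT_PATH = "/tmp/my-model/1"  # e.g. "gs://my-bucket/models/my-model/1"

# End of the training step: export graph, weights and biases as a SavedModel.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
model.save(EXPORT_PATH)

# Serving step (a separate process/pod): reload and predict.
restored = tf.keras.models.load_model(EXPORT_PATH)
print(restored.predict([[1.0, 2.0, 3.0, 4.0]]))
```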
0
votes
1 answer

How do I specify/increase CPU usage for a TFJob served on Kubeflow?

I have a GKE setup running Kubeflow on the latest versions with Kustomize. The master TFJob pulls a Docker image of the full model and runs it. I'm running into a simple issue where I wish to increase the amount of CPU usage but can't seem to do…
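CPU and memory for a TFJob are set per replica through the normal Kubernetes resources block on the container in that replica's pod template. A fragment is sketched below; the values and image are placeholders, and the same block applies to the Master/Chief replica if that is the pod doing the work:

```python
# Sketch: a TFJob replica spec fragment raising CPU/memory via the standard
# Kubernetes resources block. Values and image are placeholders.
worker_replica_spec = {
    "replicas": 1,
    "restartPolicy": "OnFailure",
    "template": {
        "spec": {
            "containers": [{
                "name": "tensorflow",
                "image": "gcr.io/my-project/model:latest",
                "resources": {
                    "requests": {"cpu": "4", "memory": "8Gi"},
                    "limits": {"cpu": "8", "memory": "16Gi"},
                },
            }]
        }
    },
}
```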
0
votes
2 answers

Kubeflow Pipeline - Storing (passing) TF.Dataset

I am playing around with Kubeflow Pipelines. What I want to achieve is to have one step (a Python function) where I create an iterator (generator), from which I want to create a tf.data.Dataset. Connections between Kubeflow steps are only allowed to have…
Josef Korbel
  • 1,168
  • 1
  • 9
  • 32
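Since a tf.data.Dataset (or a live generator) cannot be passed between pipeline steps directly, a common workaround is to materialize the data as TFRecord files on shared storage in one step and rebuild the dataset from those files in the next. A self-contained sketch with placeholder paths and feature names:

```python
# Sketch: step 1 writes the generator's items as TFRecords on shared
# storage; step 2 rebuilds a tf.data.Dataset from those files.
# Path and feature names are placeholders.
import tensorflow as tf

DATA_PATH = "/tmp/data.tfrecord"  # e.g. a GCS path or a mounted volume

def write_examples(items):
    # Step 1: serialize (x, y) pairs as tf.train.Examples.
    with tf.io.TFRecordWriter(DATA_PATH) as writer:
        for x, y in items:
            example = tf.train.Example(features=tf.train.Features(feature={
                "x": tf.train.Feature(float_list=tf.train.FloatList(value=[x])),
                "y": tf.train.Feature(float_list=tf.train.FloatList(value=[y])),
            }))
            writer.write(example.SerializeToString())

def load_dataset():
    # Step 2: parse the records written by step 1 back into a dataset.
    feature_spec = {
        "x": tf.io.FixedLenFeature([], tf.float32),
        "y": tf.io.FixedLenFeature([], tf.float32),
    }
    ds = tf.data.TFRecordDataset(DATA_PATH)
    return ds.map(lambda record: tf.io.parse_single_example(record, feature_spec))

write_examples([(1.0, 2.0), (3.0, 4.0)])
for row in load_dataset():
    print(row["x"].numpy(), row["y"].numpy())
```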
0
votes
1 answer

Distributed TensorFlow in Kubeflow - NotFoundError

I followed the tutorial for building Kubeflow on GCP. At the last step, after deploying the code and training with the CPU: kustomize build . | kubectl apply -f - The distributed TensorFlow job encounters this…
Jim
  • 1,550
  • 3
  • 20
  • 34
0
votes
1 answer

How to set a local model repository - TensorRT Inference Server with MinIO

Hi, I want to set up Kubeflow's NVIDIA TensorRT Inference Server with the model repository located in MinIO. I don't know how to change gs://inference-server-model-store/tf_model_store to connect to MinIO. ks init my-inference-server cd my-inference-server ks registry…
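MinIO speaks the S3 API, so one hedged approach is to stage the model repository in a MinIO bucket with an S3 client and then point the inference server at an s3:// repository path with AWS-style credentials in its environment (support for this depends on the inference-server version). A sketch with boto3, where the endpoint, credentials, bucket and file layout are placeholders:

```python
# Sketch: copy a model repository into MinIO through its S3-compatible API.
# Endpoint, credentials, bucket name and file layout are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio-service:9000",   # MinIO service in the cluster
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)

s3.create_bucket(Bucket="inference-server-model-store")
# Upload one file of the repository; repeat for the full layout
# (<model>/<version>/model.savedmodel/... for a TensorFlow SavedModel).
s3.upload_file(
    "tf_model_store/mymodel/1/model.savedmodel/saved_model.pb",
    "inference-server-model-store",
    "tf_model_store/mymodel/1/model.savedmodel/saved_model.pb",
)
```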