Questions tagged [kubeflow]

Kubeflow Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/Apache MXNet/XGBoost/MPI jobs on Kubernetes.

GitHub: https://github.com/kubeflow/training-operator

433 questions
2
votes
1 answer

microk8s not running after installation

I want to install Kubeflow using microk8s on a Kubernetes cluster, but I ran into a problem with microk8s. I already installed microk8s using this link. When I checked the microk8s status, it reported: microk8s is not running. Use…
MADFROST
2
votes
0 answers

Kubeflow dashboard returns 403 Forbidden

I have a problem with the Kubeflow Dashboard. Until now I could connect to the dashboard without problems, but after restarting the PC I get a 403 Forbidden error when I try to connect from my browser to http://10.64.140.43.nip.io (this is the URL received…
2
votes
0 answers

How to use Kubeflow volume mount with outputPath parameter?

I am building a Kubeflow pipeline that has 2 components. Component 1 preprocesses some data and component 2 performs model training on that data. I understand I need to save the data at some outputPath parameter generated by Kubeflow. This works. I…
Zach
2
votes
0 answers

Kubeflow: Notebook server stuck on loading

Whenever I try to create a Kubeflow notebook server to build a pipeline from a jupyter notebook, it keeps loading forever without displaying any error. I'm currently using a Kubeflow dashboard that's already up and running on a server, so I didn't…
2
votes
3 answers

Kubeflow - error in create_run_from_pipeline_func

I'm new to Kubeflow and k8s. I have set up a single-node k8s cluster and installed Kubeflow on it. I'm now trying the simple 'conditional pipeline' example from the "Kubeflow for Machine Learning" book, but I am getting "cannot post…
soumeng78
2
votes
1 answer

istio-ingressgateway always Waiting for Istio Pilot information

I'm trying to deploy Kubeflow on an OVH-managed k8s cluster. After the initial setup of the k8s cluster, I ran the following commands to install Kubeflow, as suggested here: # install snap install juju --classic # get cluster name (should be…
Preston
2
votes
2 answers

What is the best option for building Kubeflow components?

I am reading about Kubeflow, and there are two ways to create components: container-based and function-based. But there is no explanation of why I should use one or the other. For example, to load a container-based component, I need to generate a docker…
Tlaloc-ES
2
votes
1 answer

How to log metrics using Kubeflow on Google AI Platform Notebooks

I am building ML models using Google Cloud Platform's AI Platform Notebooks. I know that if I use AI Platform jobs, it logs hyperparameters, metrics, etc. with nice visualizations, but is there a way to create the same or a similar structure so that I can log…
2
votes
0 answers

Is it possible to use artifacts as source for visualisations in Kubeflow pipelines

I'm experimenting with Kubeflow on minikube and I'm trying to use the visualizations feature of the Kubeflow Pipelines UI. The documentation states that you should generate an mlpipeline-ui-metadata.json file and add it to the ContainerOp outputs. This…
alberthier
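For context on the file this excerpt mentions: the sketch below writes a metadata file in the shape described by the Kubeflow Pipelines v1 output viewer documentation, as far as I recall it (an "outputs" list whose entries carry "type", "storage", "format", "header", and "source"). It is only an illustration under that assumption and does not answer whether artifacts can serve as the source. The function name `write_ui_metadata`, the sample CSV rows, and the temporary output path are hypothetical; in a real v1 component the file is typically written to `/mlpipeline-ui-metadata.json`.

```python
import json
import os
import tempfile


def write_ui_metadata(path, csv_text, header):
    """Write a UI metadata file describing a single inline CSV table.

    Assumes the KFP v1 output viewer format; the field names follow the
    v1 documentation as recalled, so verify them against your SDK version.
    """
    metadata = {
        "outputs": [
            {
                "type": "table",      # viewer type
                "storage": "inline",  # "source" holds the data itself, not a path
                "format": "csv",
                "header": header,     # column names for the table
                "source": csv_text,   # CSV rows, without a header row
            }
        ]
    }
    with open(path, "w") as f:
        json.dump(metadata, f)
    return metadata


# Hypothetical sample data, written to a temporary directory for illustration.
out_path = os.path.join(tempfile.mkdtemp(), "mlpipeline-ui-metadata.json")
written = write_ui_metadata(out_path, "cat,0.91\ndog,0.87\n", ["label", "score"])
```

With "storage" set to "inline", the table data travels inside the metadata file itself, so this form does not depend on an external storage location.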
2
votes
0 answers

Logout from Kubeflow application with Auth0 causing infinite loop

I am trying to set up authentication to Kubeflow with Auth0, following this manual: Authentication using OIDC (with the difference that I set up a Google account instead of GitHub as the IdP). Now I am able to log in with my Google account to Kubeflow via Auth0…
Vadim Yangunaev
2
votes
1 answer

Instantiate and Shutdown Kubeflow pods

I'm learning about Kubernetes and Kubeflow, and there's something I want to do, but I can't find a clear answer online on whether it's possible or which route I should take. When training my machine learning model, I want to use a large…
2
votes
0 answers

Add GPU to Kubeflow cluster on GKE

I am struggling to add a GPU to my GKE Kubeflow cluster. The documentation still references kfctl and some old set-up parameters. (To be precise, I added a T4 GPU to the GKE cluster successfully, but my notebook server fails to start). Has anyone…
OlgaPp
2
votes
1 answer

How to resolve the "ERROR No Major.Minor.Patch elements found" during ksonnet init step in AWS EKS setup

I'm following the official AWS EKS tutorial on setting up a distributed GPU cluster for TensorFlow model training and am hitting a bit of a snag. After creating a new cluster using eksctl and verifying that the corresponding ~/.kube/config file…
2
votes
2 answers

ParallelFor in Kubeflow Pipelines

I'd like to use a custom list to run parallel Ops in a Kubeflow Pipeline, and I want to use the value of each list element in the definition of the Op. I'm trying something like this: my_list = ['foo', 'bar'] with dsl.ParallelFor(my_list) as…
Matteo Felici
2
votes
1 answer

Orchestrating TFX Pipelines with Kubeflow locally

I am working on a package that generates TFX pipelines for training GPT-2 (see https://github.com/steven-mi/tfx-gpt2). How can I deploy my pipeline to Kubeflow locally? Is there an in-depth guide for doing so?
stmi