Questions tagged [kubeflow]

Kubeflow Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/Apache MXNet/XGBoost/MPI jobs on Kubernetes.

GitHub: https://github.com/kubeflow/training-operator

433 questions
3
votes
1 answer

Kubeflow jupyter notebook created, but no notebook resource

I'm new to Kubeflow and having a hard time debugging. I created a notebook server with a custom Docker Hub image, and when I click on 'connect', it says that there is no healthy upstream. In Home, I found that there exist activities, and it says…
Piljae Chae
  • 987
  • 10
  • 23
3
votes
3 answers

istio getting "RBAC: access denied" even though the servicerolebinding is checked to be allowed

I've been struggling with Istio... So here I am seeking help from the experts! Background: I'm trying to deploy my Kubeflow application for multi-tenancy with Dex, referring to the official Kubeflow documentation with the manifest file from GitHub. Here is…
Roger Ray
  • 1,251
  • 2
  • 12
  • 20
3
votes
0 answers

How to create a Kubeflow component from a PyTorch job?

I've been ramping up on Kubeflow recently. My goal is to get PyTorch running in Kubeflow. I've gone through the documentation on creating a distributed PyTorch job here. I've also read through all the documentation on how to create pipelines /…
brenmcnamara
  • 459
  • 3
  • 10
3
votes
1 answer

Upload pipeline on kubeflow

I am currently trying to set up a Kubeflow pipeline. My use case requires that the configuration for pipelines be provided via a YAML/JSON structure. Looking into the documentation for submitting pipelines, I came across this paragraph: Each…
LexByte
  • 382
  • 4
  • 15
3
votes
1 answer

Can't connect to MiniKF landing page on http://10.10.10.10 after installing MiniKF

I am trying to run an example machine learning pipeline on premise (meaning: locally on a Windows 10 laptop) using MiniKF and Kubeflow Pipelines, following this tutorial, but I can't reach the site that should appear at http://10.10.10.10. I…
BioGeek
  • 21,897
  • 23
  • 83
  • 145
3
votes
2 answers

How to pass an environment variable in a Kubeflow pipeline?

I want the variable to be accessed by the gcr.io/******/serve_model:lat5 image, which is an argument of gcr.io/******/deployservice:lat2. Initially I tried passing the variable as an argument, but it didn't work, so I am trying to pass it as an…
harish kumaar
  • 41
  • 1
  • 3
3
votes
1 answer

Which TFX orchestrator is the de facto standard for TFX?

I am a beginner with TensorFlow and am now on a project where I need to deploy a distributed production platform for TensorFlow. I would appreciate some help clarifying my thoughts. Reading the online documentation and YouTube, I understood that the main…
Yu Watanabe
  • 621
  • 4
  • 17
3
votes
2 answers

In Kubeflow Pipelines, how to send a list of elements to a lightweight python component?

I am trying to send a list of elements as a PipelineParameter to a lightweight component. Here is a sample that reproduces the problem. Here is the function: def my_func(my_list: list) -> bool: print(f'my_list is {my_list}') …
Kevin Pauli
  • 8,577
  • 15
  • 49
  • 70
3
votes
1 answer

How to schedule jobs in Kubeflow?

I'm setting up a Kubeflow cluster on AWS EKS. Is there a native way in Kubeflow to schedule jobs automatically (e.g., run the workflow every X hours, get data every X hours, etc.)? I have tried to look for other things like Airflow, but…
3
votes
1 answer

Setting an active gcloud account in docker container

Currently I'm setting up a Kubeflow Pipeline on GKE. The goal is to start a training job on ML Engine and later serve it on GKE. The training job gets launched in a Docker container. (Every step in a pipeline must be a container.) I'm getting…
3
votes
1 answer

How to provide access to Kubeflow on GKE?

I have followed the steps in Kubernetes Engine for Kubeflow. The deployment went fine and all pods/services are up, including the endpoint at https://.endpoints..cloud.goog/, with the correct and of course. When I…
Azmi Kamis
  • 891
  • 5
  • 20
3
votes
3 answers

How to delete a Kubeflow cluster?

I tried to install Kubeflow but used the wrong region. How do I delete it? I tried to do it from the Kubernetes cluster but keep getting the same error when I try to create a new one: Error 409: 'projects/dpe-cloud-mle/global/deployments/kubeflow'…
gogasca
  • 9,283
  • 6
  • 80
  • 125
2
votes
1 answer

Why is Kubeflow on Vertex AI Pipelines not storing metadata for a dataset artifact?

I am trying to pass metadata between Python function components by attaching it to an output artifact in a Vertex AI Kubeflow pipeline. From the documentation this seems straightforward, but try as I might I can't get it to work. I am trying to attach…
2
votes
0 answers

mpi operator tensorflow benchmark example not starting

I'm trying to run this mpiJob example, https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/tensorflow-benchmarks/tensorflow-benchmarks.yaml by following the steps in this readme. I deployed the configuration to a local k3s cluster,…
lmln
  • 137
  • 1
  • 10
2
votes
0 answers

Unable to Connect Remote Spark Session with YARN mode on Kubeflow

The main problem is that we are unable to run Spark in client mode. Whenever we try to connect to Spark in YARN mode from a Kubeflow notebook, we get the following error: `Py4JJavaError: An error occurred while calling o81.showString. :…