I'm new to Kubeflow and have just started exploring it. I've set up a microk8s cluster with Charmed Kubeflow and run a few examples to understand the different components. Now I'm trying to build a pipeline from scratch for a classification problem, and I'm stuck on how to handle the data download.

Could anyone point me to an example where data (preferably images) is downloaded from an external source? All the examples I can find use small datasets from sklearn, MNIST, and so on. I'm looking for an example that uses real-world (or close to real-world) data, for example:

 https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip

Thanks in advance for any direction.

I explored multiple Kubeflow examples, blogs, etc. to find one that uses real data rather than a toy dataset, but I couldn't find one.

I've found some Jupyter notebook examples that use !wget to download data in the notebook kernel, but I couldn't find how to convert that into a Kubeflow op step. I presumed func_to_container_op wouldn't work for such a scenario. As a next step I'm going to try downloading with specs.AppDef from torchx. As I'm a total newbie, I wanted to check whether I'm heading in the right direction.

Govi
  • 1
  • Hi, I was able to download using wget for direct links, and I've also explored the PodDefault configuration for downloading datasets that require authentication (like Kaggle competition datasets). I have one related question. I configured my Kaggle credentials as a k8s secret. With this, the download works fine in a Jupyter notebook, because I create the notebook with that configuration selected. But when I trigger a run from the notebook, it creates a new container, and this container doesn't have the /secret/ mount attached to it. Is there a recommended approach for giving new containers access to secrets? – Govi Dec 22 '22 at 12:21

1 Answer

I was able to download the data using wget for direct links. I was also able to configure k8s secrets and patch the serviceaccount with ImagePullSecret, so that the downloads work from newly created containers.
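
As a rough illustration of the direct-link case (not necessarily the exact code used here), the download can also be written as a plain Python function instead of shelling out to wget. This is a minimal sketch using only the standard library; the function name download_and_extract, the dest_dir parameter, and the temporary file name dataset.zip are hypothetical choices for the example, and it assumes the link points at a zip archive that needs no authentication.

```python
def download_and_extract(url: str, dest_dir: str) -> str:
    """Download a zip archive from `url` and unpack it into `dest_dir`.

    Hypothetical helper for illustration; assumes an unauthenticated
    direct link to a zip file.
    """
    # Imports live inside the function so that it stays self-contained
    # if it is later wrapped as a lightweight pipeline component.
    import os
    import shutil
    import urllib.request
    import zipfile

    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, "dataset.zip")

    # Stream the response to disk rather than holding it all in memory.
    with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
        shutil.copyfileobj(response, out)

    # Unpack the archive next to it, then drop the zip file itself.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
    os.remove(archive_path)

    return dest_dir
```

Calling it with the URL from the question and a target directory of your choice should leave the extracted images under that directory. With the KFP v1 SDK, a self-contained function like this can typically be turned into a pipeline step with func_to_container_op or kfp.components.create_component_from_func. That part is an assumption about the setup: for later steps to see the files, dest_dir would need to be on a volume that those steps also mount.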

Govi
  • 1