
I'm trying to run a Kubeflow Pipelines setup and I have several environments (dev, staging, prod).

In my pipeline I'm using kfp.components.func_to_container_op to get a pipeline task (ContainerOp), and then execute it with the appropriate arguments that allow it to integrate with my S3 bucket:

import kfp
import kfp.components as comp

from utils.test import test

# Turn the function into a component factory
test_op = comp.func_to_container_op(test, base_image='my_image')

# Pipeline that passes the credentials through as pipeline parameters
def pipeline(bucket: str, aws_key: str, aws_pass: str):
    test_task = test_op(
        bucket,
        aws_key,
        aws_pass,
    )

arguments = {
    'bucket': 's3',
    'aws_key': 'key',
    'aws_pass': 'pass',
}
kfp.Client().create_run_from_pipeline_func(pipeline, arguments=arguments)

Each environment uses different credentials to connect to its bucket, and those credentials are passed into the function:

def test(bucket: str, aws_key: str, aws_pass: str):
    # imports live inside the function so func_to_container_op can serialize it
    import boto3
    ...
    # Build the S3 client from the explicitly passed credentials
    s3_client = boto3.client('s3', aws_access_key_id=aws_key, aws_secret_access_key=aws_pass)
    s3_client.upload_file(from_filename, bucket, to_filename)

So for each environment I need to update the arguments to contain the correct credentials, which makes this very hard to maintain: every time I promote the code from dev to staging to prod, I can't simply copy it as-is.

My question is: what is the best approach to pass those credentials?

torpido
  • My ideal solution would be for those credentials to be initialized as environment variables, so that my pipeline is not aware of them at all. That could have worked if I could run the Docker container before the pipeline through an external process that knows which environment I'm running in (dev/stg/prod) and sets those env variables beforehand. Just a thought, although I hope there is a more elegant solution – torpido Apr 20 '20 at 23:13
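For reference, a minimal sketch of what that comment describes: if something outside the pipeline sets AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY inside the container (an external, environment-aware process), the component can read them itself and credentials disappear from the pipeline signature. The env var names are an assumption here, and the filenames are promoted to parameters because the original snippet leaves them undefined.

import kfp
import kfp.components as comp

def test(bucket: str, from_filename: str, to_filename: str):
    # Assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY were already set
    # in the container by an external, environment-aware process.
    import os
    import boto3

    s3_client = boto3.client(
        's3',
        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    )
    s3_client.upload_file(from_filename, bucket, to_filename)

test_op = comp.func_to_container_op(test, base_image='my_image')

def pipeline(bucket: str, from_filename: str, to_filename: str):
    test_op(bucket, from_filename, to_filename)  # no credential parameters anywhere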

1 Answer


Ideally you should push any environment-specific configuration as close to the cluster as possible (and as far away from the components as you can).

You can create a Kubernetes Secret in each environment with that environment's credentials. Then use that AWS secret in every task:

import kfp
from kfp import aws

def my_pipeline():
    ...

    # Inject the credentials stored in the 'aws-secret' Kubernetes Secret
    # into every op in the pipeline as environment variables.
    conf = kfp.dsl.get_pipeline_conf()
    conf.add_op_transformer(aws.use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
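The 'aws-secret' referenced above has to exist in each cluster, holding that environment's credentials, before the pipeline runs. As a sketch, assuming the standard kubernetes Python client and a 'kubeflow' namespace (kubectl create secret generic would work just as well):

from kubernetes import client, config

config.load_kube_config()  # uses the kubeconfig of the targeted environment

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name='aws-secret'),
    string_data={
        'AWS_ACCESS_KEY_ID': 'key-for-this-environment',        # placeholder
        'AWS_SECRET_ACCESS_KEY': 'secret-for-this-environment',  # placeholder
    },
)
client.CoreV1Api().create_namespaced_secret(namespace='kubeflow', body=secret)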

With those environment variables in place, boto3 can auto-load the credentials itself: its default credential chain reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, so the component code no longer needs to receive them as arguments.

At least all GCP libraries and utilities do that with GCP credentials.
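A sketch of how the component from the question could then look, with the filenames promoted to parameters (the original snippet leaves them undefined) and no credential arguments at all:

def test(bucket: str, from_filename: str, to_filename: str):
    import boto3

    # No explicit keys: boto3's default credential chain picks up
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, which use_aws_secret
    # injected from the per-environment Kubernetes Secret.
    s3_client = boto3.client('s3')
    s3_client.upload_file(from_filename, bucket, to_filename)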

P.S. It's better to create issues in the official repo: https://github.com/kubeflow/pipelines/issues

Ark-kun