
I have a Google Dataflow pipeline that I run from my local machine with the DataflowRunner. Currently I have the GOOGLE_APPLICATION_CREDENTIALS stored in a file and I export it, but I want to avoid storing the credentials for security reasons. I have also saved the service account with the appropriate IAM roles in my properties file. Is there any way to use Google Secret Manager here? My use case is to keep the code credentials-free.

1 Answer


If you launch your Dataflow job from your local machine, you have to export the GOOGLE_APPLICATION_CREDENTIALS env var; unfortunately there is no other choice in this case.
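As a minimal sketch of that local launch, the following shell commands export the key file and start a Beam pipeline with the DataflowRunner. The key path, project, region, bucket, and pipeline module names are all placeholders, not values from the question:

```shell
# Hedged sketch: point ADC at the service account key file,
# then launch the pipeline on Dataflow from the local machine.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa-key.json"

python -m my_pipeline \
  --runner=DataflowRunner \
  --project=my-gcp-project \
  --region=europe-west1 \
  --temp_location=gs://my-bucket/temp
```

This is exactly the pattern the question wants to avoid, since the key file sits on disk.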

However, if you launch your job via a DAG orchestrator like Airflow on Cloud Composer, there is no need to pass a SA key file. Authentication is handled by Airflow, using the service account attached to the Cloud Composer environment.
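A minimal sketch of such a DAG, assuming Cloud Composer with the Google provider package installed. No key file appears anywhere: the operator authenticates with the Composer environment's service account. The template path, project, region, and bucket names are placeholders:

```python
# Hedged sketch: an Airflow DAG that starts a Dataflow job without any
# credentials in the code. Authentication comes from the environment's SA.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

with DAG(
    dag_id="launch_dataflow_job",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    start_job = DataflowTemplatedJobStartOperator(
        task_id="start_dataflow_job",
        template="gs://dataflow-templates/latest/Word_Count",
        project_id="my-gcp-project",
        location="europe-west1",
        parameters={
            "inputFile": "gs://my-bucket/input.txt",
            "output": "gs://my-bucket/output",
        },
    )
```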

You can also explore other solutions with Cloud Shell or Cloud Build to launch your job, but I think it's better to use the CI/CD part only to deploy the job and delegate the responsibility of job execution to a pipeline orchestration tool like Airflow.

In a production environment, you can also use Dataflow Flex Templates to standardize the deployment of your Dataflow jobs, based on a template spec stored in a bucket and a Docker image.

In this case, if you use a tool like Cloud Build, there is no need to pass a service account key file either.
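A sketch of the Flex Template flow with gcloud, assuming the job's Docker image is already built and pushed. The bucket, image, project, and region are placeholders; when these commands run inside a Cloud Build step, the build's service account is used and no key file is required:

```shell
# Hedged sketch: register a Flex Template spec in GCS from an existing image,
# then launch a job from that spec. Credentials come from the environment
# running gcloud (e.g. the Cloud Build service account).
gcloud dataflow flex-template build gs://my-bucket/templates/my-job.json \
  --image "europe-west1-docker.pkg.dev/my-gcp-project/repo/my-job:latest" \
  --sdk-language "PYTHON" \
  --metadata-file "metadata.json"

gcloud dataflow flex-template run "my-job-run" \
  --template-file-gcs-location gs://my-bucket/templates/my-job.json \
  --region europe-west1
```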

You can check the article I wrote, which shows a complete example of a CI/CD pipeline with Dataflow Flex Templates.

Mazlum Tosun
    Yes, this is a good way. There's one alternative to GOOGLE_APPLICATION_CREDENTIALS that I wanted to bring up -- you can use your personal account permissions, if you authenticate through `gcloud auth application-default login`. Then there's no need for the environment variable. Check https://cloud.google.com/docs/authentication/application-default-credentials for more information. – Bruno Volpato Apr 03 '23 at 13:29
  • Thanks for the additional info Bruno, it's helpful :) – Mazlum Tosun Apr 03 '23 at 13:33