7

I am trying to set up a Google Cloud Platform connection in Google Cloud Composer using a service account key. So I created a GCS bucket and put the service account key file in the bucket. The key is stored as JSON. In the keyfile path field I specified the GCS bucket path, and in the keyfile JSON field I specified the file name. The scope is https://www.googleapis.com/auth/cloud-platform.

When trying to use this connection to start a Dataproc cluster, I got an error saying the JSON file could not be found.

Looking at the error message, the code tries to read the file using `with open(filename, 'r') as file_obj`, which obviously won't work with a GCS bucket path.
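For illustration (the bucket and file name below are made up), the hook is doing something like this, and a plain open() treats the GCS URI as an ordinary local path:

    # open() only understands local filesystem paths, so a gs:// URI
    # raises FileNotFoundError (bucket/file names are hypothetical):
    with open("gs://my-keys-bucket/service-account.json", "r") as file_obj:
        keyfile_json = file_obj.read()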

So my question is: where should I put this service account key file if it cannot be placed at a GCS path?

Charles Zhan

6 Answers

4

I'm assuming you want your operators to use a service account distinct from the default auto-generated compute account that Composer runs under.

The docs indicate that you can add a new Airflow Connection for the service account, which includes copy-pasting the entire JSON key file into the Airflow Connection config (look for the Keyfile JSON field once you select the Google Cloud Platform connection type).
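If you'd rather create that connection programmatically than paste it into the UI, a minimal sketch looks something like this (the connection ID, project, and local key path are placeholders):

    # Sketch: store the key inline in the connection extras (keyfile_dict),
    # so no key file has to exist on the Composer workers.
    # conn_id, project, and local key path below are hypothetical.
    import json
    from airflow import settings
    from airflow.models import Connection

    with open("/tmp/my-service-account.json") as f:  # local copy of the key
        keyfile_dict = f.read()

    conn = Connection(
        conn_id="my_gcp_sa",
        conn_type="google_cloud_platform",
        extra=json.dumps({
            "extra__google_cloud_platform__project": "my-other-project",
            "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform",
            "extra__google_cloud_platform__keyfile_dict": keyfile_dict,
        }),
    )

    session = settings.Session()
    session.add(conn)
    session.commit()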

Jake Biesinger
2

This doesn't make much sense to me. I don't use Google Cloud, so maybe it's just my lack of knowledge here:

If you're trying to set up a connection to GCP, how can you store your credentials inside GCP and expect to connect from your Airflow server? It's a chicken-and-egg thing.

Looking at gcp_api_base_hook.py in the Airflow repo, it looks like it expects you to specify a key_path and/or a keyfile_dict in the extra JSON properties of the connection, and the logic for how it connects is here (paraphrased below).
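Roughly, the hook's credential selection works like this (a simplified paraphrase, not the exact Airflow source):

    # Simplified paraphrase of the hook's credential selection; not the
    # exact Airflow source.
    import json
    import google.auth
    from google.oauth2 import service_account

    def get_credentials(extras, scopes):
        key_path = extras.get("extra__google_cloud_platform__key_path")
        keyfile_dict = extras.get("extra__google_cloud_platform__keyfile_dict")
        if key_path:
            # Read from the local filesystem -- this is why a gs:// path fails.
            return service_account.Credentials.from_service_account_file(
                key_path, scopes=scopes)
        if keyfile_dict:
            # Key material stored directly in the connection's extras.
            return service_account.Credentials.from_service_account_info(
                json.loads(keyfile_dict), scopes=scopes)
        # Otherwise fall back to the machine's Application Default Credentials.
        credentials, _ = google.auth.default(scopes=scopes)
        return credentials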

Simon D
  • Google beta-released its managed Cloud Composer service, which is built on Airflow, so you can basically have Airflow running on GCP very easily now. When you set it up, Google designates a folder in GCS to store the DAG files. So when Composer (Airflow) is set up in one project, it has access rights to the services of that particular project by default, including accessing storage, starting Dataproc clusters, etc. But I need Airflow to access resources in other GCP projects, and the standard way of doing that is through service accounts. – Charles Zhan Jun 06 '18 at 16:00
  • Ohh, I see now. What happens if you take out the `key_path` and `keyfile_dict` from the extra JSON properties? It looks like the GCP hook uses the default credentials on the machine itself, which should automatically be set by GCP, right? – Simon D Jun 06 '18 at 16:35
  • The service account key file is needed for authentication to access resources in other projects. When I am using Airflow, I can put the key file under some directory on the machine where Airflow is installed. When using Cloud Composer, I assumed those key files could be put under a GCS path (so I don't need to SSH into the machines where Composer/Airflow is installed; I guess that is the purpose of a "managed service"?), but it is not working. – Charles Zhan Jun 06 '18 at 17:12
  • Were you able to figure out how to make a service account file available to a Composer node? – Andres Lowrie Mar 23 '21 at 20:50
1

Add the following to your Extras field:

'{"extra__google_cloud_platform__scope":"https://www.googleapis.com/auth/cloud-platform", "extra__google_cloud_platform__project":"{GOOGLE_CLOUD_PROJECT}", "extra__google_cloud_platform__key_path":"/path/to/gce-key.json"}'
rohit thomas
Chris DeBracy
  • How can you tell which keys to set at "extra__google_cloud_platform__{KEY}"? It is not obvious based on the UI for connections at all; the UI field "Project ID" seems to be mapped to the key "project", for example. Is it documented somewhere? – knowa42 Nov 09 '18 at 20:10
0

Cloud Composer should set up a default connection for you that doesn't require you to specify the JSON key. For me it worked for GCS and BigQuery without any additional work.

If you create your own service account, copy the JSON key to the Composer bucket that gets created. That file path is what you'll use in the extras field. I think Composer prefixes the file system using a gs: or gcs: mount point; there should be a reference to it in the airflow.cfg file that's in the bucket.

I don't have one spun up right this moment to tell you for certain, so I'm working from memory.

Chris DeBracy
0

Since Cloud Composer lives in a GKE cluster, you could store your service account key as a Kubernetes secret; then you should be able to use it in conjunction with the Kubernetes Pod Operator.
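A rough sketch of what that could look like, assuming the key was already stored as a Kubernetes secret (the secret name, key, image, and mount path below are all made up):

    # Sketch: mount a Kubernetes secret holding the key into a pod launched
    # by the KubernetesPodOperator (Airflow 1.x contrib import paths;
    # assumes an existing DAG object `dag`; all names are hypothetical).
    from airflow.contrib.kubernetes.secret import Secret
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    sa_key = Secret(
        deploy_type="volume",
        deploy_target="/var/secrets/google",  # mount path inside the pod
        secret="my-sa-key",                   # name of the k8s secret
        key="service-account.json",           # key within that secret
    )

    run_job = KubernetesPodOperator(
        task_id="run_with_service_account",
        name="run-with-service-account",
        namespace="default",
        image="google/cloud-sdk:slim",
        cmds=["bash", "-c",
              "gcloud auth activate-service-account "
              "--key-file=/var/secrets/google/service-account.json"],
        secrets=[sa_key],
        dag=dag,
    )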

Gongora
0

A Composer instance creates a GCS bucket where all the DAGs and plugins are stored. You need to keep the JSON file in the data folder and then give the path as the mapped location, e.g. '/home/airflow/gcs/data/<service.json>'. For more details, refer to https://cloud.google.com/composer/docs/concepts/cloud-storage
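In other words, upload the key into the bucket's data/ folder and point key_path at the mapped local path in the connection's Extras, something like (the project name is a placeholder, and the key file name stays whatever you uploaded):

'{"extra__google_cloud_platform__project":"my-other-project", "extra__google_cloud_platform__scope":"https://www.googleapis.com/auth/cloud-platform", "extra__google_cloud_platform__key_path":"/home/airflow/gcs/data/<service.json>"}'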