
I want to load data from Google Storage to S3

To do this I want to use GoogleCloudStorageToS3Operator, which requires gcp_conn_id

So I need to set up a connection with the Google Cloud connection type.

To do this, I added

apache-airflow[google]==2.0.2

to requirements.txt

but the Google Cloud connection type is still missing from the connection type dropdown in MWAA.

The same approach works well with the MWAA local runner:

https://github.com/aws/aws-mwaa-local-runner

I guess it does not work in MWAA for the security reasons discussed here: https://lists.apache.org/thread.html/r67dca5845c48cec4c0b3c34c3584f7c759a0b010172b94d75b3188a3%40%3Cdev.airflow.apache.org%3E

Still, is there any workaround to add the Google Cloud connection type in MWAA?

Grish

3 Answers


Connections can be created and managed using either the UI or environment variables.

To my understanding, the restriction MWAA places on installing some provider packages applies only to the web server machine, which is why the connection type is not listed in the UI. This doesn't mean you can't create the connection at all; it just means you can't do it from the UI.

You can define it from the CLI:

airflow connections add [-h] [--conn-description CONN_DESCRIPTION]
                        [--conn-extra CONN_EXTRA] [--conn-host CONN_HOST]
                        [--conn-login CONN_LOGIN]
                        [--conn-password CONN_PASSWORD]
                        [--conn-port CONN_PORT] [--conn-schema CONN_SCHEMA]
                        [--conn-type CONN_TYPE] [--conn-uri CONN_URI]
                        conn_id

You can also generate a connection URI to make it easier to set.

Connections can also be set as environment variables. Example:

export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://?extra__google_cloud_platform__key_path=%2Fkeys%2Fkey.json&extra__google_cloud_platform__scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&extra__google_cloud_platform__project=airflow&extra__google_cloud_platform__num_retries=5'

If needed, you can check the Google provider package docs to review the configuration options for the connection.
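As a rough illustration of how a connection URI like the one above can be generated instead of URL-encoded by hand, here is a minimal sketch using only the Python standard library. The helper name `build_gcp_conn_uri` is made up for this example; the extras are the same ones used in the export line above.

```python
from urllib.parse import urlencode


def build_gcp_conn_uri(extras: dict) -> str:
    """Return a google-cloud-platform connection URI with URL-encoded extras.

    Hypothetical helper for illustration; it only does string encoding and
    does not talk to Airflow or Google Cloud.
    """
    return "google-cloud-platform://?" + urlencode(extras)


# Same extras as in the export example above.
extras = {
    "extra__google_cloud_platform__key_path": "/keys/key.json",
    "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform",
    "extra__google_cloud_platform__project": "airflow",
    "extra__google_cloud_platform__num_retries": 5,
}

conn_uri = build_gcp_conn_uri(extras)
print(f"export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='{conn_uri}'")
```

Building the URI this way avoids mistakes when percent-encoding slashes and colons in the key path and scope.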

Elad Kalif

For MWAA there are two options for setting a connection:

  1. Set an environment variable named with the pattern AIRFLOW_CONN_YOUR_CONNECTION_NAME, where e.g. YOUR_CONNECTION_NAME = GOOGLE_CLOUD_DEFAULT. This can be done with a custom plugin: https://docs.aws.amazon.com/mwaa/latest/userguide/samples-env-variables.html
  2. Use AWS Secrets Manager: https://docs.aws.amazon.com/mwaa/latest/userguide/connections-secrets-manager.html

I tested both with a Google Cloud connection, and both work.
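To make the naming pattern from option 1 concrete, here is a small sketch of how a connection id maps to its environment variable. The helper names and the example URI are hypothetical. In MWAA, the linked AWS sample places this kind of assignment in a plugin file; that packaging is not shown here.

```python
import os


def conn_env_var_name(conn_id: str) -> str:
    """Map a connection id to its AIRFLOW_CONN_* environment variable name."""
    return "AIRFLOW_CONN_" + conn_id.upper()


def register_connection(conn_id: str, conn_uri: str) -> str:
    """Set the environment variable for conn_id and return its name.

    Hypothetical helper: it only sets a variable in the current process.
    """
    name = conn_env_var_name(conn_id)
    os.environ[name] = conn_uri
    return name


# Placeholder URI; replace the project with your own value.
example_uri = "google-cloud-platform://?extra__google_cloud_platform__project=my-project"
var_name = register_connection("google_cloud_default", example_uri)
print(var_name)
```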

Grish

I asked AWS support about this issue. It looks like they are working on it.

In the meantime, they showed me a way to configure the Google Cloud Platform connection: set Conn Type to HTTP and pass a JSON object in the Extra field. It works.

I validated this by editing google_cloud_default (Airflow > Admin > Connections):

Conn Type: HTTP

Extra: { "extra__google_cloud_platform__project":"<YOUR_VALUE>", "extra__google_cloud_platform__key_path":"", "extra__google_cloud_platform__keyfile_dict":"{\"type\": \"service_account\",\"project_id\": \"<YOUR_VALUE>\",\"private_key_id\": \"<YOUR_VALUE>\", \"private_key\": \"-----BEGIN PRIVATE KEY-----\\n<YOUR_VALUE>\\n-----END PRIVATE KEY-----\\n\", \"client_email\": \"<YOUR_VALUE>\", \"client_id\": \"<YOUR_VALUE>\", \"auth_uri\": \"https://<YOUR_VALUE>\", \"token_uri\": \"https://<YOUR_VALUE>\", \"auth_provider_x509_cert_url\": \"https://<YOUR_VALUE>\", \"client_x509_cert_url\": \"https://<YOUR_VALUE>\"}", "extra__google_cloud_platform__scope":"", "extra__google_cloud_platform__num_retries":"5" }

airflow conn screenshot

!! You must escape the " and \n characters inside extra__google_cloud_platform__keyfile_dict (as \" and \\n) !!
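Rather than escaping the key file by hand, the Extra value can be generated. The sketch below serializes the service account key to a JSON string first and then embeds that string in the outer object, so the quotes and newlines come out escaped. `build_extra` and the key contents are placeholders for illustration; in practice you would load your real key file with `json.load`.

```python
import json


def build_extra(keyfile: dict, project: str, num_retries: int = 5) -> str:
    """Return the JSON text for the Extra field of the connection.

    Hypothetical helper: the inner json.dumps turns the key into a string,
    the outer json.dumps escapes its quotes and newlines.
    """
    extra = {
        "extra__google_cloud_platform__project": project,
        "extra__google_cloud_platform__key_path": "",
        "extra__google_cloud_platform__keyfile_dict": json.dumps(keyfile),
        "extra__google_cloud_platform__scope": "",
        "extra__google_cloud_platform__num_retries": str(num_retries),
    }
    return json.dumps(extra)


# Placeholder key; a real one has more fields and comes from your key file.
keyfile = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n",
    "client_email": "sa@my-project.example",
}

extra_json = build_extra(keyfile, "my-project")
print(extra_json)  # paste this into the Extra field
```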

In requirements.txt I used: apache-airflow[gcp]==2.0.2

(I believe apache-airflow[google]==2.0.2 should work as well)

01Joan01