
We have an S3 bucket that Airflow uses as the source for all of our DAG data pipelines, with a separate bucket for dev, test, and production. Say the bucket in dev is called dev-data-bucket, in test it's test-data-bucket, and so on.

I don't want to hard-code the bucket name in our DAG code because this code gets migrated between environments: if I specified dev-data-bucket in dev, the name would need to change to test-data-bucket when the DAG moves to test and to prod-data-bucket for prod.

I understand that the usual way to do this is to create an Airflow connection with the same name (say, data-bucket) in each environment. However, I don't see where to specify the bucket name on Airflow's connection screen, the way I would for a database connection.

How do I create an Airflow S3 connection with the same name in each environment, but with a different bucket name in each?

Simon D

1 Answer


The way to do this is to specify the bucket name in the Schema field in the Airflow connection screen. This is what the connection screen would look like in dev:

(Screenshot: the Airflow connection screen in dev, with the Schema field set to dev-data-bucket.)
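If you manage connections outside the UI, the same setting can be applied per environment from the command line. A sketch, assuming Airflow 2's `airflow connections add` CLI and that AWS credentials come from elsewhere (e.g. an instance role); only the schema value differs between environments:

```shell
# Create a connection with the same ID in every environment; the
# --conn-schema value carries the environment-specific bucket name.

# In dev:
airflow connections add data-bucket --conn-type aws --conn-schema dev-data-bucket

# In test:
# airflow connections add data-bucket --conn-type aws --conn-schema test-data-bucket

# In prod:
# airflow connections add data-bucket --conn-type aws --conn-schema prod-data-bucket
```

Because the connection ID is identical everywhere, the DAG code never changes between environments.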

Then, when you use one of the provided S3 operators in Airflow, you don't need to specify a bucket name, because Airflow's S3 hook is set up to fetch the bucket name from the connection if you haven't supplied one.
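The fallback logic can be pictured with a small stdlib-only sketch (the class and function names here are illustrative, not the real Airflow API): an explicitly passed bucket_name wins, otherwise the connection's Schema value is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    """Simplified stand-in for an Airflow connection record."""
    conn_id: str
    schema: Optional[str] = None  # the Schema field holds the bucket name


def resolve_bucket(conn: Connection, bucket_name: Optional[str] = None) -> Optional[str]:
    # Mirrors the hook's behaviour: an explicit bucket_name argument wins,
    # otherwise fall back to the schema stored on the connection.
    return bucket_name if bucket_name is not None else conn.schema


# In dev, the connection named "data-bucket" has schema "dev-data-bucket":
dev_conn = Connection(conn_id="data-bucket", schema="dev-data-bucket")
print(resolve_bucket(dev_conn))              # dev-data-bucket
print(resolve_bucket(dev_conn, "override"))  # override
```

In test and prod the same conn_id simply carries a different schema, so DAG code that omits the bucket name picks up the right bucket in every environment.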

Simon D