I would like to use dbt in an MWAA (Managed Workflows for Apache Airflow) environment. To achieve this, I need to install dbt in the managed environment and then run the dbt commands via Airflow operators or the CLI (BashOperator).
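For context, this is roughly how I intend to invoke dbt; a minimal sketch only, assuming dbt is installed through the MWAA `requirements.txt` and that the project path below (a placeholder I made up) is shipped with the DAGs bundle:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical location: assumes the dbt project sits next to the DAGs
# that MWAA syncs from S3.
DBT_PROJECT_DIR = "/usr/local/airflow/dags/dbt_project"

with DAG(
    dag_id="dbt_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        # profiles.yml must be readable on whichever worker runs this task
        bash_command=(
            f"dbt run --project-dir {DBT_PROJECT_DIR} "
            f"--profiles-dir {DBT_PROJECT_DIR}"
        ),
    )
```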
My problem with this solution is that I need to store the dbt profile file(s), which contain the target/source database credentials, in S3. Otherwise the file is not deployed to the Airflow worker nodes and therefore cannot be used by dbt.
Is there any other option? I feel this is a big security risk and also undermines the use of Airflow, because I would like to use its built-in credential store (Connections).
My ideas:
- Create the profile file on the fly in the Airflow DAG as a task and write it out to local disk (see the first sketch after this list). I do not think this is a feasible workaround, because there is no guarantee that the dbt task will run on the same worker node on which my code created the file.
- Move the profile file to S3 manually (i.e., exclude it from CI/CD). Again, I see a security risk, as I am still storing credentials on S3.
- Create a custom operator that builds the profile file on the same machine where the command will run (the first sketch below covers this idea as well). Maintenance nightmare.
- Use MWAA environment variables (https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-env-variables.html) and combine them with dbt's env_var function (https://docs.getdbt.com/reference/dbt-jinja-functions/env_var); see the second sketch after this list. Storing credentials in system-wide environment variables feels awkward.
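To make the first and third ideas concrete, this is what I have in mind, as a sketch only: pull the credentials from Airflow's connection store and render profiles.yml in the same task (or custom operator) that runs dbt, so the file never has to leave the worker. The connection ID `dbt_target_db`, the Postgres target, the profile name, and all paths are placeholders I made up:

```python
import os
import subprocess
import tempfile

import yaml
from airflow.hooks.base import BaseHook


def render_profile_and_run_dbt():
    # Pull credentials from Airflow's own credential store (Connections);
    # "dbt_target_db" is a hypothetical connection ID.
    conn = BaseHook.get_connection("dbt_target_db")

    profile = {
        "my_profile": {  # must match the profile name in dbt_project.yml
            "target": "prod",
            "outputs": {
                "prod": {
                    "type": "postgres",  # assumption: a Postgres target
                    "host": conn.host,
                    "port": conn.port or 5432,
                    "user": conn.login,
                    "password": conn.password,
                    "dbname": conn.schema,
                    "schema": "analytics",
                    "threads": 4,
                }
            },
        }
    }

    # Render profiles.yml into a temp dir on this worker and run dbt from
    # the same process, so the file and the command share a machine.
    with tempfile.TemporaryDirectory() as profiles_dir:
        with open(os.path.join(profiles_dir, "profiles.yml"), "w") as f:
            yaml.safe_dump(profile, f)
        subprocess.run(
            [
                "dbt", "run",
                "--project-dir", "/usr/local/airflow/dags/dbt_project",
                "--profiles-dir", profiles_dir,
            ],
            check=True,
        )
```

Wrapped in a single PythonOperator this sidesteps the worker-affinity problem, but only because rendering and execution happen inside one task; splitting them across two tasks would break for the reason given above.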
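And for the fourth idea, this is what the dbt side would look like: a sketch of profiles.yml using env_var, where the profile name, target type, and variable names are again placeholders:

```yaml
# profiles.yml -- sketch only; profile/target names and env vars are made up
my_profile:
  target: prod
  outputs:
    prod:
      type: postgres        # assumption: a Postgres target
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: "{{ env_var('DBT_DBNAME') }}"
      schema: analytics
      threads: 4
```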
Any good ideas or best practices?