
I am trying to install dbt on Google Cloud Composer but run into dependency issues. I have followed the instructions from this article: https://blog.doit-intl.com/setup-dbt-with-cloud-composer-ab702454e27b. However, it already fails at step 2, installing the packages (airflow-dbt & dbt) in Composer.

I find the following in the Cloud build logs:

```
ERROR: snowflake-connector-python 2.3.6 has requirement boto3<1.16,>=1.4.4, but you'll have boto3 1.17.85 which is incompatible.
ERROR: snowflake-connector-python 2.3.6 has requirement requests<2.24.0, but you'll have requests 2.24.0 which is incompatible.
ERROR: networkx 2.5.1 has requirement decorator<5,>=4.3, but you'll have decorator 5.0.9 which is incompatible.
ERROR: hologram 0.0.13 has requirement jsonschema<3.2,>=3.0, but you'll have jsonschema 3.2.0 which is incompatible.
ERROR: dbt-core 0.19.1 has requirement idna<2.10, but you'll have idna 2.10 which is incompatible.
ERROR: dbt-core 0.19.1 has requirement requests<2.24.0,>=2.18.0, but you'll have requests 2.24.0 which is incompatible.
ERROR: dbt-snowflake 0.19.1 has requirement cryptography<4,>=3.2, but you'll have cryptography 3.0 which is incompatible.
ERROR: dbt-bigquery 0.19.1 has requirement google-api-core<1.24,>=1.16.0, but you'll have google-api-core 1.28.0 which is incompatible.
ERROR: dbt-redshift 0.19.1 has requirement boto3<1.16,>=1.4.4, but you'll have boto3 1.17.85 which is incompatible.
```

My current environment configuration contains: composer-1.13.0-airflow-1.10.12

Has anyone encountered the same problem and been able to solve it? I have also tried installing the specific versions of the requirements listed in the logs, but this does not resolve the problem.

  • How are you installing these dependencies in your Composer environment? In addition, could you try to create a new environment with the most recent version of Composer and see if it works? If it does, you can upgrade as described [here](https://cloud.google.com/composer/docs/how-to/managing/upgrading). – Alexandre Moraes Jun 03 '21 at 10:06
  • @AlexandreMoraes Even if the upgraded Composer version works with the required dbt PyPI packages and their dependencies, you can't tell whether the next version of Composer will break dbt or its dependencies, or whether the next version of dbt will break the current version of Composer. The best way is to decouple the Composer environment from the environment that runs the dbt code. – Ryan Yuan Jun 04 '21 at 06:49
  • Another option is Workflows + Cloud Build: https://stackoverflow.com/a/70134210/2346803 – robertsahlin Nov 27 '21 at 11:11

2 Answers


Installing dbt inside the Composer environment is a bit painful, but there are workarounds.

  1. Using external services to run dbt jobs, e.g. Cloud Run.
  2. Using Composer's KubernetesPodOperator. My colleague has put up a nice article on dbt Discourse here going through the setup process.
  3. Ignoring Composer's dependency conflicts by setting Composer's environment variable IGNORE_PYPI_DEPENDENCY_CONFLICTS to True. However, I don't recommend this, as it may cause other issues.
  4. Creating a Python virtual environment in Composer and installing the dbt packages there.
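For option 2, a minimal sketch of what the pod configuration could look like. The image name, namespace, and dbt arguments are all assumptions (there is no published image here); the Discourse article mentioned above walks through a full setup:

```python
# Sketch: run dbt in its own container via KubernetesPodOperator, so dbt's
# dependencies never touch Composer's own Python environment.
# All values below are placeholders.
pod_config = {
    "task_id": "dbt_run",
    "name": "dbt-run",
    "namespace": "default",
    "image": "gcr.io/my-project/dbt:0.19.1",  # hypothetical image with dbt preinstalled
    "cmds": ["dbt"],
    "arguments": ["run", "--profiles-dir", "/dbt"],
    "is_delete_operator_pod": True,  # clean up the pod when the task finishes
}

# In a DAG on composer-1.13.0-airflow-1.10.12 this would be wired up roughly as:
# from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
# dbt_run = KubernetesPodOperator(dag=dag, **pod_config)
```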
Ryan Yuan
  • Thanks a lot for these suggestions, I will look into it this week and try them out – Michel van Dijck Jun 07 '21 at 08:25
  • @MichelvanDijck Did the above setup work out for you? – Urvah Shabbir Dec 24 '21 at 08:37
  • @RyanYuan How do you do point 4? If I install the `venv` package, how can I create several virtual environments and manage the packages of each one? –  Jan 06 '22 at 13:22
  • @martinus The idea here is to use a BashOperator to execute a bash script that creates the virtualenv. In my past experience, when I needed multiple virtualenvs, I had to use different BashOperators and create a different virtualenv in each. Once the job inside a virtualenv is done, the same bash script removes it to free memory/space on the Airflow instance. I appended a random id as a suffix to each virtualenv's name for separation, so I could prune the used ones. – Ryan Yuan Jan 06 '22 at 22:19
  • Thanks @RyanYuan, it works well. In the end, my BashOperator runs something like this: `python -m virtualenv -p python3 tmp_venv > /dev/null && source tmp_venv/bin/activate && pip install dbt-bigquery==0.20.2 > /dev/null && cd {{PATH_DBT}} && DBT_PROFILES_DIR=. dbt run && cd - && rm -rf tmp_venv` –  Jan 07 '22 at 08:09
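Putting these comments together, option 4 can be sketched as a small helper that builds the throwaway-virtualenv command. The dbt version and the command shape come from the comment above; the random suffix follows the separation idea Ryan Yuan describes, and the project path is a placeholder:

```python
import uuid

def make_dbt_bash_command(dbt_project_dir, dbt_version="0.20.2"):
    """Build a bash command that creates a throwaway virtualenv, installs dbt
    into it, runs the project, and removes the virtualenv again to free space."""
    venv = "tmp_venv_" + uuid.uuid4().hex[:8]  # random suffix so parallel runs don't collide
    return (
        "python -m virtualenv -p python3 {v} > /dev/null"
        " && source {v}/bin/activate"
        " && pip install dbt-bigquery=={ver} > /dev/null"
        " && cd {proj}"
        " && DBT_PROFILES_DIR=. dbt run"
        " && cd -"
        " && rm -rf {v}"
    ).format(v=venv, ver=dbt_version, proj=dbt_project_dir)

# Wired into a DAG this would look roughly like:
# from airflow.operators.bash_operator import BashOperator
# run_dbt = BashOperator(task_id="run_dbt",
#                        bash_command=make_dbt_bash_command("/home/airflow/gcs/data/dbt"))
```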

You can try the Python package https://github.com/tomasfarias/airflow-dbt-python

Here is a blog post with an example relying on it: https://www.springml.com/blog/running-dbt-pipelines-via-cloud-composer/

(It seems the original blog post linked in the question is now a 404.)
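A sketch of what a task using airflow-dbt-python could look like; the operator name follows the project's README, but the paths and the profile target below are placeholders, so check them against the version you install:

```python
# Keyword arguments for airflow-dbt-python's DbtRunOperator (names taken from
# the project's README; the paths below are placeholders for a Composer bucket).
dbt_run_kwargs = {
    "task_id": "dbt_run",
    "project_dir": "/home/airflow/gcs/data/dbt",
    "profiles_dir": "/home/airflow/gcs/data/dbt",
    "target": "prod",  # hypothetical profile target
}

# In the DAG file:
# from airflow_dbt_python.operators.dbt import DbtRunOperator
# dbt_run = DbtRunOperator(dag=dag, **dbt_run_kwargs)
```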

刘宇翔