#15 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

#15 google-cloud-aiplatform 1.16.1 requires google-cloud-bigquery<3.0.0dev,>=1.15.0, but you have google-cloud-bigquery 3.10.0 which is incompatible.

#15 google-ads 18.0.0 requires protobuf!=3.18.*,!=3.19.*,<=3.20.0,>=3.12.0, but you have protobuf 3.20.3 which is incompatible.

We are receiving these errors in the docker-compose build logs when building our Apache Airflow image. According to an LLM:

  • The first conflict is between google-cloud-aiplatform and google-cloud-bigquery. The google-cloud-aiplatform library requires a version of google-cloud-bigquery that is less than 3.0.0dev and greater than or equal to 1.15.0, but you have google-cloud-bigquery version 3.10.0 installed which is incompatible.
  • The second conflict is between google-ads and protobuf. The google-ads library requires a version of protobuf that is less than or equal to 3.20.0 and greater than or equal to 3.12.0, excluding versions 3.18.* and 3.19.*, but you have protobuf version 3.20.3 installed which is incompatible.
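
If you want to confirm which installed distributions actually declare these requirements, you can ask pip inside the built image. A minimal sketch, assuming the image tag below stands in for whatever tag your docker-compose build produces:

docker run --rm --entrypoint bash our-airflow-image:latest -c '
  pip check;
  pip show google-cloud-aiplatform google-ads | grep -E "^(Name|Version|Requires):"
'

pip check re-reports any broken requirements, and the Requires: lines from pip show list (by name) what each of those packages depends on; the exact version ranges are what pip already printed in the error above.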

It's worth noting that dbt-bigquery==1.5.0 is a new release from only a few weeks ago.

Here is our Dockerfile:

FROM --platform=linux/amd64 apache/airflow:2.5.3

# install mongodb-org-tools
USER root
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
    curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
    add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
    apt-get update && apt-get install -y mongodb-org-tools
USER airflow

ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt

and our requirements.txt

gcsfs==0.6.1                        # Google Cloud Storage file system interface
ndjson==0.3.1                       # Newline delimited JSON parsing and serialization
pymongo==3.12.1                     # MongoDB driver for Python
dbt-bigquery==1.5.0                 # dbt adapter for Google BigQuery
numpy==1.21.1                       # Numerical computing in Python
pandas==1.3.1                       # Data manipulation and analysis library
billiard                            # Multiprocessing replacement, to avoid "daemonic processes are not allowed to have children" error using Pool

How can we resolve these dependency conflicts? And how can we even tell which transitive dependencies belong to which libraries in our requirements.txt? My assumption is that google-cloud-aiplatform and google-cloud-bigquery are both dependencies of dbt-bigquery; however, if they were dependencies of the same library, I wouldn't expect a dependency conflict.

Edit: some useful logs from the build:

Requirement already satisfied: protobuf>=3.18.3 in /home/airflow/.local/lib/python3.7/site-packages (from dbt-core~=1.5.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (3.20.0)

Collecting google-cloud-bigquery~=3.0
Downloading google_cloud_bigquery-3.10.0-py2.py3-none-any.whl (218 kB)

Requirement already satisfied: proto-plus<2.0.0dev,>=1.15.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.19.6)

Requirement already satisfied: grpcio<2.0dev,>=1.47.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.53.0)

Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.4.1)

Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.3.2)

Requirement already satisfied: google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.8.2)

Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.56.4)

Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.48.2)

Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5))

google-cloud-aiplatform and google-ads do not appear a single time in the build logs other than in the error message.
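
One way to find out what pulls them in is to print a reverse dependency tree from inside a throwaway container. This is only a sketch: pipdeptree is a third-party tool that is not part of the image, and the image tag is illustrative.

docker run --rm --entrypoint bash our-airflow-image:latest -c '
  pip install --quiet pipdeptree;
  pipdeptree --reverse --packages google-cloud-aiplatform,google-ads,protobuf
'

The reverse tree shows, for each of those packages, which installed distributions require it; depending on your pipdeptree version you may need to drop --packages and grep the full output instead.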

Canovice

1 Answer


The problem arises from conflicts between the Python packages the OS requests to install and the dependency graph of your project's packages.

The short answer is to use the same strategy as you often would with any Python project: venv

Solution

Below is a complete working Dockerfile:

FROM --platform=linux/amd64 apache/airflow:2.5.3-python3.9

# install mongodb-org-tools
ENV DEBIAN_FRONTEND noninteractive
USER root
RUN apt-get update && apt-get install -y --no-install-recommends gnupg software-properties-common python3-venv && \
    curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
    add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
    apt-get update && apt-get install -y --no-install-recommends mongodb-org-tools

COPY requirements.txt /usr/local/app/requirements.txt

ENV VIRTUAL_ENV=/usr/local/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

RUN \
    pip install --upgrade --no-cache-dir --no-user pip && \
    pip install --no-cache-dir --no-user -r /usr/local/app/requirements.txt
    # run your app

Note the setup and use of venv here. Just like outside a container, this partitions your application dependencies from the system-installed ones inside the container.
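
If you want to verify the partitioning, a quick sketch (the image tag is illustrative):

docker run --rm --entrypoint bash your-image:tag -c 'which python && pip -V && pip check'

which python should resolve to /usr/local/venv/bin/python, and pip -V should report the venv's site-packages path rather than /home/airflow/.local.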

Notes

  • In this sample I have used the root user, as the permissions issues were getting annoying. In your production file you'll want to use COPY --chown=... and put things in place with appropriate USER permissions.

  • /usr/local/app/ is just my paradigm. You can put the files anywhere.

  • Because we are rewriting $PATH instead of sourcing the venv's activate script, you have to pass --no-user to pip.

  • At first I tried --no-install-recommends in apt-get install to see if the offending dependency would be excluded. However, I left it in there as it's good practice and minimizes your image size.

Detail

When running apt-get install you can see a number of packages are installed:

#6 4.003 The following NEW packages will be installed:
#6 4.003   dbus dmsetup gir1.2-glib-2.0 gir1.2-packagekitglib-1.0 iso-codes
#6 4.003   libapparmor1 libappstream4 libargon2-1 libcap2 libcap2-bin libcryptsetup12
#6 4.003   libcurl3-gnutls libdbus-1-3 libdevmapper1.02.1 libdw1 libelf1
#6 4.003   libgirepository-1.0-1 libglib2.0-0 libglib2.0-bin libglib2.0-data
#6 4.003   libgstreamer1.0-0 libip4tc2 libkmod2 libnss-systemd libpackagekit-glib2-18
#6 4.003   libpam-cap libpam-systemd libpolkit-agent-1-0 libpolkit-gobject-1-0
#6 4.003   libstemmer0d libunwind8 libyaml-0-2 packagekit packagekit-tools policykit-1
#6 4.003   python-apt-common python3-apt python3-dbus python3-distro-info python3-gi
#6 4.003   python3-pycurl python3-software-properties shared-mime-info
#6 4.003   software-properties-common systemd systemd-sysv systemd-timesyncd ucf
#6 4.003   unattended-upgrades xdg-user-dirs xz-utils
...
#7 6.657 The following NEW packages will be installed:
#7 6.657   dbus dmsetup gir1.2-glib-2.0 gir1.2-packagekitglib-1.0 iso-codes
#7 6.657   libapparmor1 libappstream4 libargon2-1 libcap2 libcap2-bin libcryptsetup12
#7 6.657   libcurl3-gnutls libdbus-1-3 libdevmapper1.02.1 libdw1 libelf1
#7 6.657   libgirepository-1.0-1 libglib2.0-0 libglib2.0-bin libglib2.0-data
#7 6.657   libgstreamer1.0-0 libip4tc2 libkmod2 libnss-systemd libpackagekit-glib2-18
#7 6.657   libpam-cap libpam-systemd libpolkit-agent-1-0 libpolkit-gobject-1-0
#7 6.657   libstemmer0d libunwind8 libyaml-0-2 packagekit packagekit-tools policykit-1
#7 6.657   python-apt-common python3-apt python3-dbus python3-distro-info python3-gi
#7 6.657   python3-pycurl python3-software-properties shared-mime-info
#7 6.657   software-properties-common systemd systemd-sysv systemd-timesyncd ucf
#7 6.657   unattended-upgrades xdg-user-dirs xz-utils
#7 6.658 The following packages will be upgraded:
#7 6.658   libsystemd0

I didn't track down the exact problem package, but you can see several python3-* packages requested to be installed. One of these conflicts with the dependency graph of your application.
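
If you do want to track it down, a rough sketch is to list what each of those python3-* packages actually ships and look for anything that lands on Python's import path ahead of your pip-installed copies (package names taken from the log above; run inside the image):

for pkg in python3-apt python3-dbus python3-distro-info python3-gi python3-pycurl python3-software-properties; do
  echo "== $pkg"; dpkg -L "$pkg" | grep dist-packages
done

Debian packages install their Python modules under /usr/lib/python3/dist-packages, so anything printed there is a candidate.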

Jeffrey Mixon
  • I see. By adding it to the Dockerfile, this example seems to run all of airflow in a virtual env. Is this better or worse than using the `PythonVirtualenvOperator` in the specific DAGs where library dependencies are coming into play? – Canovice May 11 '23 at 03:37
  • thank you for the detailed response here. – Canovice May 11 '23 at 03:37
  • @Canovice I really can't speak to the benefits of using a DAG in Airflow for dependency management as I don't believe I ever have done it. However, just on principle, imagine you were building any other Python application that didn't have the luxury of a DAG library. It seems to me the clearest and simplest approach would be to use `venv` -- it doesn't require any real "code" and pretty much every engineer will understand it immediately. If your own application dependencies really start to conflict, then I think it starts making more sense to define them in a DAG. – Jeffrey Mixon May 11 '23 at 07:27
  • Makes sense. I don't use virtual envs and don't code in Python too often, but I am reading up on it today. I still don't understand the source of the conflict in my post, though. `dbt-bigquery` has a conflict not with another library from `requirements.txt`, but with the general Python version installed in the airflow image? – Canovice May 11 '23 at 14:21
  • @Canovice it's a conflict with a system Python package, likely installed as a dependency of `airflow`. Specifically, the `protobuf` package seems to be the problem, as airflow needs a fairly narrow version and your `google*` dependencies want versions that don't fall in that range. This is a very common problem with Python projects, which is why `venv` is so heavily used. – Jeffrey Mixon May 11 '23 at 20:09
  • and using `venv`s isolates the system Python package dependencies from the library versions in the `venv`? – Canovice May 12 '23 at 02:13