3

It used to take ~5 minutes for our Airflow deployment's docker image to build, and all of a sudden it is taking over an hour. With that said we haven't had to rebuild our image in a few months, so not sure when the issue came to be...

It looks like https://stackoverflow.com/questions/65122957/resolving-new-pip-backtracking-runtime-issue is the culprit. We're seeing a lot of warnings that look like this during build:

=> => #   Downloading google_cloud_os_login-2.3.1-py2.py3-none-any.whl (42 kB)                                                          
=> => # INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints             
=> => # to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press             
=> => # Ctrl + C.   
=> => #   Downloading google_cloud_os_login-2.2.1-py2.py3-none-any.whl (41 kB)                                                          
=> => #   Downloading google_cloud_os_login-2.2.0-py2.py3-none-any.whl (44 kB) 

Here is the line in our Dockerfile that is taking the hour+

RUN set -ex \
    && buildDeps=' \
        freetds-dev \
        libkrb5-dev \
        libsasl2-dev \
        libssl-dev \
        libffi-dev \
        libpq-dev \
        git \
    ' \
    && apt-get update -yqq \
    && apt-get install -yqq --no-install-recommends \
        $buildDeps \
        freetds-bin \
        build-essential \
        apt-utils \
        curl \
        rsync \
        netcat \
        locales \
    && sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
    && locale-gen \
    && update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
    && useradd -ms /bin/bash -d ${AIRFLOW_USER_HOME} airflow \
    && pip install -U pip setuptools wheel \
    && pip install pytz \
    && pip install pyOpenSSL \
    && pip install ndg-httpsclient \
    && pip install pyasn1 \
    && pip install apache-airflow[crypto,postgres,slack,kubernetes,gcp,docker,ssh]==${AIRFLOW_VERSION} \
    && if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
    && apt-get purge --auto-remove -yqq $buildDeps \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base \
        /var/lib/apt/lists/*

... 
...

COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt

and here is our requirements.txt

google-cloud-core==1.4.1
google-cloud-datastore==1.15.0
gcsfs==0.6.1
flatten-dict==0.4.2
bigquery_schema_generator==1.4
backoff==1.11.1
six==1.13.0
ndjson==0.3.1
pymongo==3.1.2
SQLAlchemy==1.3.15
pandas==1.3.1
numpy==1.21.1
billiard

I am actually quite confused about this specific warning message referring to google_cloud_os_login because the build step that is hanging is the line I shared starting with RUN set -ex, which doesn't look to have any google cloud installations? We install some google cloud stuff via requirements.txt (-core, -datastore), but the lines to COPY and RUN pip install on requirements.txt are much lower in our dockerfile (as indicated by the ...). These warnings pop up for many libraries, however it does seem like this google_cloud_os_login is a major culprit taking a significant amount of time.

Where in the RUN set -ex ... command is it prompting to install google_cloud_os_login? And how can we set a specific version number on this library in order to speed up the build of this docker image?

Canovice
  • 9,012
  • 22
  • 93
  • 211

1 Answers1

1

I think the various google packages you're seeing are dependencies of apache-airflow[gcp].

To speed up the install, the documentation recommends you use one of the constraint files they provide. They create tags named constraints-<version> that contain files you can pass to pip with --constraint.

For example, when trying to install 2.2.0, there is a constraints-2.2.0 tag. In this tag's file tree, you'll see files like constraints-3.8.txt, where 3.8 is the python version I'm using.

pip install apache-airflow[gcp]==2.2.0 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.0/constraints-3.8.txt"
Rickkwa
  • 2,197
  • 4
  • 23
  • 34