I have two example Dockerfiles: one works as expected and the other does not. The main difference between them is the base image.
Dockerfile using the plain Python base image:
# syntax = docker/dockerfile:experimental
FROM python:3.9-slim-bullseye
RUN apt-get update -qy && apt-get install -qy \
build-essential tini libsasl2-dev libssl-dev default-libmysqlclient-dev gnutls-bin
RUN pip install poetry==1.1.15
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry config virtualenvs.create false
RUN --mount=type=cache,mode=0777,target=/root/.cache/pypoetry poetry install
Dockerfile using the Airflow base image:
# syntax = docker/dockerfile:experimental
FROM apache/airflow:2.3.3-python3.9
USER root
RUN apt-get update -qy && apt-get install -qy \
build-essential tini libsasl2-dev libssl-dev default-libmysqlclient-dev gnutls-bin
USER airflow
RUN pip install poetry==1.1.15
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry config virtualenvs.create false
RUN poetry config cache-dir /opt/airflow/.cache/pypoetry
RUN --mount=type=cache,uid=50000,mode=0777,target=/opt/airflow/.cache/pypoetry poetry install
Before building either image, run poetry lock in the same folder as the pyproject.toml file!
pyproject.toml file:
[tool.poetry]
name = "Airflow-test"
version = "0.1.0"
description = ""
authors = ["Lorem ipsum"]
[tool.poetry.dependencies]
python = "~3.9"
apache-airflow = { version = "2.3.3", extras = ["amazon", "crypto", "celery", "postgres", "hive", "jdbc", "mysql", "ssh", "slack", "statsd"] }
prometheus_client = "^0.8.0"
isodate = "0.6.1"
dacite = "1.6.0"
sqlparse = "^0.3.1"
python3-openid = "^3.1.0"
flask-appbuilder = ">=3.4.3"
alembic = ">=1.7.7"
apache-airflow-providers-google = "^8.1.0"
apache-airflow-providers-databricks = "^3.0.0"
apache-airflow-providers-amazon = "^4.0.0"
pendulum = "^2.1.2"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
This is the command I use to build the images:
DOCKER_BUILDKIT=1 docker build --progress=plain -t airflow-test -f Dockerfile .
The first time either image is built, poetry install needs to download all dependencies. The interesting part is the second build: the Python-based image builds a lot faster because the dependencies are already cached, but the Airflow-based image downloads all 200 dependencies once again.
From what I know, specifying --mount=type=cache means that directory is stored in the image repository so it can be reused the next time the image is built. This also trims the final image size.
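To see whether a cache directory actually contains anything at a given point, a small script like the one below could be run against it. This is only a sketch: `dir_stats` is a hypothetical helper name, and the path is the cache-dir from the Airflow Dockerfile above, so it would need adjusting for the Python image.

```python
import os


def dir_stats(path):
    """Return (file_count, total_bytes) for all regular files under path.

    A missing or empty directory yields (0, 0).
    """
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            try:
                total_bytes += os.path.getsize(full_path)
                file_count += 1
            except OSError:
                # Skip files that vanish or cannot be read (e.g. broken symlinks).
                continue
    return file_count, total_bytes


if __name__ == "__main__":
    # Assumption: this is the Poetry cache-dir configured in the Airflow Dockerfile.
    cache_dir = "/opt/airflow/.cache/pypoetry"
    count, size = dir_stats(cache_dir)
    print(f"{cache_dir}: {count} files, {size} bytes")
```

Comparing the numbers between the first and second build would show whether the cache is being reused or starts empty each time.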
How do the dependencies become available when the image is run? If I run docker run -it --user 50000 --entrypoint /bin/bash image, a simple Python import works in the Airflow image but not in the Python image. When and how are the dependencies reattached to the image?
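For reference, the import check I mean is roughly the following sketch. `is_importable` is a hypothetical helper, and the module names are only examples taken from the dependencies in pyproject.toml.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    # Assumption: these top-level modules are provided by the project's dependencies.
    for name in ("airflow", "pendulum"):
        print(name, "importable:", is_importable(name))
```

Running this inside a container started from each image shows which of the two actually has the packages installed in its Python environment.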
If you want to try it out, here is a dummy project that can be cloned locally and played around with: https://github.com/ioangrozea/Docker-dummy