I have a Dockerfile like the following (app code omitted):

FROM python:3
# Binary dependencies
RUN apt update && apt install -y gfortran libopenblas-dev liblapack-dev
# Wanted Python packages
RUN python3 -m pip install mysqlclient numpy scipy pandas matplotlib

It works fine but produces a 1.75 GB image (while the code is only about 50 MB). How can I reduce the image size?

I also tried to use Alpine Linux, like this:

FROM python:3-alpine
# Binary dependencies for numpy & scipy (though scipy fails to build anyway)
RUN apk add --no-cache --virtual build-dependencies \
    gfortran gcc g++ libstdc++ \
    musl-dev lapack-dev freetype-dev python3-dev
# For mysqlclient
RUN apk --no-cache add mariadb-dev
# Wanted Python packages
RUN python3 -m pip install mysqlclient numpy scipy pandas matplotlib

But Alpine leads to many strange errors. The Dockerfile above fails with:

File "scipy/odr/setup.py", line 28, in configuration
    blas_info['define_macros'].extend(numpy_nodepr_api['define_macros'])
KeyError: 'define_macros'

So, how can one get the smallest possible (or at least a smaller) Python 3 image with the packages mentioned?

AivanF.

1 Answer

There are several things you can do to make your Docker image smaller.

  1. Use the python:3-slim Docker image as a base. The -slim images do not include packages needed for compiling software.
    • Pin the Python version, let's say to 3.8. Some packages do not have wheel files for Python 3.9 yet, so you might have to compile them. It is good practice in general to use a more specific tag, because the python:3-slim tag will point to different versions of Python at different points in time.
  2. You can also omit the installation of gfortran, libopenblas-dev, and liblapack-dev. Those packages are necessary for building numpy/scipy, but if you install the wheel files, which are pre-compiled, you do not need to compile any code.
  3. Use --no-cache-dir in pip install to disable the cache. If you do not include this, then pip's cache counts toward the Docker image size.
  4. There are no Linux wheels for mysqlclient, so you will have to compile it. You can install the build dependencies, install the package, then remove the build dependencies in a single RUN instruction, so they never end up in an image layer. Keep in mind that libmariadb3 is a runtime dependency of this package.

Here is a Dockerfile that implements the suggestions above. It produces a 354 MB Docker image.

FROM python:3.8-slim

# Install mysqlclient (must be compiled).
RUN apt-get update -qq \
    && apt-get install --no-install-recommends --yes \
        build-essential \
        default-libmysqlclient-dev \
        # Necessary for mysqlclient runtime. Do not remove.
        libmariadb3 \
    && rm -rf /var/lib/apt/lists/* \
    && python3 -m pip install --no-cache-dir mysqlclient \
    && apt-get autoremove --purge --yes \
        build-essential \
        default-libmysqlclient-dev

# Install packages that do not require compilation.
RUN python3 -m pip install --no-cache-dir \
      numpy scipy pandas matplotlib
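
Since build dependencies are removed after mysqlclient is compiled, it can be worth checking that every package still imports inside the finished image. The following is a minimal sketch of such a check, not part of the Dockerfile above. The function name `check_imports` and the module list are illustrative assumptions; the only non-obvious detail is that the mysqlclient distribution is imported as `MySQLdb`.

```python
import importlib

# Import names for the packages installed in the image.
# The mysqlclient distribution is imported under the name MySQLdb.
MODULES = ["MySQLdb", "numpy", "scipy", "pandas", "matplotlib"]


def check_imports(names):
    """Try to import each module; return {name: None or an error description}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            # A missing module or a missing shared library normally raises
            # ImportError; other failures during import are recorded too.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    failures = {n: e for n, e in check_imports(MODULES).items() if e}
    for name, err in failures.items():
        print(f"FAILED {name}: {err}")
    # A non-zero exit code lets a build or CI step fail on a broken import.
    raise SystemExit(1 if failures else 0)
```

One way to use it, assuming the script is copied into the image under a name of your choosing, is to run it with the image's `python3` right after the build, so a missing runtime library shows up immediately rather than when the app starts.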

Using Alpine Linux was a good idea, but Alpine uses musl libc instead of glibc, so it is not compatible with most pip wheels, which are built against glibc. The result is that you would have to compile numpy/scipy from source.
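
To see which C library a given base image's Python is linked against, the standard library offers `platform.libc_ver()`. This is a rough sketch only: the helper name `describe_libc` is made up for illustration, and on musl-based images the call typically reports an empty or non-glibc name rather than identifying musl explicitly.

```python
import platform


def describe_libc():
    """Return (name, version) of the C library as detected by the standard library."""
    name, version = platform.libc_ver()
    return name, version


if __name__ == "__main__":
    name, version = describe_libc()
    if name == "glibc":
        print(f"glibc {version} detected: glibc-built wheels should be usable")
    else:
        # Typical for Alpine-based images, where Python is linked against musl.
        print("glibc not detected: pip may have to build packages from source")
```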

jkr
  • Are you sure? I tried to use the image just now, but `pip install mysqlclient` leads to `OSError: mysql_config not found`, and numpy leads to `RuntimeError: Broken toolchain: cannot link a simple C program` – AivanF. Oct 22 '20 at 17:15
  • Seems like mysqlclient is the issue. I don't see that issue with numpy... I don't think you're copying the dockerfile in my answer. – jkr Oct 22 '20 at 17:20
  • Right, I used the first version of your code with `python:3-slim`, and the packages installed successfully on the 3.8 version. Except for `mysqlclient` :( – AivanF. Oct 22 '20 at 17:27
  • Thanks! Your final Dockerfile works well and resulted in a 480 MB image, more than 3 times smaller than my original version – AivanF. Oct 22 '20 at 18:02
  • Hi! Are you sure about removing the system's mysqlclient after installing the Python mysqlclient? It seems not to work correctly: `NameError: name '_mysql' is not defined`. See more info [in this Q](https://stackoverflow.com/q/64623156/5308802). – AivanF. Oct 31 '20 at 19:34