I've created a microservice (https://github.com/staticdev/enelvo-microservice) that needs to clone a Git repository to create a Docker image. With a single-stage Dockerfile, the final image is 759MB:

FROM python:3.7.6-slim-stretch

# set the working directory to /app
WORKDIR /app

# copy the current directory contents into the container at /app
COPY . /app

RUN apt-get update && apt-get install -y git \
 && pip install -r requirements.txt \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
 && cd enelvo-src \
 && python setup.py install \
 && cd .. \
 && mv enelvo-src/enelvo enelvo \
 && rm -fr enelvo-src

EXPOSE 50051

# run app.py when the container launches
CMD ["python", "app.py"]

I tried a multi-stage build (https://blog.bitsrc.io/a-guide-to-docker-multi-stage-builds-206e8f31aeb8) to reduce the image size by leaving git and the apt-get lists (from `apt-get update`) out of the final image:

FROM python:3.7.6-slim-stretch as cloner

RUN apt-get update && apt-get install -y git \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src

FROM python:3.7.6-slim-stretch

COPY --from=cloner /enelvo-src /app/enelvo-src

# set the working directory to /app
WORKDIR /app

# copy the current directory contents into the container at /app
COPY . /app

RUN pip install -r requirements.txt \
 && cd enelvo-src \
 && python setup.py install \
 && cd .. \
 && mv enelvo-src/enelvo enelvo \
 && rm -fr enelvo-src

EXPOSE 50051

# run app.py when the container launches
CMD ["python", "app.py"]

The problem is that, after doing that, the final size got even bigger (815MB). Any idea of what could be wrong in this case?

torek
staticdev
  • Can you add `&& rm -rf /var/lib/apt/lists/*` to the end of your `apt-get` statement to clean out the apt cache? That should decrease the size of the image. Also, you should logically group actions into layers; running `apt-get` commands and installing Python libs in one command doesn't make sense. – gold_cy Feb 17 '20 at 21:46
  • 1
    You should try buildkit, will reduce your built a lot: `export DOCKER_BUILDKIT=1` – J-Jacques M Feb 17 '20 at 23:12
  • Thanks for the suggestions @gold_cy and @jean-jacques-moiroux! Could you explain how the second Dockerfile could generate a bigger image? I still can't imagine why. – staticdev Feb 17 '20 at 23:32
  • 1
    This has nothing to do with Git, so I've removed that tag. The docker image size is based on how much data Docker has to save. Note that for restarting from each step, Docker saves all modified file system files after running each command line. – torek Feb 17 '20 at 23:54
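
As a rough sketch of what gold_cy's suggestion might look like for the single-stage Dockerfile from the question (only the first part of the file is shown, and the layer split is one possible grouping, not the only one):

```dockerfile
FROM python:3.7.6-slim-stretch
WORKDIR /app

# System packages get their own layer. The apt lists are deleted in the
# same RUN, so they are gone before Docker saves the layer.
RUN apt-get update && apt-get install -y git \
 && rm -rf /var/lib/apt/lists/*

# Python dependencies go in a separate layer (requirements.txt is the
# file already used in the question).
COPY requirements.txt /app/
RUN pip install -r requirements.txt
```

Note that git itself still ends up in this image. Removing it in a later `RUN` would not shrink the image, for the same reason torek gives above.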

1 Answer

In your first example you're running

RUN git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
    ... \
 && rm -fr enelvo-src

and so the enelvo-src tree never exists outside this particular RUN instruction; it's deleted before Docker can build a layer out of it.

In the second example you're running

COPY --from=cloner /enelvo-src /app/enelvo-src
RUN rm -fr enelvo-src

Docker internally creates an image layer after the first step that contains the content of that source tree. The subsequent RUN rm doesn't actually make the image smaller, it just records that the content that was there from the earlier layer technically isn't part of the filesystem any more.
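
To see this effect in isolation, here is a minimal sketch that is not taken from the project above. The stage names, the file `/big.bin`, and its 50 MB size are arbitrary assumptions chosen for illustration. Both stages end with the same filesystem contents, but they differ in how many layers held the file:

```dockerfile
# Build with: docker build --target two-layers -t demo:two-layers .
FROM python:3.7.6-slim-stretch AS two-layers
# This layer permanently contains the 50 MB file...
RUN dd if=/dev/zero of=/big.bin bs=1M count=50
# ...and this layer only records that the file was deleted.
RUN rm /big.bin

# Build with: docker build --target one-layer -t demo:one-layer .
FROM python:3.7.6-slim-stretch AS one-layer
# The file is created and removed inside one RUN, so it is already gone
# when Docker saves the layer.
RUN dd if=/dev/zero of=/big.bin bs=1M count=50 \
 && rm /big.bin
```

Running `docker history demo:two-layers` should show a layer of roughly 50 MB for the `dd` step, while `demo:one-layer` should be about the same size as the base image.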

Generally the standard way to use a multi-stage build is to do as much building as you actually can in the earlier stage, and only COPY a final result into your runtime image. For Python packages, one approach that can work well is to build a wheel out of the package:

FROM python:3.7.6-slim-stretch as build
WORKDIR /build
RUN apt-get update && apt-get install -y git \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
 && ... \
 && python setup.py bdist_wheel  # (not "install")

FROM python:3.7.6-slim-stretch
WORKDIR /app
COPY --from=build /build/dist/wheel/enelvo*.whl .
RUN pip install enelvo*.whl
...
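
Putting the pieces together, a complete version might look like the sketch below. Several parts are assumptions rather than something confirmed above: the `cd enelvo-src` step stands in for the elided commands, the wheel location and the capitalized `Enelvo*.whl` name follow what staticdev reports in the comments (adjusted for `WORKDIR /build`), and the base image is assumed to provide the `wheel` package that `bdist_wheel` needs.

```dockerfile
FROM python:3.7.6-slim-stretch AS build
WORKDIR /build
# git and the cloned source tree only ever exist in this stage.
RUN apt-get update && apt-get install -y git \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
 && cd enelvo-src \
 && python setup.py bdist_wheel

FROM python:3.7.6-slim-stretch
WORKDIR /app
# Copy only the built wheel out of the build stage (path is an assumption:
# setup.py is expected to write it to dist/ inside the source tree).
COPY --from=build /build/enelvo-src/dist/Enelvo*.whl /app/
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt Enelvo*.whl

# Application code, as in the question's Dockerfile
COPY . /app
EXPOSE 50051
CMD ["python", "app.py"]
```

The wheel file itself stays in the layer created by `COPY`, so deleting it afterwards would not save space. It is small compared to git and the full source tree, which never reach the final image.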
David Maze
  • Thanks a lot for your answer. It explains a lot. Unfortunately, `COPY --from=build /build/dist/wheel/enelvo*.whl .` did not work. From the logs I see it is using `build/bdist.linux-x86_64/wheel`, but that didn't work either. I get the error: `COPY failed: no source files were specified`. – staticdev Feb 18 '20 at 11:56
  • UPDATE: Actually, there is no /build folder after the RUN command. With BuildKit I was able to see the message `failed to solve with frontend dockerfile.v0: failed to build LLB: lstat /var/lib/docker/overlay2/wb6jb8vrrhowlaarlj0p2aawx/merged/build/dist/wheel: no such file or directory`. – staticdev Feb 18 '20 at 12:36
  • It worked with a couple of changes: `WORKDIR /app`, `COPY --from=build /enelvo-src/dist/Enelvo*.whl /app`, and `Enelvo` (with a capital E). – staticdev Feb 18 '20 at 16:59