I've recently been refactoring a Dockerfile and decided to try ADD instead of RUN curl to make the file cleaner. To my surprise, this resulted in quite a size difference:

$ docker images | grep test
test    curl    3aa809928665   7 minutes ago    746MB
test    add     da152355bb4d   3 minutes ago    941MB

Even more surprisingly, I tried a few Dockerfiles that do nothing except ADD or curl files, and their image sizes are identical. I also tried with and without BuildKit; the result is the same (although images built without BuildKit are slightly smaller).

Here's the actual Dockerfile:

FROM ubuntu:22.04
 
ENV AWSCLI_VERSION "2.7.31"
ENV HELM_VERSION "3.9.4"
ENV OC_VERSION "4.11.5"
ENV VAULT_VERSION "1.11.3"
ENV YQ_VERSION "4.27.5"
ENV YQ_BINARY "yq_linux_amd64"
 
ENV DEBIAN_FRONTEND "noninteractive"
 
ADD "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip" /extras/awscli.zip
ADD "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip.sig" /extras/awscli.sig
ADD "https://get.helm.sh/helm-v${HELM_VERSION}-linux-amd64.tar.gz" /extras/helm.tgz
ADD "https://github.com/mikefarah/yq/releases/download/v${YQ_VERSION}/${YQ_BINARY}.tar.gz" /extras/yq.tgz
ADD "https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${OC_VERSION}/openshift-client-linux.tar.gz" /extras/oc.tgz
ADD "https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip" /extras/vault.zip
 
COPY aws-cli.pub /extras/aws-cli.pub
 
RUN cd /extras && \
    apt update && \
    apt install -y --no-install-recommends \
        ca-certificates \
        curl \
        gawk \
        gettext \
        git \
        gnupg2 \
        jq \
        openssh-client \
        unzip && \
    gpg --import /extras/aws-cli.pub && \
    # curl -L "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip" -o /extras/awscli.zip && \
    # curl -L "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip.sig" -o /extras/awscli.sig && \
    gpg --verify awscli.sig awscli.zip && \
    unzip -qq awscli.zip && \
    /extras/aws/install --update && \
    rm -rf /extras/aws* && \
    # curl -L "https://get.helm.sh/helm-v${HELM_VERSION}-linux-amd64.tar.gz" -o /extras/helm.tgz && \
    # curl -L "https://github.com/mikefarah/yq/releases/download/v${YQ_VERSION}/${YQ_BINARY}.tar.gz" -o /extras/yq.tgz && \
    # curl -L "https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${OC_VERSION}/openshift-client-linux.tar.gz" -o /extras/oc.tgz && \
    # curl -L "https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip" -o /extras/vault.zip && \
    find . -type f -name '*.tgz' -exec tar -xzf {} \; && \
    find . -type f -name '*.zip' -exec unzip -qq {} \; && \
    find . -type f -perm /101 -exec mv {} /usr/local/bin/ \; && \
    mv /usr/local/bin/${YQ_BINARY} /usr/local/bin/yq && \
    find /extras/ -mindepth 1 -delete && \
    apt clean && rm -rf /var/lib/apt/lists/*
 
ENTRYPOINT []

I don't understand why this happens with this particular Dockerfile, because essentially I'm doing exactly the same things.

Any ideas?

KamilCuk
John Doe
  • Unpredictability of operations involving outside input is part of why I stick to [Nix](https://nixos.org/): In the Nix world, all inputs coming from outside are content-addressed: if a build step can't provide an exact hash of its intended output it doesn't get internet access (and if it emits anything that doesn't match the given hash, its output is discarded); so you can't get variance between actual and expected behavior. (One can use Nix to build Docker images much more deterministically than Docker itself can). – Charles Duffy Sep 18 '22 at 17:09
  • ...in the Nix world, you can use `nix-diff` to ask what changed between two builds and get a far more comprehensive rundown than would otherwise be available. With Docker, you've now got a project -- and it's one that you really need to do yourself; _you_ can export tarballs of your two different builds and compare them; we can't do that for you. – Charles Duffy Sep 18 '22 at 17:12
  • Seems that you just ADD files which you'd remove in the same command (RUN) otherwise. They still exist in previous layers. – STerliakov Sep 18 '22 at 17:48
  • See [docs](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy) for reference. – STerliakov Sep 18 '22 at 17:49
  • @SUTerliakov it didn't occur to me for some reason that `ADD`s will live in image's layers... mind posting your comment as an answer? – John Doe Sep 18 '22 at 17:58

1 Answer

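You see this because ADDed files stay in the layer that ADD created, even if a later instruction removes them. Each instruction that changes the filesystem produces its own layer, and the image size is the sum of all layers; deleting a file in a later layer only hides it. Consider the following Dockerfiles: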

# a
FROM alpine:latest
RUN apk add --no-cache curl

ADD https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz Python.tar.xz
RUN rm Python.tar.xz

# b
FROM alpine:latest
RUN apk add --no-cache curl

RUN curl -o Python.tar.xz https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz 
RUN rm Python.tar.xz

# c
FROM alpine:latest
RUN apk add --no-cache curl

RUN curl -o Python.tar.xz https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz && \
    rm Python.tar.xz

Building each of them in the same context, I got the following results:

REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
<none>       <none>    cc79832a5ffa   9 seconds ago    27.3MB
<none>       <none>    87ea16448764   13 seconds ago   7.68MB
<none>       <none>    7f794f03b960   18 seconds ago   27.3MB
alpine       latest    9c6f07244728   5 weeks ago      5.54MB

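(Guess which file yields the different result: it is c, the only one that downloads and deletes the archive within a single RUN, so the archive never lands in any layer.)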

If you finish a layer while it still contains files you don't need in the final image, that space is wasted. So your single RUN command is the most efficient. To improve readability, you could try a multi-stage build, so that all curl/ADD and unzip/tar -x commands are isolated in a build stage and only the required binaries are copied from the build stage to the deploy stage. However, I'm not sure you'll gain much here.
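
The multi-stage idea could look roughly like the sketch below. It is reduced to a single tool (Helm, with the version and URL taken from the question) to stay short, and it is not a drop-in replacement for the full Dockerfile. The stage name `build` is an arbitrary choice, and the path `linux-amd64/helm` assumes the usual layout of the Helm release archive.

```dockerfile
# Build stage: every download and unpack step lives here.
# Layers of this stage are not part of the final image.
FROM ubuntu:22.04 AS build

ENV HELM_VERSION="3.9.4"

# The archive stays in this stage's layer, which is fine because
# the stage itself is thrown away.
ADD "https://get.helm.sh/helm-v${HELM_VERSION}-linux-amd64.tar.gz" /extras/helm.tgz

# Assumption: the archive unpacks into linux-amd64/ with the helm binary inside.
RUN tar -xzf /extras/helm.tgz -C /extras

# Final (deploy) stage: starts from a clean base and receives only the binary.
FROM ubuntu:22.04

COPY --from=build /extras/linux-amd64/helm /usr/local/bin/helm

ENTRYPOINT []
```

With this layout ADD no longer costs anything in the final image, because only the layer created by COPY --from is added on top of the base. The packages installed with apt in the original Dockerfile would still need their own RUN in the final stage.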

STerliakov