5

In Dockerfiles I see most people using this syntax:

RUN apt-get -y update \
    && apt-get install -y libicu-dev

over this one

RUN apt-get -y update
RUN apt-get install -y libicu-dev

As I understand it, the first form produces only one cached layer, while the second caches each command as its own layer (am I wrong?) and also stops as soon as a command fails.

Besides, I don't find the first one more readable.

So why would we use the first syntax?

Pierre de LESPINAY
  • This is a must read https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#run – yamenk Nov 01 '17 at 12:13

6 Answers

3

It is an optimisation of the Docker image layers. I also recommend reading Best practices for writing Dockerfiles.

There is also an interesting presentation from DockerCon EU 2017.

1

The fewer the layers, the better the image.

Hence, combining commands using && will create a single layer.

Having two RUN instructions will create two layers.

Abhay Dandekar
  • Why are fewer layers better? If I modify something in a big layer, I have to rebuild the whole layer. Disk space should not really be affected here, since a layer is only a diff anyway. – Pierre de LESPINAY Nov 01 '17 at 09:56
  • Images are like virtual file system layers. Basically, a layer, or image layer, is a change on an image, or an intermediate image. Every command you specify (FROM, RUN, COPY, etc.) in your Dockerfile causes the previous image to change, thus creating a new layer. You can think of it as staging changes when you're using git: you add a file's change, then another one, then another one. Hence, the fewer the better. – Abhay Dandekar Nov 01 '17 at 10:23
  • A layer is a kind of "patch" which contains only the differences from the previous one. So there won't be a big gap in terms of disk space used unless we run commands that entirely revert the previous layers. – Pierre de LESPINAY Nov 01 '17 at 10:32
  • Yes, agreed. Hence all logically related commands should form a single layer. So, in this particular case, update and install should form a single layer. – Abhay Dandekar Nov 01 '17 at 10:56
  • Yes, that's the issue that I understand now. – Pierre de LESPINAY Nov 01 '17 at 10:59
1

According to the images and layers documentation

Each layer is only a set of differences from the layer before it

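So, for example, two layers that create different files would not use more disk space than a single layer creating both. Docker 17.05 and later also allow multi-stage builds, which help keep the final image small. However, two layers can still use more space than one if the second modifies or deletes files from the first, because the original copies stay in the earlier layer.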
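To illustrate the multi-stage builds mentioned above, here is a rough sketch. The stage name, the hello.c source file and the paths are hypothetical and not taken from the question. Only what the final stage copies ends up in the resulting image, so the layers of the build stage don't add to its size.

```dockerfile
# Build stage: compiler and headers live only here.
# "build", hello.c and /src are illustrative assumptions.
FROM ubuntu AS build
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*
COPY hello.c /src/hello.c
RUN gcc -static -o /src/hello /src/hello.c

# Final stage: starts from a clean base and copies only the binary,
# so none of the build stage's layers are part of this image.
FROM ubuntu
COPY --from=build /src/hello /usr/local/bin/hello
CMD ["hello"]
```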

Following Khapov Igor's comment I found out the real answer to the original question in the best practice doc:

Using apt-get update alone in a RUN statement causes caching issues and subsequent apt-get install instructions fail.

It's actually more about layers that depend on earlier commands whose results change over time, like apt-get update.

That's why they say:

Always combine RUN apt-get update with apt-get install in the same RUN statement
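A minimal sketch of the caching problem that guideline describes, assuming an ubuntu base image and a hypothetical second package (curl) added to the install line in a later edit:

```dockerfile
FROM ubuntu

# Problematic split form (left commented out for comparison):
# RUN apt-get update
# RUN apt-get install -y libicu-dev curl
# On a rebuild, the unchanged "RUN apt-get update" line is served from the
# build cache, so the edited install line runs against an old package index
# and may fail or install outdated versions.

# Combined form: editing the package list changes the whole instruction,
# so apt-get update is executed again together with the install.
RUN apt-get update \
    && apt-get install -y libicu-dev curl
```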

Pierre de LESPINAY
0

Each command in a Dockerfile creates another image layer.

Combining commands is a way to end up with fewer layers overall.

See https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/#images-and-layers

Peter Walser
0

This line:

RUN apt-get -y update \
&& apt-get install -y libicu-dev

will create one single docker layer and these lines:

RUN apt-get -y update
RUN apt-get install -y libicu-dev

will create two different layers.

This is the main reason why, when you need to install something in your Docker image (e.g. via APT), you tend to keep everything in a single RUN instruction (i.e. one layer).

db80
0

As the other answers already said, every command generates a layer, and it's usually desirable to have as few layers as possible per image.

Each layer is only a set of differences from the layer before it. The layers are stacked on top of each other. When you create a new container, you add a new writable layer on top of the underlying layers.

This means that unless you "squash" your image (that is, use the --squash option during the build), you end up with an image that consumes space for nothing.

Example

# Dockerfile
FROM ubuntu

RUN apt-get update
RUN apt-get install -y --no-install-recommends dnsutils
RUN echo $( dig somewhere.nowhere )
RUN apt-get remove -y --purge dnsutils
RUN rm -rf /var/lib/apt/lists/*

COPY magicalScript.sh /
CMD /magicalScript.sh

In this case you'll have layers containing only overhead:

  • 1 with the cache coming from apt-get update
  • 1 with dnsutils installed
  • 1 containing the removal of dnsutils
  • 1 containing the removal of the cache

The problem is that all those layers remain there and consume space for no reason at all.
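As a sketch of the alternative, the same steps can be chained in one RUN instruction so that the temporary files never persist in any layer. This reuses the names from the example above (dnsutils, magicalScript.sh); the -y flag and the exec-form CMD are my own additions.

```dockerfile
# Dockerfile
FROM ubuntu

# Install, use, remove and clean up within a single layer:
# files deleted before the instruction ends are never stored in the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends dnsutils \
    && echo "$(dig somewhere.nowhere)" \
    && apt-get remove -y --purge dnsutils \
    && rm -rf /var/lib/apt/lists/*

COPY magicalScript.sh /
CMD ["/magicalScript.sh"]
```

The trade-off is that any change to this instruction re-runs the whole chain, since the cache works per instruction.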

Why is squashing not always a good solution? Because the layers also act as a cache, which is extremely useful when you need to perform a lot of builds and want them to be as fast as possible.

Usually it's good practice to group together the operations related to installing new packages on the OS:

# Dockerfile

FROM ubuntu

RUN useradd docker \
    && mkdir /home/docker \
    && chown docker:docker /home/docker \
    && addgroup docker staff

RUN apt-get update \
    && apt-get install -y --no-install-recommends ed less locales vim-tiny wget ca-certificates fonts-texgyre \
    && rm -rf /var/lib/apt/lists/*

RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
    && locale-gen en_US.utf8 \
    && /usr/sbin/update-locale LANG=en_US.UTF-8

CMD ["mySpecialCommand"]
Stefano
  • Yes, the layers remain there, and I'm very grateful for that since I heavily use the cache system. The thing is that multiple layers don't necessarily take more disk space than one. – Pierre de LESPINAY Nov 01 '17 at 10:51
  • The overhead is not very significant until you decide to remove stuff. What you need to keep in mind is that for each layer generated, your build has to create an intermediate container that gets deleted at the end of the process. Docker uses a union file system. I suggest checking this question: https://stackoverflow.com/questions/32775594/why-does-docker-need-a-union-file-system#32776204 – Stefano Nov 01 '17 at 11:36