
In the quest for ever-smaller Docker images, it's common to remove the apt cache (on Debian/Ubuntu-based images) after installing packages, with something like

RUN rm -rf /var/lib/apt/lists/*

I've seen a few Dockerfiles where this is done after each package installation (example), i.e. with the pattern

# Install some package
RUN apt-get update \
    && apt-get install -y <some-package> \
    && rm -rf /var/lib/apt/lists/*

# Do something
...

# Install another package
RUN apt-get update \
    && apt-get install -y <another-package> \
    && rm -rf /var/lib/apt/lists/*

# Do something else
...

Are there any benefits to doing this, rather than cleaning the apt cache only once at the very end (and thus updating it only once at the beginning)? To me it seems like removing and re-updating the cache multiple times just slows down the image build.

Jonathan Hall
jmd_dk

1 Answer


The main reason people do this is to minimise the amount of data stored in each particular Docker layer. When pulling a Docker image, you have to pull the entire content of every layer.

For example, imagine the following two layers in the image:

RUN apt-get update
RUN rm -rf /var/lib/apt/lists/*

The first RUN command produces a layer containing the lists, which will ALWAYS be pulled by anyone using your image, even though the next command removes those files (so they are no longer visible inside the container). Ultimately those extra files are just a waste of space and download time.
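To see the waste concretely, here is a minimal sketch of the same two-layer pattern with an actual install in between. It assumes a `debian:bookworm-slim` base image and uses `curl` purely as a placeholder package.

```dockerfile
FROM debian:bookworm-slim

# Layer 1 (anti-pattern): the package lists downloaded by "apt-get update"
# are committed into this layer together with the installed package.
RUN apt-get update && apt-get install -y --no-install-recommends curl

# Layer 2: deleting the lists here only records "whiteout" entries that hide
# the files. The bytes are still stored in layer 1 and are still pushed and
# pulled with the image.
RUN rm -rf /var/lib/apt/lists/*
```

After building it (e.g. `docker build -t apt-demo .`, where `apt-demo` is an arbitrary tag), `docker history apt-demo` prints the size of each layer. The lists should appear as part of the first RUN layer's size, while the `rm` layer should be close to zero bytes.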

On the other hand,

RUN apt-get update && rm -rf /var/lib/apt/lists/*

When this is done within a single layer, the lists are deleted before the layer is committed, so they are never pushed or pulled as part of the image.

So why have multiple layers that use apt-get install? This is likely so that those layers can be reused by other images. Docker shares identical layers between images, which saves space on the server and speeds up builds and pulls.
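A minimal sketch of that layout follows. It assumes a `debian:bookworm-slim` base, and the package names are placeholders. The idea is that any other Dockerfile starting with the same `FROM` line and the identical first `RUN` instruction can reuse that layer from the build cache (at least when built on the same machine or against a shared cache) instead of rebuilding and storing it again.

```dockerfile
FROM debian:bookworm-slim

# Common tooling in its own layer. Each RUN still updates, installs and
# deletes the lists in one step, so no lists are committed to either layer.
# Another image whose Dockerfile begins with these exact two instructions
# can reuse this layer from the build cache.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# Image-specific packages go in a later layer. Changing this list does not
# invalidate the shared layer above.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
```

The cost is the one the question points out: `apt-get update` runs once per layer. In exchange, the common layer is built and stored only once across images.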

Ben XO
  • Does that mean that having a separate, final step with `RUN rm -rf /var/lib/apt/lists/*` doesn't actually shrink the final image, as the APT cache still exists in the previous layers? – jmd_dk May 24 '20 at 20:43
  • A standalone `RUN rm ...` step does not actually make the image any smaller. – David Maze May 25 '20 at 00:15
  • Yes, that's correct. The only benefit of a final `RUN rm -rf …` step is that those files are not available inside the container, if that makes the environment "cleaner", but it does NOT shrink the final image. To shrink the image you need to remove the files in the same layer in which they were created (i.e. in the same RUN). – Ben XO May 25 '20 at 13:34
  • https://docs.docker.com/develop/develop-images/dockerfile_best-practices/ mentions "Official Debian and Ubuntu images automatically run `apt-get clean`, so explicit invocation is not required." Is that orthogonal to `rm -rf /var/lib/apt/lists/*`? – Anon Oct 23 '20 at 15:58
  • No; `apt-get clean` removes the package files that are downloaded to install the packages. `rm -rf /var/lib/apt/lists/*` removes the lists that are used to figure out which packages are available to install. To be honest, removing those lists hardly saves any space, but it does seem like the right thing to do if you want the smallest image, especially if you are going to publish it for other people to use. – Ben XO Oct 25 '20 at 21:27
  • Another reason for multiple `apt-get install` layers is debugging the build process. If you're trying to install 100 packages and keep getting one wrong, you'll have to retry all 100 packages each time. Splitting the install into two or more `RUN`s lets the first batch be cached as a layer while you debug the second batch. Once it all works, you can merge them back together if you want. – Daniel Griscom Jul 23 '21 at 12:39
  • In newer versions of Docker, since BuildKit was introduced, you can use `RUN --mount` options to cache the apt directories on the host across RUN steps and builds, without building them into the image. – HTE Jun 14 '23 at 06:30
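The BuildKit approach from the last comment can be sketched as follows. It is adapted from the cache-mount example in Docker's Dockerfile reference, and it assumes an `ubuntu:22.04` base with `gcc` as an example package. The cache mounts live on the build host, so the lists and downloaded `.deb` files are reused across builds but are never committed to an image layer.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04

# Official Debian/Ubuntu images ship /etc/apt/apt.conf.d/docker-clean, which
# deletes downloaded .deb files after each install. Remove it and tell apt to
# keep downloads, so that the cache mount below actually retains them.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

# /var/cache/apt (downloaded packages) and /var/lib/apt (including the lists)
# are mounted from the host-side build cache, so nothing written there ends
# up in this layer. sharing=locked makes parallel builds that use the same
# cache wait for each other, because apt needs exclusive access to its data.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends gcc
```

With this layout there is no `rm -rf /var/lib/apt/lists/*` step at all: the lists never enter the image, and repeated builds skip most of the download work.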