8

My understanding is that if the RUN command string itself does not change (i.e., the list of packages to be installed does not change), the Docker engine reuses the cached layer for that step. This is also my experience:

...
Step 2/6 : RUN apt update &&      DEBIAN_FRONTEND=noninteractive     apt install -y     curl               git-all            locales            locales-all        python3            python3-pip        python3-venv       libusb-1.0-0       gosu        &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 518e8ff74d4c
...

However, the official Dockerfile best practices document says this about apt-get:

Using RUN apt-get update && apt-get install -y ensures your Dockerfile installs the latest package versions with no further coding or manual intervention. This technique is known as “cache busting”.

This is true if I add a new package to the list, but not if I leave the list unchanged.

Is my understanding correct, or am I missing something here?

If it is correct, can I assume that apt-get install will only give me newer packages if the Ubuntu base image has also been updated (which invalidates the whole cache)?

Tibor Takács

2 Answers

3

You cut off the quote in the middle. The rest of the quote included a very important condition:

You can also achieve cache-busting by specifying a package version. This is known as version pinning, for example:

RUN apt-get update && apt-get install -y \
    package-bar \
    package-baz \
    package-foo=1.3.*

Therefore, in their example the command changes each time you change the pinned version of a package in the list. Note that besides changing the command itself, you can change the environment, which has the same effect, by using a build arg as described in this answer.
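
As a rough sketch of the build-arg approach (not necessarily what the linked answer does): the argument name CACHE_BUST, the base image tag, and the package list below are placeholders chosen for illustration. Passing a different value for the argument changes the inputs of the RUN step, so the step should be rebuilt even though the Dockerfile itself is unchanged.

```dockerfile
# Assumed base image tag, for illustration only.
FROM ubuntu:22.04

# Hypothetical build argument; the name CACHE_BUST is an assumption.
ARG CACHE_BUST=unset

# The argument is referenced in the command, so a new value gives a cache
# miss for this step and apt-get update / install run again.
RUN echo "cache bust: ${CACHE_BUST}" \
 && apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    curl \
    python3 \
 && rm -rf /var/lib/apt/lists/*

# Example invocation (a fresh value on every build forces the step to rerun):
#   docker build --build-arg CACHE_BUST="$(date +%s)" .
```

If the same value is passed again (or none is passed and the default is used), the cached layer is reused as before, so the value acts as a manual switch for refreshing the packages.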

BMitch
  • 1
    Thanks, @BMitch. I intentionally cut off the quote because it says "You can _also_ achieve...", so my understanding is that using pinned versions is an _additional_ option but it should be working also without it. However, let's say that the version is pinned: how does docker know that this is not the same command that has already a cache? Does it analyze the command? – Tibor Takács Feb 02 '22 at 12:48
  • 1
    @TiborTakács it's looking at the string being executed, and any inputs (environment variables, and previous stages). If the string changes (or the inputs don't match from before), it's a cache miss and it creates a new layer rather than reusing the old one. – BMitch Feb 02 '22 at 13:47
  • 1
    this matches with my understanding, thank you for the great summary. This means then though that without modifying the command itself, `docker build` will pick up the cached image (if it exists). So, if `package-foo` is installed with version 1.3.**0**, the next `docker build` won't pick up 1.3.**1** because the `RUN` command has not changed. Is this correct? – Tibor Takács Feb 02 '22 at 15:10
  • 3
    @TiborTakács as long as the cache exists, that is correct. If you build without the cache (either by flag or on a machine where it doesn't exist) then your build will result in a different image. – BMitch Feb 02 '22 at 16:10
0

You are right. The documentation is very poorly written. If you read further you can see what the author is trying to say:

The s3cmd argument specifies a version 1.1.*. If the image previously used an older version, specifying the new one causes a cache bust of apt-get update and ensures the installation of the new version.

It seems the author thinks "cache busting" means changing the Dockerfile in a way that invalidates the cache. But the usual definition of cache busting is a mechanism that invalidates the cache even when the file stays the same.

Akilan