
Maybe my Google-fu is not strong enough, but I can't find a definitive list of when Docker images in the cache are invalidated. Specifically, I'm interested in at least these scenarios:

  • Invalidation because of mtime changes vs checksum changes. Which applies when? Can it deal with different source paths (e.g. clones of the repository in different directories)?
  • Invalidation because of updated base images. At which point will (security) updates of e.g. Debian bubble down to me?
  • Are there any explicit APIs that a continuous integration tool can use to tell Docker which cached images can be reused and which can't (for example because of a wget foo.com/latest.gz)?
Perseids

2 Answers


As of Docker 1.8, Docker no longer uses mtime to invalidate the cache (this changed in this pull request #12031).

When building an image:

  • For local content (ADD myfiles /somewhere or COPY myfiles /somewhere), Docker uses checksum changes to invalidate the cache.
  • Remote content (ADD http://example.com/foobar /somewhere) is always downloaded, but the build cache is invalidated based on checksum changes.
  • RUN instructions (such as wget foo.com/latest.gz) never invalidate the cache unless the instruction itself is changed; that is, the cache is keyed on the text of the instruction. If you want reproducible builds, make sure these URLs point to a specific version (wget http://example.com/package-major.minor.patch.gz).
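
To illustrate the first bullet: a checksum of the content does not change with the modification time or with the directory a checkout lives in. The sketch below is only an analogy using sha256sum (GNU coreutils assumed); it is not Docker's actual checksum algorithm, which may also take some file metadata into account. The directory names clone_a and clone_b are made up.

```shell
#!/bin/sh
# Illustration only: content hashes ignore mtime and checkout location.
set -eu

workdir=$(mktemp -d)
mkdir -p "$workdir/clone_a" "$workdir/clone_b"

# Same content in two "clones" of a repository (hypothetical paths).
printf 'hello\n' > "$workdir/clone_a/myfile"
printf 'hello\n' > "$workdir/clone_b/myfile"

# Give the second copy a much older modification time.
touch -t 200001010000 "$workdir/clone_b/myfile"

# Hash only the file content.
sum_a=$(sha256sum < "$workdir/clone_a/myfile" | cut -d' ' -f1)
sum_b=$(sha256sum < "$workdir/clone_b/myfile" | cut -d' ' -f1)

if [ "$sum_a" = "$sum_b" ]; then
    result="same checksum: a content-based cache would be reused"
else
    result="different checksum: a content-based cache would be invalidated"
fi
echo "$result"
rm -rf "$workdir"
```

Changing a single byte in one of the two files makes the checksums differ, which corresponds to the case where Docker would rebuild the layer.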

Docker 1.9 introduced support for build-time arguments, which enable you to pass variables that can be used inside the Dockerfile so that you don't have to edit the Dockerfile to break the cache, or install a different version of the package.

For example

FROM ubuntu
ARG MAJOR=1
ARG MINOR=0
ARG PATCH=0
ADD http://example.com/package-$MAJOR.$MINOR.$PATCH.gz /

This adds http://example.com/package-1.0.0.gz by default. Passing a MAJOR, MINOR or PATCH build-time argument overrides the version to download and invalidates the cache:

docker build --build-arg MINOR=2 .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu
 ---> 1c9b046c2850
Step 2 : ARG MAJOR=1
 ---> Using cache
 ---> a149d88772ba
Step 3 : ARG MINOR=0
 ---> Using cache
 ---> e3dae0189ffd
Step 4 : ARG PATCH=0
 ---> Using cache
 ---> 678d7ae33054
Step 5 : ADD http://example.com/package-$MAJOR.$MINOR.$PATCH.gz /
Get http://example.com/package-1.2.0.gz: dial tcp 127.0.0.1:80: getsockopt: connection refused

For more information about the build-cache, read the build-cache section in the documentation.

At which point will (security) updates of e.g. Debian bubble down to me?

Docker will not automatically download updated images, or update your images that are based on them. However, if you docker pull yourbaseimage, and a newer image is downloaded, then the build-cache for images based on that is invalidated, so the next build will not use the cache.
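
In a CI job, that pull can be made part of the build itself. A minimal sketch, assuming a base image called debian:bookworm and an image tag myapp:latest (both are placeholders), with a hypothetical DRY_RUN switch so the commands can be inspected without a Docker daemon:

```shell
#!/bin/sh
# Sketch: refresh the base image before building so that base-image updates
# reach the build. BASE_IMAGE, IMAGE_TAG and DRY_RUN are assumptions made
# for this example, not Docker settings.
set -eu

BASE_IMAGE=${BASE_IMAGE:-debian:bookworm}
IMAGE_TAG=${IMAGE_TAG:-myapp:latest}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

refresh_and_build() {
    # If the pull fetches a newer image, layers built on the old one
    # no longer match and are rebuilt.
    run docker pull "$BASE_IMAGE"
    run docker build -t "$IMAGE_TAG" .
}

refresh_and_build
```

docker build also accepts a --pull flag, which asks Docker to attempt to pull a newer version of the base image during the build and should have a similar effect.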

For automated builds on Docker Hub, you can make sure that images are automatically rebuilt when the base image is updated; see the documentation on automated builds.

thaJeztah

TL;DR

Regarding @thaJeztah's answer: I've tried it. Passing --no-cache not only forces the current stage to rebuild, but also forces every stage it depends on to be rebuilt completely, and this is not what we want.

But there is a way to force an invalidation of specific stages only: declare a named ARG in the Dockerfile without using it anywhere, and pass a value for it via --build-arg to docker build.

A new value creates a "different layer" and therefore invalidates everything after it.

Rationale

This is an excerpt of my final dockerfile with 4 stages:

  • repo-sources-base => The base to download my libraries - don't rebuild
  • repo-sources => My libraries - rebuild
  • base => Ubuntu with my apache, php, etc. - don't rebuild
  • production => The base with my project and my libraries that goes to the production server - rebuild

see here:

#===========================================================================#
# Stage `repo-sources-base`                                                 #
# -------------------------                                                 #
# This image installs git so we don't have to install it each time we       #
# rebuild the repo-sources.                                                 #
#===========================================================================#    
FROM ubuntu:20.04 AS repo-sources-base

# Install git.
RUN \
    apt-get update && \
    apt-get install -y git && \
    :

# Scan the gitlab host key.
RUN \
    mkdir -p /root/.ssh && \
    touch /root/.ssh/known_hosts && \
    ssh-keyscan gitlab.com >> /root/.ssh/known_hosts && \
    :


#===========================================================================#
# Stage `repo-sources`                                                      #
# --------------------                                                      #
# This image contains the SSH private keys to make the clone of the         #
# source code from gitlab. This key is passed via ARG to avoid hardcoding   #
# it here inside.                                                           #
#===========================================================================#
FROM repo-sources-base AS repo-sources

# NOTE THIS ARG!!!!!!!
# Not used anywhere... just declared.
# But invalidates the docker build cache on purpose!
# See the answer text for explanation.
ARG INVALIDATE_CACHE_TIMESTAMP="0000-00-00T00:00:00.000000Z"

# Inspired here https://vsupalov.com/build-docker-image-clone-private-repo-ssh-key/    
# Add credentials.
ARG SSH_PRIVATE_KEY
RUN \
    mkdir -p /root/.ssh && \
    echo "${SSH_PRIVATE_KEY}" > /root/.ssh/id_rsa && \
    chmod 600 /root/.ssh/id_rsa && \
    :

# Clone the needed repos.
RUN mkdir -p whatever-path/repos

WORKDIR /whatever-path/repos

RUN git clone --quiet git@gitlab.com:my-nice-account/my-nice-project-1.git
RUN git clone --quiet git@gitlab.com:my-nice-account/my-nice-project-2.git maybe_deploy_dir_2
RUN git clone --quiet git@gitlab.com:my-nice-account/my-nice-project-3.git
RUN git clone --quiet git@gitlab.com:my-nice-account/my-nice-project-4.git


#===========================================================================#
# Stage `base`                                                              #
# ------------                                                              #
# This image contains the base operating system to build the production     #
# release on top of it. It is expected to mutate very slowly so we can have #
# the layers pre-cached when building.                                      #
# TODO: Separate this in 2 bases: one for production and the other for      #
#     development or testing, like in here:                                 #
#     https://www.docker.com/blog/advanced-dockerfiles-faster-builds-and-smaller-images-using-buildkit-and-multistage-builds/ #
#===========================================================================#
FROM ubuntu:20.04 AS base

# Install apache, php, yarn or whatever "base production server"
# BUT NOT your source code, just "the base"


#===========================================================================#
# Stage `production`                                                        #
# ------------------                                                        #
# This image will be the one released to pre or prod and configured at      #
# runtime via env-vars like backing-services, databases, etc.               #
#===========================================================================#
FROM base AS production

# Copy the project files
COPY . /whatever-maybe-other-path/repos/app

# Copy the dependency files
COPY --from=repo-sources /whatever-path/repos /whatever-maybe-other-path/repos

# Continue with the "fine-tuning" after copying the source code, like static building, etc.

The 4 targets here are grouped in blocks of 2:

  • repo-sources-base and repo-sources are meant for the local downloads; they hold the SSH keys, which are never deployed nor baked into an intermediate layer that goes to production.
  • base and production build the image to go to production.

The problem is what @Perseids described: a plain docker build skips the cloning of our repos because that step is cached, and passing --no-cache rebuilds too much.

With this 4-target structure we build "once" repo-sources-base and base and we only want to rebuild repo-sources and production.

The key here is the ARG named INVALIDATE_CACHE_TIMESTAMP.

Here's the behaviour:

  • If you rebuild with --target repo-sources many times, it loads from the cache, and we don't want that.
  • If you rebuild with --target repo-sources --no-cache, it also forces repo-sources-base to be rebuilt, and we don't want that.
  • If you rebuild with --target repo-sources --build-arg X=Y, it forces a rebuild only if X is declared in the Dockerfile and has a "new value", even if X is never used afterwards. This is why I "name" INVALIDATE_CACHE_TIMESTAMP in the ARG line.

So running

docker build --target repo-sources --build-arg INVALIDATE_CACHE_TIMESTAMP=first [...]

builds it.

Running the same command again

docker build --target repo-sources --build-arg INVALIDATE_CACHE_TIMESTAMP=first [...]

uses the cache.

Changing the value like this

docker build --target repo-sources --build-arg INVALIDATE_CACHE_TIMESTAMP=second [...]

forces a rebuild.

From now on, using first or second again would hit the cache, while any new value would force a rebuild.

So what I do in my build script:

NOW=$(date --utc --iso-8601='ns' | sed 's/,/./' | cut -c 1-26 | sed 's/$/Z/')
docker build --target repo-sources --build-arg INVALIDATE_CACHE_TIMESTAMP=${NOW} [...]
docker build --target production [...]

This "forces" repo-sources to be built again, so the production target takes base from the cache and repo-sources from the build I just forced.
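
The date options used above (--utc, --iso-8601) are GNU-specific. On systems without GNU date, something like the following sketch may work instead; it uses second resolution, which should be enough unless two builds start within the same second. The build context "." and the cache_bust_value helper are assumptions for illustration; the docker commands are only printed so the sketch runs without Docker.

```shell
#!/bin/sh
# Sketch: portable cache-busting value for INVALIDATE_CACHE_TIMESTAMP.
set -eu

# UTC timestamp with second resolution, shaped like 2016-01-16T13:22:40Z.
cache_bust_value() {
    date -u +%Y-%m-%dT%H:%M:%SZ
}

NOW=$(cache_bust_value)

# Printed rather than executed; drop the "echo" to run the builds for real.
# The trailing "." (build context) is an assumption for this example.
echo docker build --target repo-sources --build-arg "INVALIDATE_CACHE_TIMESTAMP=$NOW" .
echo docker build --target production .
```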

Xavi Montero