How to debug Docker cache invalidation?

Question

Docker has a cache, which is great, but all I see in the "docker build" output is either:

---> Using cache

or the output of the command (which implies it's not using the cache).

After one step in my Dockerfile (a COPY), it clearly doesn't use the cache. But I'm fairly certain nothing has changed in the folder that it's copying. (It's our application, and I run into the no-cache case even when I deploy twice in a row, for example.)

Is there any way to get Docker to tell me what it thinks changed?

I know Docker used to check timestamps for this, but that was fixed in Docker 1.8, and I'm on Docker 1.9.x here.

@JoelESalas: I don't understand your request. Something as simple as `FROM ubuntu:14.04` `MAINTAINER me` `COPY /app/ /app/` will demonstrate this. And I'm not going to post my entire source code and infrastructure. — Timmay, Jan 11 '16 at 21:36
Besides, even if looking at the Dockerfile could help, the question was how do *I* diagnose such problems. I don't want *somebody else* to look at my config and tell me the answer. I want to know what tools exist to help solve the problem. — Timmay, Jan 11 '16 at 21:38
How sure are you that nothing is changing in that directory? — Joel E Salas, Jan 12 '16 at 00:11

score 6 · Answer 1 · answered Jan 12 '16 at 18:51

Use binary search, with .dockerignore.

Add half of your files to .dockerignore and build the image. Because changing .dockerignore changes the set of copied files, run the build twice with the same .dockerignore and judge by the second run. If that run uses the cache for the COPY step, the changed files are in the set you ignored; otherwise they are in the other half. Repeat the test on whichever half contains the change until you are down to a single file or folder.

(Dear lazyweb: figure out some way to extend Docker to make this less painful!)
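
The halving procedure can be sketched as a small helper. This is only an illustration, not an existing Docker feature: `find_changed_entry` and the `cache_hit_when_ignored` callback are hypothetical names, and the sketch assumes exactly one file or folder is responsible for the cache miss. In real use the callback would write the given entries to .dockerignore, run the build, and report whether the COPY step printed "Using cache".

```python
from typing import Callable, List, Sequence


def find_changed_entry(
    entries: Sequence[str],
    cache_hit_when_ignored: Callable[[List[str]], bool],
) -> str:
    """Bisect `entries` down to the one entry that invalidates the COPY cache.

    `cache_hit_when_ignored(ignored)` is a hypothetical callback supplied by
    the caller. It should return True if the COPY step used the cache while
    the entries in `ignored` were listed in .dockerignore.
    Assumes exactly one entry changes between builds.
    """
    if not entries:
        raise ValueError("entries must not be empty")
    candidates = list(entries)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if cache_hit_when_ignored(half):
            # Ignoring this half restored the cache: the change is inside it.
            candidates = half
        else:
            # Still a cache miss: the change is among the entries not ignored.
            candidates = candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Simulated run: pretend "logs" is the folder that changes between builds.
    entries = ["src", "logs", "package.json", "README.md"]
    print(find_changed_entry(entries, lambda ignored: "logs" in ignored))
```

With n top-level entries this needs about log2(n) test builds, and the same helper can then be applied again to the contents of the folder it returns.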

Timmay


Sounds like an okay approach for debugging something locally, thanks. In my case, I'm currently trying to debug the Docker cache in a CI environment, and I'm a bit bummed that I can't seem to find any way to make the Docker build more verbose =/ – elias May 03 '19 at 09:02
In my case the cache is only invalidated sometimes, so this approach wouldn't work – Shanteva May 12 '21 at 18:51

score 0 · Answer 2 · answered Jan 08 '23 at 10:25

As I've discussed in a blog post, I found the following reasons for (unexpected) cache misses in Docker:

If the problem happens in a CI pipeline that uses multiple build agents (or "ephemeral" machines), build job #1 may have run on agent #A while build job #2 ran on agent #B, which has a different local cache than agent #A. When you look at your CI pipeline output, check which agent each job ran on. To prevent this kind of cache miss, you can use a remote cache, which stores caching information in a remote image registry. There are two approaches: inline caching (where the image builder embeds cache metadata into the image it builds) and a separate registry cache (where a separate image is pushed that contains only cache blobs). The usage details depend on your image builder: docker build supports only inline caching, docker buildx (or BuildKit used directly) supports both approaches, and Buildah and kaniko support only the registry cache.
If you use ARG in your Dockerfile, it is easy to invalidate the cache by accident. Whenever the value of an ARG differs between two docker build executions, the second execution cannot reuse the previously cached layer for a RUN or ENV command that uses the ARG's value. This in turn invalidates all subsequent layers. If you use multi-stage builds and run docker build several times (for different targets), make sure you always pass the same ARG values to every docker build call!
Sometimes the entire image is rebuilt because a new version of the base image (the one you reference in a FROM statement) has been released. This happens in particular if you use docker build --pull. Look closely at the builder's output for the first layer, which includes the SHA-256 checksum of the base image. If that checksum keeps changing, there is no real "fix": your image should be rebuilt so that it includes the most recent security fixes of the base image. However, if the base image is rebuilt very often (e.g. multiple times per day), you may want to stop using the --pull flag and instead run docker pull <base image> (or delete the base image) less frequently, e.g. once per day.
Layers for COPY or ADD statements are rebuilt "unexpectedly" whenever files change that you did not have on your radar and have not yet excluded via .dockerignore. This could be the .git folder, or files created during building or testing (e.g. unit test reports or log files). It typically happens with COPY . ., because your entire project directory is then copied from the build context into the build container, which increases the chance that you forgot to exclude some superfluous files (which do not belong in the container anyway) via .dockerignore. To make this easier to diagnose, I developed a small CLI tool called Directory Checksum. It recursively computes the checksum of the contents of a directory and prints the checksums up to a depth you can specify.
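
To illustrate the checksum idea from the last point, here is a minimal sketch. It is not the Directory Checksum tool itself and does not reproduce Docker's own cache key: `snapshot` and `diff_snapshots` are hypothetical helper names, and the sketch compares file contents only, so differences in metadata such as file permissions would not show up.

```python
import hashlib
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each file path (relative to `root`) to the SHA-256 of its contents."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Hash the link target string instead of following the link.
                data = os.readlink(path).encode()
            else:
                with open(path, "rb") as handle:
                    data = handle.read()
            rel_path = os.path.relpath(path, root)
            digests[rel_path] = hashlib.sha256(data).hexdigest()
    return digests


def diff_snapshots(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return the sorted relative paths that were added, removed, or modified."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

One possible use is to take a snapshot of the build context right before each build, store it (for example as JSON), and diff it against the snapshot from the previous build. Any path the diff reports is a candidate for .dockerignore or for a closer look.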
