86

I have the below Dockerfile.

FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel@alexander.com>

RUN apt-get -y install software-properties-common
RUN apt-get -y update

# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get -y update
RUN apt-get install -y oracle-java8-installer
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer

# Define working directory.
WORKDIR /work

# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle

# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH

# Install maven
RUN apt-get -y update
RUN apt-get -y install maven

# Install Open SSH and git
RUN apt-get -y install openssh-server
RUN apt-get -y install git

# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package

# clone and build zeppelin fork
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests

# Install Supervisord
RUN apt-get -y install supervisor
RUN mkdir -p var/log/supervisor

# Configure Supervisord
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf

# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd

EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]

While building the image, it failed at step 23, i.e.:

RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests

Now when I rebuild, it starts building from step 23 because Docker is using the cache.

But what if I want to rebuild the image from step 21, i.e.:

RUN git clone https://github.com/apache/incubator-zeppelin.git

How can I do that? Is deleting the cached image the only option, or is there an additional parameter for this?

GabLeRoux
sag
  • 3
    You can create a Dockerfile that goes up to step 21, tag this image with a name such as step21, and create another Dockerfile that starts with `FROM step21` – user2915097 Feb 02 '16 at 13:01
  • 1
    this is the same idea as https://stackoverflow.com/questions/35134713/disable-cache-for-specific-run-commands/35135412#35135412 – user2915097 Feb 02 '16 at 13:02
  • There is a huge discussion on Docker's GitHub about this specific behaviour (feature request: "Selectively disable caching for specific RUN commands in Dockerfile", https://github.com/moby/moby/issues/1996) – Andre Leon Rangel Jun 12 '20 at 02:51

8 Answers

87

You can rebuild the entire image without using the cache by running:

docker build --no-cache -t user/image-name .

To force a rerun starting at a specific line, you can pass an arg that is otherwise unused. Docker passes ARG values as environment variables to your RUN commands, so changing an ARG changes the command, which breaks the cache. You don't even need to reference the arg on the RUN line yourself.

FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel@alexander.com>

RUN apt-get -y install software-properties-common
RUN apt-get -y update

# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get -y update
RUN apt-get install -y oracle-java8-installer
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer

# Define working directory.
WORKDIR /work

# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle

# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH

# Install maven
RUN apt-get -y update
RUN apt-get -y install maven

# Install Open SSH and git
RUN apt-get -y install openssh-server
RUN apt-get -y install git

# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package

# clone and build zeppelin fork, changing INCUBATOR_VER will break the cache here
ARG INCUBATOR_VER=unknown
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests

# Install Supervisord
RUN apt-get -y install supervisor
RUN mkdir -p var/log/supervisor

# Configure Supervisord
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf

# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd

EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]

And then just run it with a unique arg:

docker build --build-arg INCUBATOR_VER=20160613.2 -t user/image-name .

To change the argument with every build, you can pass a timestamp as the arg:

docker build --build-arg INCUBATOR_VER=$(date +%Y%m%d-%H%M%S) -t user/image-name .

or:

docker build --build-arg INCUBATOR_VER=$(date +%s) -t user/image-name .
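
If you would rather re-run the clone only when the upstream repository has actually changed, one option is to derive the arg from the remote's current commit instead of the clock. The following is a minimal sketch rather than a tested recipe for this project: the function names remote_head and build_with_version are made up for illustration, and it assumes git and docker are on the PATH and that the Dockerfile declares ARG INCUBATOR_VER as above.

```shell
#!/bin/sh
# Hypothetical helper: print the commit hash that HEAD of a remote points to.
# `git ls-remote <url> HEAD` prints "<hash><TAB>HEAD"; cut keeps the hash.
remote_head() {
    git ls-remote "$1" HEAD | cut -f1
}

# Hypothetical helper: run the build with the given value as INCUBATOR_VER.
build_with_version() {
    docker build --build-arg "INCUBATOR_VER=$1" -t user/image-name .
}

# Example (needs network access and a running Docker daemon):
#   build_with_version "$(remote_head https://github.com/apache/incubator-zeppelin.git)"
```

Because the hash stays the same until someone pushes to the repository, repeated builds can keep reusing the cached clone step.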

As an aside, I'd recommend the following changes to keep your layers smaller. The more you can merge the cleanup and delete steps into the same RUN command as the download and install, the smaller your final image will be. Otherwise, your layers will include all the intermediate files created between the download and the cleanup:

FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel@alexander.com>

RUN apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get -y install software-properties-common

# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \
    add-apt-repository -y ppa:webupd8team/java && \
    apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install -y oracle-java8-installer && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /var/cache/oracle-jdk8-installer

# Define working directory.
WORKDIR /work

# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle

# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH

# Install maven, OpenSSH, git and supervisor
RUN apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get -y install \
      maven \
      openssh-server \
      git \
      supervisor && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package

# clone and build zeppelin fork
ARG INCUBATOR_VER=unknown
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests

# Configure Supervisord
RUN mkdir -p var/log/supervisor
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf

# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd

EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]
BMitch
  • I was kind of using something similar to your approach to build from specific step. Thanks for the side note. That is really helpful – sag Jun 14 '16 at 05:06
  • Hm, for me it won't rebuild if the argument changes. Has this behaviour changed meanwhile? – Konstantin Apr 22 '21 at 11:03
  • @Konstantin I'd suggest opening a new question with the exact steps you are taking, Dockerfile, and include the two different build args. – BMitch Apr 22 '21 at 12:39
  • The build-arg could be the version of a git repository. If the repository has changed, start from cloning the source again. – Leif Neland Oct 26 '21 at 01:45
  • @BMitch Could I use this to optionally rebuild from that layer, i.e. if the supplied `ARG` changes rebuild, otherwise not? – jtlz2 Jun 08 '23 at 08:01
  • Also, weirdly, I had to put ARG one line higher than required. – jtlz2 Jun 08 '23 at 08:12
  • @jtlz2 it sounds like you answered your question. If the above doesn't work in your scenario, post a new question with a [mcve]. – BMitch Jun 08 '23 at 11:37
53

One workaround:

  1. Locate the step you want to execute from.
  2. Before that step put a simple dummy operation like "RUN pwd"

Then just build your Dockerfile. It will take everything up to that step from the cache and then execute the lines after the dummy command.

user6461348
  • This is really helpful. Where is it documented (or in other words, why is it happening?) – Dror Aug 01 '17 at 05:59
  • 3
    I realized it is not what I needed. It helps only upon the very first build after the change... But it doesn't help when running further builds... – Dror Aug 01 '17 at 07:28
  • 1
    This doesn't work for me. Docker version 18.03.1-ce, build 9ee9f40 – Randy May 25 '18 at 18:16
  • 1
    @Dror it happens as Docker can see that, that line has changed and executes from thereon. That's why it only works the first time. – Nicolai Anton Lynnerup May 15 '19 at 08:39
  • 2
    this is only good for the "quick and dirty" stage of writing a Dockerfile as it is manual and only works once, eventually @toms130 answer with an ever-changing dummy arg is much better – cryanbhu Jan 13 '21 at 02:40
  • That doesn't help when Docker/Podman starts building at a much more recent step *(for whatever reason)*, and you want to force it not do it. – Hi-Angel Mar 31 '21 at 14:06
16

To complement Dmitry's answer: you can pass a unique arg such as date +%s, so the command line stays the same for every build:

docker build --build-arg DUMMY=`date +%s` -t me/myapp:1.0.0 .

Dockerfile:

...
ARG DUMMY=unknown
RUN DUMMY=${DUMMY} git clone xxx
...
jtlz2
toms130
14

Update 2023

Docker now includes --cache-from and --cache-to arguments. The post from @reddot below is now the best answer.


Previous answer

A simpler technique.

Dockerfile:
Add this line where you want the caching to start being skipped.

COPY marker /dev/null

Then build using

date > marker && docker build .
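
A small wrapper can make this less error-prone. This is only a sketch under assumptions: refresh_marker is a hypothetical name, and the file is assumed to be called marker and to live in the build context, as in the COPY line above. It creates the file if it is missing (COPY fails without it) and rewrites it only when a rebuild is explicitly requested.

```shell
#!/bin/sh
# Hypothetical helper: keep the marker stable unless a rebuild is forced.
refresh_marker() {
    if [ "$1" = "--force" ] || [ ! -f marker ]; then
        # New contents change the COPY checksum, which invalidates the cache
        # for that instruction and the ones after it.
        date > marker
    fi
}

# Example:
#   refresh_marker --force && docker build .   # rebuild from the COPY line on
#   refresh_marker && docker build .           # reuse the cache if possible
```

Leaving the marker untouched on ordinary builds keeps the cache usable, so the expensive steps only re-run when asked for.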

Bernard
  • 1
    So you're saying that whenever the contents of e.g. "marker" are changed, the `COPY marker /dev/null` is re-done? – Luke Oct 07 '18 at 21:24
  • "marker" is always changed since it contains the current timestamp, so COPY marker /dev/null is always being executed as well as all instructions that follow it in the Dockerfile. – Bernard Oct 08 '18 at 01:13
  • @Luke Yes; changing the contents of the file `marker` will force a rebuild. After that it will not rebuild again until you again change the contents of `marker` (or some other line in the Dockerfile is considered changed). ¶ Note that you do not need to (and should not) run that `date > marker` command every time; you just run it when you want to force a rebuild instead of using the cached image. – cjs Sep 17 '22 at 04:48
9

Another option is to delete the cached intermediate image you want to rebuild.

Find the hash of the intermediate image you wish to rebuild in your build output:

Step 27/42 : RUN lolarun.sh
 ---> Using cache
 ---> 328dfe03e436

Then delete that image:

$ docker image rmi 328dfe03e436

Or if it gives you an error and you're okay with forcing it:

$ docker image rmi -f 328dfe03e436

Finally, rerun your build command, and it will need to restart from that point because it's not in the cache.
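
Picking the hash out of a long build log by eye is easy to get wrong, so here is a small sketch that extracts it. It assumes the classic builder's output format shown above ("Step N/M : ..." followed by " ---> <id>" lines); BuildKit prints a different format, so this would not apply there. The function name layer_id is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read classic-builder output on stdin and print the
# image ID produced by the given step number.
layer_id() {
    awk -v step="$1" '
        # "Step 27/42 : RUN ..." -> remember the current step number (27)
        /^Step / { split($2, parts, "/"); current = parts[1] }
        # " ---> 328dfe03e436" -> an ID line; skips "Using cache"/"Running in"
        /^ ---> [0-9a-f]+$/ && current == step { id = $2 }
        END { if (id != "") print id }
    '
}

# Example:
#   docker build . 2>&1 | tee build.log
#   layer_id 27 < build.log
```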

Geoff Gustafson
  • It appears that this is not allowed by later versions of docker. "Error response from daemon: conflict: unable to delete 328dfe03e436 (cannot be forced) - image has dependent child images" – Charlie Haley Feb 26 '20 at 17:20
  • 2
    @CharlieHaley In this case, you can consider deleting the child image -- which is simply the next intermediate image built after this step. – Alex Fortin Aug 24 '20 at 00:19
4

If you place ARG INCUBATOR_VER=unknown at the top, the cache will not be used when INCUBATOR_VER is changed from the command line (I just tested the build). What worked for me:

# The rebuild starts from here
ARG INCUBATOR_VER=unknown
RUN INCUBATOR_VER=${INCUBATOR_VER} git clone https://github.com/apache/incubator-zeppelin.git
GabLeRoux
Dmitry
4

Use the --cache-from=... option and specify the hash of the last layer to reuse without rebuilding. All subsequent layers will be rebuilt.

Say I have the following cached docker build:

$ docker build -t pinger:latest .
Sending build context to Docker daemon  6.924MB
Step 1/5 : FROM ubuntu:latest
 ---> 58db3edaf2be
Step 2/5 : RUN echo "$(date)"
 ---> Using cache
 ---> b62b5deffedf
Step 3/5 : RUN apt-get update -y && apt-get install -y iputils-ping
 ---> Using cache
 ---> 02ba4da7d7a6
Step 4/5 : ENTRYPOINT ["ping"]
 ---> Using cache
 ---> dfd4c593d7be
Step 5/5 : CMD ["127.0.0.1"]
 ---> Using cache
 ---> 716cc6cbcf0e
Successfully built 716cc6cbcf0e
Successfully tagged pinger:latest

Now if I want to force the apt-get stanza to re-run:

$ docker build --cache-from=b62b5deffedf -t pinger:latest .
Sending build context to Docker daemon  6.924MB
Step 1/5 : FROM ubuntu:latest
 ---> 58db3edaf2be
Step 2/5 : RUN echo "$(date)"
 ---> Using cache
 ---> b62b5deffedf
Step 3/5 : RUN apt-get update -y && apt-get install -y iputils-ping
 ---> Running in 0d96737075a6
...
reddot
-1

As there is no official way to do this, the simplest way is to temporarily change the specific RUN command to include something harmless like echo:

RUN echo && apt-get -qq update && \
    apt-get install -y nano

After building, remove it.

chrizzler