
I want to do a docker multi-stage build but rm/ignore the .git folder, to save space on the docker image.

FROM ubuntu AS first
RUN apt-get update && apt-get install -y git ca-certificates
WORKDIR /app
RUN git clone <repo> .

FROM golang AS second
WORKDIR /app
COPY --from=first /app .

Is there some --exclude option for COPY? Here is a related issue: https://forums.docker.com/t/dockerignore-in-multi-stage-builds/57169

Another possibility is to remove the .git folder manually:

FROM ubuntu AS first
RUN apt-get update && apt-get install -y git ca-certificates
WORKDIR /app
RUN git clone <repo> .
RUN rm -rf .git

I assume the multi-stage build copies the "final layer" from the other stage?

  • You might find it easier to run `git clone` on the host, before you run `docker build`: you don't have to disable Docker layer caching to get an updated repository, you can easily build non-current commits or branches, and you don't have to try to get credentials into Docker space to clone private repositories. That then avoids this issue, since you can include `.git` in `.dockerignore`. – David Maze Jan 30 '20 at 23:25
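A minimal sketch of that host-side workflow, assuming the repository has already been cloned (with the desired commit checked out) on the host and that the Dockerfile sits at the repository root. The `golang` base image is carried over from the question. Because `.git` is listed in `.dockerignore`, it is filtered out of the build context and `COPY . .` never sees it:

```dockerfile
# .dockerignore (at the repository root, next to this Dockerfile) contains:
#   .git
FROM golang
WORKDIR /app
# The build context already excludes .git, so only the working tree is copied
COPY . .
```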

2 Answers


One of the ways to exclude files from the build is to use a .dockerignore file. However, this is probably not what you need as you're running a git clone during the image preparation, so you will actually need the .git folder.

If you'd like to use a multi-stage build, then what you need to copy from the previous stage into the next one are the artifacts, not the layers.
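For a Go project like the one in the question, a sketch of that idea might look as follows. It assumes the repository root is a buildable Go main package; the output path `/out/app` and the empty `scratch` final stage are illustrative choices, not part of the original answer. `CGO_ENABLED=0` produces a statically linked binary so it can run in an image with no libc:

```dockerfile
# Build stage: the default (Debian-based) golang image already includes git
FROM golang AS build
WORKDIR /src
RUN git clone <repo> .
# Static binary; /out/app is a hypothetical output path
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: only the compiled artifact is copied; sources and .git stay behind
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Only the layers of the last stage end up in the shipped image, so whatever the build stage contains (including `.git`) does not add to its size.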

Another idea is to run a shallow clone (git clone --depth=1); for a repository with a long history, this should significantly reduce the size of the .git folder.
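Applied to the question's first stage, a shallow clone could look like this (a sketch; `<repo>` is the question's placeholder):

```dockerfile
FROM ubuntu AS first
# The ubuntu base image does not ship git
RUN apt-get update && apt-get install -y git ca-certificates
WORKDIR /app
# --depth=1 fetches only the latest commit, so .git carries no history
RUN git clone --depth=1 <repo> .
```

Add `--branch <name>` to clone a specific branch or tag. Note that a shallow clone cannot check out arbitrary older commits later without fetching more history.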

andrzejwp
  • I think `rm -rf` at the end of the first stage will work too? I would like to know whether that's true or not. – Feb 01 '20 at 00:02
  • It depends on which stage your final image is built from. In general, if you do `RUN git clone` and then `RUN rm -rf .git`, the first RUN command creates a separate layer that still contains .git, so to save space you would put both in one command: `RUN git clone && rm -rf .git`. Which kind of defeats the purpose ;) Read more [here](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run) – andrzejwp Feb 03 '20 at 10:20
  • AFAIU, .dockerignore only filters the local build context and has no effect on COPY --from. – Yannick Sep 02 '21 at 11:34
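To make the layer point from the comments concrete, here is a sketch assuming a single-stage image (placeholders as in the question). Deleting `.git` in a later `RUN` only hides it, because the earlier layer still stores it; doing both in one `RUN` keeps it out of every layer. In a multi-stage build the concern applies only to the final stage: `COPY --from` copies the files as they exist at the end of the source stage, and that stage's layers are not shipped.

```dockerfile
FROM golang
WORKDIR /app

# Two layers: the clone layer still contains .git, so the image does not shrink
# RUN git clone <repo> .
# RUN rm -rf .git

# One layer: .git is created and deleted before the layer is committed
RUN git clone <repo> . && rm -rf .git
```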

I realized that the technique I was going for in the OP simply won't work. Most people are going to need the .git folder to check out the right commit. The whole point of cloning the whole repo was to cache it, so that on the next build we can check out the desired commit.

So instead of doing what I was trying to do in the OP, one technique I have used in the past to get good caching and produce small images was something like this:

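WORKDIR /app
# ADD only fetches http(s) URLs (not s3://), e.g. a presigned S3 URL
ADD https://url/to/just/package.json /app/package.json
# Dependencies are reinstalled only when the package.json layer changes
RUN npm install --production

ARG commit_id
# A new commit_id busts the cache from here on; assumes the AWS CLI is installed
RUN aws s3 cp 's3://url/to/whole/tarball' /tmp/source.tar && \
    tar -xf /tmp/source.tar -C /app && rm /tmp/source.tar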

This way the dependencies stay cached as long as package.json hasn't changed. When you do a build, you push a lean tarball to S3 with a TTL, and the build then pulls that tarball into the image; passing a new commit_id build arg invalidates the cache for that download step. The tarball doesn't have a .git folder and can exclude as many other files as desired that are normally tracked by version control.
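One way to produce such a tarball (a swapped-in technique; the answer does not say how its tarball was built) is `git archive`, which exports the tracked files of a single commit and never includes `.git`. The function name `make_source_tarball` and its arguments are hypothetical, and uploading the result (for example with `aws s3 cp <file> <s3-uri>`) is left out:

```shell
#!/bin/sh
set -eu

# make_source_tarball REPO_DIR COMMIT OUTPUT  (hypothetical helper)
# Writes a gzipped tarball of COMMIT's tracked files; .git is never included.
# OUTPUT should be an absolute path, because git -C changes the working directory.
make_source_tarball() {
  repo_dir=$1
  commit=$2
  output=$3
  git -C "$repo_dir" archive --format=tar.gz -o "$output" "$commit"
}
```

For example, `make_source_tarball /path/to/repo "$commit_id" /tmp/source.tar.gz`. Tracked files you don't want in the image can be marked with the `export-ignore` attribute in `.gitattributes`, and `git archive` will skip them.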