
My target container is a build-environment container, so my team can build an app in a uniform environment.
This app doesn't necessarily run as a container - it runs on a physical machine. The container is solely for building.

The app depends on third-party libraries.
Some I can install with apt-get in a Dockerfile RUN command.
And some I must build myself because they require special build steps.

I was wondering which way is better.

  1. Using a multistage build seems cool; for example, this Dockerfile:
FROM ubuntu:18.04 as third_party
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ...
ADD http://.../boost.tar.gz /
RUN tar xf boost.tar.gz && \
        ... && \
        make --prefix /boost_out ...

FROM ubuntu:18.04 as final
COPY --from=third_party /boost_out/ /usr/
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ...
CMD ["bash"]
...

Pros:

  • Automatically built when I build my final container
  • Easy to change third party version (boost in this example)

Cons:

  • The ADD command downloads a ~100MB file each time, which slows down the image build.
  • I want to use --cache-from so I can cache third_party and build from a different docker host machine. That means storing a ~1.6GB image in a docker registry, which is pretty heavy to pull/push.
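To make the second con concrete, the --cache-from workflow I have in mind would look roughly like this (the registry and image names are placeholders):

```shell
# Build and push the third_party stage once, from any docker host
docker build --target third_party -t registry.example.com/team/third_party:latest .
docker push registry.example.com/team/third_party:latest

# On another docker host, pull it and reuse its layers as a build cache
docker pull registry.example.com/team/third_party:latest
docker build --cache-from registry.example.com/team/third_party:latest -t build-env .
```

This is what forces the ~1.6GB image into the registry: the cache source has to live somewhere every docker host can reach.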

On the other hand

  1. I could just build boost once (with this third_party image) and store its artifacts in some storage, git for example. That would take ~200MB, which is better than storing a 1.6GB image.

Pros:

  • Smaller disk space

Cons:

  • Cumbersome build process:
    • Manually build and push artifacts to git when changing the boost version.
    • Somehow link the Docker build with git so it pulls the newest artifacts and COPYs them into the final image.
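For concreteness, the manual flow in option 2 would look something like this (the artifacts repo URL and paths are placeholders):

```shell
# One-time, per boost version: build the artifacts with the third_party image,
# then commit them to the artifacts repo
docker build --target third_party -t third_party .
docker run --rm -v "$PWD/artifacts:/out" third_party cp -r /boost_out /out/
cd artifacts && git add boost_out && git commit -m "boost artifacts" && git push

# At app-build time: clone the artifacts so the final Dockerfile can COPY them in
git clone --depth 1 https://git.example.com/team/build-artifacts.git artifacts
docker build -t build-env .
```

Every step here is manual or needs extra scripting, which is exactly the cumbersomeness I mean.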

Either way I need a third_party image that uniformly and automatically builds the third parties. In option 1 that image is bigger than in option 2, where it contains just the build tools and not the build artifacts.

Is this the trade-off?
1. is more automatic but consumes more disk space and push/pull time,
2. is cumbersome but consumes less disk space and push/pull time?

Are there any other advantages to either of these approaches?

hudac
1 Answer


I'd like to propose changing your first attempt to something like this:

FROM ubuntu:18.04 as third_party
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ...
RUN wget http://.../boost.tar.gz -O /boost.tar.gz && \
    tar xvf boost.tar.gz && \
        ... && \
        make --prefix /boost_out ... && \
        find -name \*.o -delete && \
        rm /boost.tar.gz  # this is important!

FROM ubuntu:18.04 as final
COPY --from=third_party /boost_out/ /usr/
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ...
CMD ["bash"]

This way, you pay for the download of boost only once (when building the image without a cache), and you do not pay for the storage/pull time of the original tarred sources. Additionally, you should remove unneeded object files (.o) in the same step that generates them; otherwise they are stored and pulled as well.
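The reason the deletion has to happen in the same RUN is that each RUN creates a layer, and files deleted in a *later* RUN still occupy space in the earlier layer. You can verify nothing large got baked in by inspecting per-layer sizes after building:

```shell
# Build only the third_party stage, then list each layer and its size;
# the tarball and .o files should not show up as a large layer
docker build --target third_party -t third_party .
docker history third_party
```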

If you are at liberty to post the whole Dockerfile, I'll gladly take a deeper look at it and give you some hints.

thriqon
  • Are you actually saying `ADD` should never be used? Always use `RUN wget ...` instead? I could delete the whole extracted dir, including `*.o`, that's a good idea! But I'm not sure about deleting `rm /boost.tar.gz` - wouldn't it be nice to keep that in case the link breaks, or if I want to fetch the `third_party` image and do some manual building? – hudac Sep 06 '19 at 06:22
  • 1
    Yes, you should never use `ADD` with a URL, even Docker says so: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/ – thriqon Sep 06 '19 at 14:57
  • My personal reason: It does not use the build cache (even with seemingly stable URLs): https://github.com/moby/moby/issues/12361 To be fair, it's not a good idea to assume the file behind a URL never changes, so the design decision is reasonable. – thriqon Sep 06 '19 at 15:05