
Back in the day, when we had fewer packaging choices, I spent a lot of time building RPM packages. RPM has the concept of a "source" package and a "binary" package. A lot of our current Docker images start out pretty similarly: they pull whatever source from GitHub and build it. Unless we keep that source around in the image that gets built, there's no guarantee the source will still be available from that GitHub repository when we later need to update the image, or for that matter inspect the code that went into it. For some of the programs we use, the source going away is a real risk.

In most cases the source doesn't take up much space, so if we really wanted to we could leave it in a /src directory. But with multi-stage builds, I wondered if it might make sense to have a "src" stage which simply fetches the source and unpacks/untars it. You could then save this target as a "-src" image in case you ever needed it again. It would have the added benefit that, when debugging the build stage, you wouldn't have to keep going back to fetch the source, since most of our Dockerfiles chain commands with '&&' in one large RUN statement.
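Roughly what I have in mind is below; the package name and download URL are just placeholders:

    # "src" stage: fetch and unpack the source, nothing else
    FROM alpine AS src
    RUN apk add --no-cache curl \
     && mkdir -p /src \
     && curl -fsSL https://example.com/foo-1.0.tar.gz | tar xz -C /src

    # build stage: compile from the source captured above
    FROM alpine AS build
    COPY --from=src /src /src
    RUN apk add --no-cache build-base \
     && cd /src/foo-1.0 \
     && ./configure && make && make install

Building twice, once per target, would give us both the runnable image and the archival "-src" image:

    docker build --target src -t foo-src .
    docker build -t foo .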

  • This question may get closed as 'purely opinionated' but let me toss my $0.02 in: I maintain both Debian packages (in the distro) and a few Docker containers that are in use, and I think basing the container on _binaries_ has many advantages in terms of reproducibility and general "composition". So if I were in your shoes I would use, say, Copr or OBS to turn the source repo into RPMs first and then use those in the container. I had good luck with both local Debian packages (in a repo on GitHub for ease of use) and Launchpad for Ubuntu. But YMMV... – Dirk Eddelbuettel Aug 10 '21 at 22:59

1 Answer


With a standard application, there's no reason to produce a "source Docker image".

Mechanically, a Docker image is intended to be a "closed" runnable artifact; whatever's in an image, you docker run it the same way, without being able to directly access its contents. An RPM package gets unpacked into the host filesystem, though, and you can directly access its individual files. It doesn't quite make sense to produce a Docker image that you can't run, but it could make sense to produce an RPM package that happens to just install files into /usr/src.
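As a sketch of that last idea, a spec file along these lines (package name, version, and paths are all hypothetical) does nothing but install the unpacked tree; I'm not claiming this is the conventional way to ship source, just that the packaging model allows it:

    Name:           foo-src
    Version:        1.0
    Release:        1
    Summary:        Source tree for foo
    License:        MIT
    Source0:        foo-1.0.tar.gz
    BuildArch:      noarch

    %description
    Installs the foo source tree under /usr/src for later reference.

    %prep
    %setup -q -n foo-1.0

    %install
    mkdir -p %{buildroot}/usr/src/foo-1.0
    cp -a . %{buildroot}/usr/src/foo-1.0

    %files
    /usr/src/foo-1.0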

The RPM system is quite old. Today I'd suggest just putting everything up on GitHub, where anyone can read it, but when Red Hat first started, GitHub didn't exist (nor did distributed version control, nor most of the modern Internet). So Red Hat had to invent some way to redistribute source, both for reproducibility and to satisfy the GPL's requirements, and they chose source RPMs. (Debian faced the same problem and chose a simpler tar-file-based format.)

You're probably keeping the rest of your application code in some sort of source-control system (in the years since Red Hat was young, Subversion was created as a "better CVS" and then faded as Git grew in popularity; you're no longer stuck with RCS). Check the Dockerfile in there too. Don't try to check out code inside the Dockerfile; instead, check out your application source tree and run docker build on what's already been committed.
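Concretely, the day-to-day loop looks like this (repository name hypothetical):

    # the Dockerfile is committed next to the application source
    git clone https://github.com/yourorg/yourapp.git
    cd yourapp
    # build from exactly what's committed; tag with the commit for traceability
    docker build -t yourapp:$(git rev-parse --short HEAD) .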

... they pull whatever source from GitHub and build it.

One feature of Git is that you can create a local copy of a repository. If you're worried about the upstream GitHub copy of some package changing, you can keep your own local copy of it. Periodically pull from GitHub into an upstream branch and then merge the upstream branch into your main deployment branch.
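A minimal sketch of that workflow, with placeholder URLs and branch names:

    # one-time setup: your own hosted copy, plus the GitHub original as a second remote
    git clone https://git.example.com/yourorg/somepackage.git
    cd somepackage
    git remote add github https://github.com/someauthor/somepackage.git

    # periodically: pull GitHub into your "upstream" branch...
    git checkout upstream
    git pull github main
    # ...then merge it into your main deployment branch and push your copy
    git checkout main
    git merge upstream
    git push origin main upstream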

In this model I'd still endorse keeping your Dockerfiles in the same repository as the application, but there are other approaches to managing this (have your CI system check out both the Docker-configuration repository and the application repository; use complex source-control features like Git subtrees).
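For the subtree variant, a sketch (prefix and URL hypothetical):

    # vendor the application repo into the Docker-configuration repo as a subtree
    git subtree add --prefix=app https://github.com/someauthor/somepackage.git main --squash
    # later, pull upstream changes into the vendored copy
    git subtree pull --prefix=app https://github.com/someauthor/somepackage.git main --squash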

David Maze
  • Appreciate your feedback. Not all of the programs we use happen to keep their source on GitHub. I know... I know... that's hard to believe. SourceForge and the now-defunct Google Code are some examples, or straight from the authors' web site... The original authors wrote their code, published their paper, and moved on, and couldn't be bothered to maintain it and move it over to GitHub. So yes, we could import it into our own GitHub repos and build from there. If you absolutely need full provenance of the code, is everyone simply forking everything they use? – Joe Slagel Aug 11 '21 at 02:56