
I'm a little bit confused about these two options. They appear to be related. However, they're not really compatible.

For example, it seems that using Dockerfiles means that you shouldn't really be committing to images, because you should really just track the Dockerfile in git and make changes to that. Then there's no ambiguity about what is authoritative.

However, image commits seem really nice. It's so great that you could just modify a container directly and tag the changes to create another image. I understand that you can even get something like a filesystem diff from an image commit history. Awesome. But then you shouldn't use Dockerfiles. Otherwise, if you made an image commit, you'd have to go back to your Dockerfile and make some change which represents what you did.
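For concreteness, the workflow I'm describing looks roughly like this (container and image names here are made up):

```shell
# Start from a base image and make changes interactively
docker run -it --name mycontainer ubuntu:14.04 /bin/bash
# ... inside the container: install packages, edit configs, then exit ...

# Inspect what changed in the container's filesystem relative to its image
docker diff mycontainer

# Snapshot the modified container as a new image
docker commit mycontainer myrepo/myimage:v2
```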

So I'm torn. I love the idea of image commits: that you don't have to represent your image state in a Dockerfile -- you can just track it directly. But I'm uneasy about giving up the idea of some kind of manifest file which gives you a quick overview of what's in an image. It's also disconcerting to see two features in the same software package which seem to be incompatible.

Does anyone have any thoughts on this? Is it considered bad practice to use image commits? Or should I just let go of my attachment to manifest files from my Puppet days? What should I do?

Update:

To all those who think this is an opinion-based question, I'm not so sure. There are some subjective qualities to it, but I think it's mostly an objective question. Furthermore, I believe a good discussion on this topic will be informative.

In the end, I hope that anyone reading this post will come away with a better understanding of how Dockerfiles and image commits relate to each other.

Update - 2017/7/18:

I just recently discovered a legitimate use for image commits. We just set up a CI pipeline at our company and, during one stage of the pipeline, our app tests are run inside of a container. We need to retrieve the coverage results from the exited container after the test runner process has generated them (in the container's file system) and the container has stopped running. We use image commits to do this by committing the stopped container to create a new image and then running commands which display and dump the coverage file to stdout. So it's handy to have this. Apart from this very specific case, we use Dockerfiles to define our environments.
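Roughly, the commands in that pipeline stage look like this (the image name and coverage path are illustrative, not our real ones):

```shell
# Run the test suite; the container exits when the test runner finishes
docker run --name test-run myapp/tests

# Commit the stopped container so its filesystem survives as an image
docker commit test-run myapp/test-results

# Start a throwaway container from that image to dump the coverage file
docker run --rm myapp/test-results cat /app/coverage/coverage.txt
```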

David Sanders
  • There's a lot of philosophy here, and peoples' philosophies differ, making this maybe a bad question for SO. My own philosophy, though -- you always want to know exactly how to reproduce a given image, and be sure you _can_ do that. Using golden images, it's very very easy not to know how someone got from state A to state B, whereas with automation that drives the build process, there's no way to forget. So, personally, yes, I call Dockerfiles the Right Thing, and image commits (like *any* kind of golden image that relies on manual intervention to build) evil. But that's me, and YMMV. – Charles Duffy Sep 30 '14 at 00:42
  • @CharlesDuffy Where should this question go if it doesn't belong on SO? – David Sanders Sep 30 '14 at 02:09
  • @CharlesDuffy By the way, I don't agree that this is an opinion-based question. There should be a right answer here, if not a right answer which depends on a little more context. FYI, I did vote to move my question to Serverfault. – David Sanders Sep 30 '14 at 02:20
  • ...*shrug*. I can cite concrete reasons for my preference, but does that make them more right than the concrete reasons someone else cites for theirs? I've worked with people who are fans of the golden-image approach, and it's not always possible to win someone over by citing facts, because parties can value different characteristics in a solution to different extents, so it's entirely possible to agree on facts but disagree on conclusions. Which is the hallmark of an opinion-based question, and what I expect to see if we actually get any golden image fans showing up here. – Charles Duffy Sep 30 '14 at 02:26
  • @CharlesDuffy Alright, I suppose that makes sense. It sounds like this is an ongoing debate in the Docker community. – David Sanders Sep 30 '14 at 02:40
  • That's exactly my question. I first thought (or hoped) that git commit would append changes to the dockerfile – Nicolas Zozol Nov 27 '15 at 19:39
  • I had this very same question and was also torn until reading this page. Since then, I've straddled both worlds by getting something working and then picking through /root/.bash_history to make a docker file. There is also a tool https://github.com/citostyle/docker-record for this purpose, which I'm not sure why isn't already actually a part of docker. So as a result, I've submitted a docker feature request here: https://forums.docker.com/t/record-container-commands-to-produce-a-dockerfile/26187 – QA Collective Dec 20 '16 at 02:15
  • I've been wondering the same thing, and my impression (which could be totally wrong) is that it's really the same case as with VMs --> you don't want to not know how to recreate the VM image. In my case I have regular .sh scripts to install, and am wondering why I can't just maintain these, run Docker, effectively call these, and create the golden version image that way. My scripts work to get it installed on a local PC, and the reason I want to use Docker is to deal with conflicts between multiple instances of programs, have a clean file system, etc. – Bradford Medeiros May 24 '17 at 23:39

2 Answers


Dockerfiles are a tool for creating images.

The result of running docker build . is an image with a commit, so it's not possible to use a Dockerfile without creating a commit. The real question is: should you update the image by hand each time anything changes, and thus doom yourself to the curse of the golden image?

The curse of the golden image is a terrible curse cast upon people who must go on living with a buggy, security-hole-ridden base image for their software because the person who created it was long ago devoured by the ancient ones (or moved on to a new job), and nobody knows where they got the version of ImageMagick that went into that image, which is the only thing that will link against the C++ module provided by that consultant the boss's son hired three years ago. And anyway it doesn't matter, because even if you figured out where ImageMagick came from, the version of libstdc++ used by the JNI calls in the support tool that intern with the long hair created only exists in an unsupported version of Ubuntu anyway.
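A minimal illustration of the point above: a Dockerfile just automates the commits you would otherwise make by hand, and records them (the names and packages here are made up):

```shell
# The reproducible route: every layer's recipe lives in a file you can re-run
cat > Dockerfile <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y imagemagick
EOF
docker build -t myrepo/myimage .

# The golden-image route: the same end state, but the steps exist
# only in someone's shell history
docker run -it --name tmp ubuntu:14.04 /bin/bash  # install by hand, exit
docker commit tmp myrepo/myimage
```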

Arthur Ulfeldt
  • This actually seems like the heart of the issue to me. If you use image commits exclusively, you're stuck with your base image. You could do a dist-upgrade on the image but that just sounds like a nightmare. It seems way more preferable to have that kind of information presented cleanly in a Dockerfile. I think Docker shouldn't have exposed image commits as part of its public API. There just doesn't seem to be any legitimate use for them other than for purposes of illustration in the introductory documentation. – David Sanders Oct 23 '14 at 19:09
  • I didn't just make that example up ... :-( – Arthur Ulfeldt Nov 22 '16 at 16:33

Knowing the advantages and drawbacks of both approaches is a good start, because a mix of the two is probably a valid way to go.

Con: avoid the golden image dead end:

Using only commits is bad if you lose track of how to rebuild your image. You don't want to end up in a state where you can't rebuild it. That final state is what I'm calling the golden image here, because the image becomes your only reference, the starting point and ending point of every stage. If you lose it, you're in a lot of trouble, since you can't rebuild it. The fatal dead end is that one day you'll need to build a new one (because all the system libraries are obsolete, for instance), and you'll have no idea what to install... ending in a big loss of time.

As a side note, layering commits upon commits would probably be much nicer if the history log were easily usable (consulting diffs, and replaying them on other images) as it is in Git: you'll notice that Git doesn't have this dilemma.
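Docker does expose some of that history, though it's far less usable than Git's log. A quick sketch of what's available (container and image names are made up):

```shell
# List the layers (commits) that make up an image, with the command
# that produced each one
docker history myrepo/myimage

# Show files added (A), changed (C), or deleted (D) in a container
# relative to the image it was started from
docker diff mycontainer
```

There is no built-in equivalent of `git cherry-pick` to replay one of those diffs onto a different image, which is exactly the gap described above.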

Pro: slick upgrades to distribute

On the other hand, layering commits has a considerable advantage in terms of distributed upgrades, and thus in bandwidth and deploy time. If you start handling Docker images the way a baker handles pancakes (which is precisely what Docker permits), or want to deploy test versions instantly, you'll be happier sending just a small update in the form of a small commit rather than a whole new image. Especially with continuous integration for your customers, where bug fixes should be deployed soon and often.

Try to get the best of two worlds:

In this type of scenario, you'll probably want to tag major versions of your images, and those should come from Dockerfiles. Then you can provide continuous-integration versions via commits based on the tagged version. This balances the advantages and drawbacks of the Dockerfile and layered-commit scenarios. Here, the key point is that you never stop keeping track of your images, by limiting the number of commits you allow on top of them.
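A sketch of that hybrid workflow (tags, names, and the nature of the hotfix are all illustrative):

```shell
# Major versions come from a Dockerfile, so they are always reproducible
docker build -t myapp:1.0 .

# CI hotfixes are small commits layered on the tagged base:
# fast to build, cheap to distribute
docker run --name hotfix myapp:1.0 /bin/bash -c "apply the fix here"
docker commit hotfix myapp:1.0-ci42

# At the next major version, fold the fix back into the Dockerfile
# and rebuild, so no commit-only change survives long-term
docker build -t myapp:1.1 .
```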

So I guess it depends on your scenario, and you probably shouldn't try to find a single rule. However, there are some real dead ends you should avoid (like ending up in the "golden image" scenario), whichever approach you choose.

vaab
  • Doesn't Docker intelligently rebuild images such that you don't need to pull down a whole new image to deploy after you've rebuilt from a modified Dockerfile? – David Sanders Sep 30 '14 at 15:37
  • @DavidSanders You won't end up downloading whole new images, you are right about that. But any early instruction whose output changes will invalidate all the following layers. Notoriously, 'ADD' (or any dependency-installing instruction) is often something you could do at the end with a single new commit, but if you rebuild from your Dockerfile, it'll generate whole new layers. Some efforts have been made to make 'ADD' smarter: https://github.com/docker/docker/issues/880; see also, on similar concerns, http://jpetazzo.github.io/2013/12/01/docker-python-pip-requirements/ – vaab Oct 01 '14 at 04:43