Properly Versioning Docker Images

Question

Following up on the four-year-old question "Docker image versioning and lifecycle management", because IMHO it did not properly address versioning Docker images:

I don't find this answer adequate, as there can be successive versions of the same tag. We need a way to lock down dependencies to a particular version of a tag.

and also,

the answer is to not use latest.

The "solutions" I found on the web are confusing too. For example:

Here it hints not to use latest, and the "solution" hints at tagging twice. I emphasize "hinted" because there is no solid recommendation (at least not to me).
And here it even shows that we need to run docker push twice for the same image.

So, how should Docker images be properly versioned (both locally and when pushing/publishing to Docker Hub)?

AMEND:

So far there are two answers. Thanks for that.

Both use git's short version ID.
And both leave the pushing/publishing part out of the answer.

I do need to push/publish my Docker image to a Docker repository, and from here it hints that if you go with specific-ID tagging and don't use latest, you will have trouble pulling the latest image. Moreover, git's short version ID might be a good solution for internal use, but it might not be the best one when publishing a Docker image for public consumption.

score 40 · Accepted Answer · answered May 20 '19 at 00:08

Docker gives no semantic meaning at all to tag values. A tag can be any string value at all, and tags can be reused. The only special tag value is that if you just say imagename in a docker pull or docker run command, it is automatically interpreted as imagename:latest.

Mechanically, you can give the same image multiple tags, but you need to docker push each of them. The expensive part of a push is the layer content, so pushing an additional tag for an existing image mostly just uploads the fact that the tag exists. Similarly, pulling an image tag that is a duplicate of an image you already have is all but free, but there's no easy way to find out all of the tags for a given image.
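As a rough sketch of the tag-then-push mechanics described above, the shell function below gives one local image several tags and pushes each one. The function name and the DOCKER override (set DOCKER=echo to print the commands instead of running them) are hypothetical conveniences, not part of Docker; it assumes the source image is named repo:tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give one local image several extra tags and push each.
# Calls ${DOCKER:-docker}, so DOCKER=echo turns it into a dry run.
tag_and_push() {
  local source="$1"; shift       # an existing local image, e.g. imagename:build
  local repo="${source%:*}"      # strip the tag; assumes source is repo:tag
  local tag
  for tag in "$@"; do
    "${DOCKER:-docker}" tag "$source" "$repo:$tag" || return 1
    # Layers the registry already has are not re-uploaded, so this is cheap.
    "${DOCKER:-docker}" push "$repo:$tag" || return 1
  done
}
```

For example, tag_and_push imagename:build g1234567 1.2.3 master latest would do the four pushes from the imagename example. Newer Docker CLIs (20.10 and later) also offer docker push --all-tags, which pushes every local tag of a repository, including stale ones you may not mean to publish.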

I would recommend:

Give every build a unique identifier, something like a source control commit ID or a timestamp.
If and when you do official releases, also tag builds of that release with the release number. (More generally, if the current source control commit is tagged, tag the Docker image with the source control tag.)
If it's useful for your development workflow, also tag builds that are the tips of branches with their branch name.
Given its prominence, it's probably useful to tag something as latest (maybe the most recent release).
Avoid using latest and other tags that you expect to change when referring to built images (in docker run commands, Dockerfile FROM lines, Kubernetes pod specs, ...).
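The recommendations above can be condensed into a small function that decides which tags one CI build should get. This is a sketch under assumptions: the function name, its argument order, and the policy of moving latest only when asked are all hypothetical, and the caller supplies the values.

```shell
# Hypothetical helper: print the tags one build should receive.
#   $1 commit ID (required)   $2 release tag, or "" if not a release
#   $3 branch name, or ""     $4 "yes" to also move "latest"
build_tags() {
  local commit="$1" release="$2" branch="$3" move_latest="$4"
  echo "g$commit"                    # unique per build; "g" prefix as in g1234567
  if [ -n "$release" ]; then echo "$release"; fi
  # Tags cannot contain "/", so a branch like feature/x becomes feature-x.
  if [ -n "$branch" ]; then printf '%s\n' "$branch" | tr '/' '-'; fi
  if [ "$move_latest" = "yes" ]; then echo "latest"; fi
}
```

In CI, the inputs could come from git rev-parse --short HEAD, git describe --tags --exact-match (which fails when the current commit has no tag) and git rev-parse --abbrev-ref HEAD.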

This combination of things could mean the same image is tagged imagename:g1234567, :1.2.3, :master, and :latest, and your CI system would need to do four docker pushes. You would probably expect the first two tags to be fairly constant, but the latter two to change routinely. You could then run something like imagename:1.2.3 with some confidence.

(The one special case that comes to mind is a software package that changes rarely and so might need to be rebuilt if there are upstream fixes or security updates. It seems typical to reuse the same tag for this: for instance, ubuntu:18.04 gets updated every week or two.)
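For that rebuild case, a sketch of refreshing an image in place: rebuild it under the same tag with docker build --pull, which checks the registry for a newer version of the base image instead of reusing a cached one, then push it again. The function name and the DOCKER override are hypothetical.

```shell
# Hypothetical helper: rebuild and re-push an image under the SAME tag.
# --pull makes docker build check the registry for a newer base (FROM) image
# instead of reusing a stale local copy.
rebuild_same_tag() {
  local image="$1" context="${2:-.}"   # e.g. myapp:1.2; build context defaults to .
  "${DOCKER:-docker}" build --pull -t "$image" "$context" || return 1
  "${DOCKER:-docker}" push "$image"
}
```

--pull only refreshes the FROM image; if the Dockerfile installs packages itself, adding --no-cache makes those steps rerun as well.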

This might not be the most comprehensive answer, but I am accepting it because it answered my _specific_ confusion/question in the OP, i.e., it addressed the questions I meant to ask. So, for people who come here in the future looking for general answers, please check out the other answers as well. — xpt, May 20 '19 at 01:53

score 25 · Answer 2 · answered May 20 '19 at 01:10

Images in Docker are referred to by a reference, the most common being an image repository and tag. That tag is a relatively free-form string that points to a specific image. Tags are best thought of as mutable pointers: a tag can be changed, multiple tags can point to the same image, and a tag can be deleted while the underlying image remains intact.

Since Docker does not enforce much structure on tags (other than verifying that they contain valid characters and do not exceed a length limit), imposing structure is an exercise left to each repository maintainer, and many different solutions have resulted.

For repository maintainers, here are a few common implementations:

Option A: Ideally, repository maintainers follow some form of semver. This version number should map to the version of the packaged software, often with an additional patch number for the image revision. Importantly, images tagged this way should include tags not just for version 1.2.3-1, but also 1.2.3, 1.2, and 1, each of which is updated to the latest release within its respective hierarchy. This allows downstream users to depend on 1.2 and automatically get 1.2.4, 1.2.5, etc., as bug fixes and security updates come out.
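A minimal sketch of expanding one release into that tag hierarchy, assuming versions of the form MAJOR.MINOR.PATCH with an optional -REVISION suffix (the function name is hypothetical). Deciding whether to push the rolling tags is left to the caller: 1.2 and 1 should only move if this release is the newest in that line, so a 1.1.9 hotfix published after 1.2.0 should not move 1.

```shell
# Hypothetical helper: expand one release into its tag hierarchy.
# Assumes MAJOR.MINOR.PATCH with an optional "-REVISION" image-revision suffix.
semver_tags() {
  local full="$1"
  local version="${full%%-*}"        # 1.2.3-1 -> 1.2.3
  local minor="${version%.*}"        # 1.2.3   -> 1.2
  local major="${version%%.*}"       # 1.2.3   -> 1
  if [ "$full" != "$version" ]; then echo "$full"; fi
  printf '%s\n' "$version" "$minor" "$major"
}
```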

Option B: Similar to the semver option above, many projects include other important metadata in their tags, e.g. which architecture or base image was used for that build. This is commonly seen with alpine vs debian/slim images, or ARM vs AMD64 compiled code. These are often combined with semver, so you may see tags like alpine-1.5 in addition to alpine-1 and alpine.
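Variant-prefixed tags like these can be generated by repeatedly dropping the last version component and finishing with the bare variant name. This is a sketch with a hypothetical function name, following the alpine-1.5 ordering used above.

```shell
# Hypothetical helper: variant-prefixed tags, e.g. alpine-1.5 -> alpine-1 -> alpine.
variant_tags() {
  local variant="$1" v="$2"          # e.g. alpine 1.5
  while :; do
    echo "$variant-$v"
    case "$v" in
      *.*) v="${v%.*}" ;;            # drop the last component: 1.5 -> 1
      *) break ;;
    esac
  done
  echo "$variant"                    # the bare variant tag
}
```

Many official images put the variant last instead (for example 3.8-alpine); swapping the order in the echo line is enough if you follow that convention.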

Option C: Some projects follow more of a rolling release that offers no backward-compatibility promises. This is often done with build numbers or a date string, and indeed Docker itself uses this, though with a process to deprecate features and avoid breaking changes. I've seen quite a few internal projects at companies use this strategy to version their images, relying on the build number from a CI server.
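For the rolling-release style, a tag can be built from a date and the CI build number. This is a sketch assuming a YYYYMMDD UTC date and a dot separator (both arbitrary choices) and a hypothetical function name; the build number would come from your CI server.

```shell
# Hypothetical helper: rolling-release tag from a UTC date and a CI build number.
#   $1 build number from the CI server
#   $2 date as YYYYMMDD (defaults to today, UTC)
rolling_tag() {
  local build="$1"
  local day="${2:-$(date -u +%Y%m%d)}"
  echo "$day.$build"
}
```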

Option D: I'm less of a fan of putting Git revision hashes in image tags, since these convey no details without referring back to the Git repository. Not every user has that access, or the skill to interpret such a reference. And looking at two different hashes, I can't tell which is newer or compatible with my application without an external check. They also assume the only important version number comes from Git, ignoring that the same Git revision may be used to create multiple images: from different parent images, for different architectures, or from multiple Dockerfiles/multistage targets within the same Git repo. Instead, I like using label schema, and eventually the image spec annotations once we get tooling around image annotations, to track details like Git revisions. This places the Git revision into metadata that you can query to verify an image, while still leaving the tag itself informative to the user.
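A sketch of that approach: store the Git revision and version as image labels at build time, using the OCI image-spec annotation keys as label names, then read them back with docker inspect. The helper names and the DOCKER override are hypothetical; the older label-schema convention used keys such as org.label-schema.vcs-ref instead.

```shell
# Hypothetical helper: build with Git/version metadata stored as labels,
# using the OCI image-spec annotation key names as label keys.
build_with_labels() {
  local image="$1" revision="$2" version="$3" created="$4" context="${5:-.}"
  "${DOCKER:-docker}" build \
    --label "org.opencontainers.image.revision=$revision" \
    --label "org.opencontainers.image.version=$version" \
    --label "org.opencontainers.image.created=$created" \
    -t "$image" "$context"
}

# Read the revision back from an image without relying on its tag.
image_revision() {
  "${DOCKER:-docker}" inspect \
    -f '{{index .Config.Labels "org.opencontainers.image.revision"}}' "$1"
}
```

For example, build_with_labels myapp:1.2.3 "$(git rev-parse HEAD)" 1.2.3 "$(date -u +%Y-%m-%dT%H:%M:%SZ)" keeps a readable 1.2.3 tag, and image_revision myapp:1.2.3 then prints the full commit hash.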

For image users, if you have a requirement to avoid unexpected changes from upstream, there are two options I know of.

The first is to run your own registry server and pull your external dependencies into it. Docker provides an image for a standalone registry that you can install, and the API is open, which has allowed many artifact repository vendors to support the Docker registry. Do take care to update this registry regularly, and include a way to go back to previous versions if an update breaks your environment.
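As a sketch of the first option, copying an upstream image into a registry you run yourself amounts to pull, retag with the registry's host name, and push. The function name, the DOCKER override and the localhost:5000 default are assumptions; registry:2 is Docker's standalone registry image.

```shell
# Hypothetical helper: copy an upstream image into your own registry.
# A throwaway local registry can be started with:
#   docker run -d -p 5000:5000 --name registry registry:2
mirror_image() {
  local src="$1" registry="${2:-localhost:5000}"   # e.g. debian:buster
  local dest="$registry/$src"                      # assumes a Docker Hub-style name
  "${DOCKER:-docker}" pull "$src" || return 1
  "${DOCKER:-docker}" tag "$src" "$dest" || return 1
  "${DOCKER:-docker}" push "$dest"
}
```

Pushing each refresh under an additional tag as well (for example with a date suffix) keeps the previous copy available if an update breaks your environment.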

The second option is to stop depending on mutable tags. Instead, you can use image pinning, which refers to the image by the registry's unique sha256 digest of the manifest, a reference that cannot be changed. You can find this value in RepoDigests when you inspect an image pulled from a registry server:

$ docker inspect -f '{{json .RepoDigests}}' debian:latest
["debian@sha256:de3eac83cd481c04c5d6c7344cd7327625a1d8b2540e82a8231b5675cef0ae5f"]

$ docker run -it --rm debian@sha256:de3eac83cd481c04c5d6c7344cd7327625a1d8b2540e82a8231b5675cef0ae5f /bin/bash
root@ac9db398dc03:/#
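To use a pinned digest in a Dockerfile rather than on the command line, the digest can be looked up with the same docker inspect query and turned into a FROM line. This is a sketch with hypothetical helper names and a DOCKER override; it takes the first RepoDigests entry, so it only works for images pulled from or pushed to a registry.

```shell
# Hypothetical helper: resolve a tag to the immutable digest reference it
# currently points to (first RepoDigests entry).
pinned_ref() {
  "${DOCKER:-docker}" inspect -f '{{index .RepoDigests 0}}' "$1"
}

# Emit a Dockerfile FROM line pinned to that digest.
pinned_from_line() {
  local ref
  ref="$(pinned_ref "$1")" || return 1
  echo "FROM $ref"
}
```

Rerunning pinned_from_line debian:latest after a fresh docker pull and diffing the result is one way to notice that an update is available.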

The biggest risk from binding to a specific image like this is missing security updates and important bug fixes. If you take this option, make sure to have a procedure to regularly update these images.

Regardless of which solution you follow for pulling images, latest is only useful for a quick developer test, not for any production use case. The behavior of latest depends entirely on the repository maintainer: some always update it to the last release, some make it the last stable release, and some forget to update it at all. If you depend on latest, you'll likely experience an outage when an upstream image changes from a version like 1.5 to 2.0 with backwards-incompatible changes. Your next deploy will inadvertently include these changes unless you explicitly depend on a tag that promises bug fixes and security patches without breaking changes.

Thanks a lot for the most comprehensive answer. I would have accepted it if it had answered my specific confusion/question in the OP. But thanks anyway, and upvoted! — xpt, May 20 '19 at 01:48

score 9 · Answer 3 · answered May 19 '19 at 22:38

For me it's all about being able to tell which version of (my) software went into the Docker image. My recommendation is to use something like Git's short commit ID. I don't use latest, as it carries no helpful context.

Build the Docker image with the Git version as the tag. The <stable-package-name> below is just the name of your application, like "HelloWorld" or anything you like:

REV_TAG=$(git log -1 --pretty=format:%h)
docker build -t <stable-package-name>:$REV_TAG .

Later I push what I tagged to the remote repository:

# nominate the tagged image for deployment
docker tag <stable-package-name>:$REV_TAG <repository-name>:$REV_TAG

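# push the docker image to the remote repository; name the tag explicitly,
# since newer Docker CLIs push only :latest when no tag is given
docker push <repository-name>:$REV_TAG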

edited May 19 '19 at 23:37 · answered by Slawomir

Thanks. So you don't need to push/publish to Docker Hub? – xpt May 19 '19 at 22:44
No you don't. How you use your Docker images depends on the environment you will deploy them to. For example, AWS offers a Docker repository (ECR) that allows you to push your Docker images to, so there is no Docker Hub dependency in this case. Using the AWS ECR, you can deploy the images to compute nodes with an orchestration tool like Kubernetes or the AWS's own ECS (Elastic Container Service). – Slawomir May 19 '19 at 22:56
Oh, sorry, I was asking because the pushing/publishing part is missing from your answer, and I ***do*** need to push/publish my Docker image to a Docker repository. OP updated. – xpt May 19 '19 at 23:25
I updated the answer; once you tagged the build with a version you can push it to a remote repository like docker hub. – Slawomir May 19 '19 at 23:39

score 9 · Answer 4 · answered Dec 02 '19 at 16:59

I tag with the Git commit hash and the build timestamp (concatenated).

This is simply because I want to recognise that sometimes things change on the build server, which means the same code may have been compiled differently, e.g. switching the build server to compile with Java 13 instead of Java 11.
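A sketch of that scheme: the tag is the commit hash followed by a build timestamp, so rebuilding the same commit on a changed build server yields a new, distinct tag. The separator, the timestamp format and the function name are assumptions, since the answer only says the two are concatenated.

```shell
# Hypothetical helper: tag = commit hash + build timestamp.
# The "-" separator and the UTC YYYYMMDDHHMMSS format are assumptions.
commit_time_tag() {
  local commit="$1"
  local stamp="${2:-$(date -u +%Y%m%d%H%M%S)}"
  echo "$commit-$stamp"
}
```

For example: docker build -t myapp:"$(commit_time_tag "$(git rev-parse --short HEAD)")" .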

score 3 · Answer 5 · answered May 19 '19 at 22:31

For a Docker-based application, I tag images with the short hash of the Git commit. That way, you can immediately identify the code that is in the container. I'm not sure how I would handle a Docker image created to be used as a base image.

answered by djheru
