
I looked for this information in the official Docker docs, but had no success.

Which pieces of information does Docker take into account when calculating the hash of each commit/layer?

It's pretty obvious that the line in the Dockerfile is part of the hash, as is the parent commit hash. But is anything else taken into account when calculating this hash?

Concrete use case: suppose two devs on different machines, at different points in time (and therefore with different Docker daemons and different caches), run $ docker build ... against the same Dockerfile. The FROM ... directive gives them the same starting point, but will each operation produce the same hash on both machines? Is it deterministic?

Victor Schröder
    Docker 1.10 introduced a new content addressable storage model: see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/ – molivier Mar 31 '16 at 17:29
    More in-depth information can be found in the design document; https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b#id-definitions-and-calculations – thaJeztah Apr 01 '16 at 01:55
    Thanks @molivier and @thaJeztah! Very good read! It seems that this question is much more profound than I was expecting! – Victor Schröder Apr 01 '16 at 02:21

1 Answer


Thanks @thaJeztah. The answer is in the design document: https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b#id-definitions-and-calculations

  1. layer.DiffID: ID for an individual layer

    Calculation: DiffID = SHA256hex(uncompressed layer tar data)

  2. layer.ChainID: ID for a layer and its parents. This ID uniquely identifies a filesystem composed of a set of layers.

    Calculation:

    • For bottom layer: ChainID(layer0) = DiffID(layer0)
    • For other layers: ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))
  3. image.ID: ID for an image. Since the image configuration references the layers the image uses, this ID incorporates the filesystem data and the rest of the image configuration.

    Calculation: SHA256hex(imageConfigJSON)
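To make the three calculations concrete, here is a minimal Python sketch of them. It is not Docker's actual code. The function names are made up for illustration, the byte strings in the usage part are placeholders and not real layer data, and I am assuming that each ID is written as a string with a "sha256:" prefix before it is concatenated for the ChainID.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    # Assumption: IDs are rendered as "sha256:<hex>" strings.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def diff_id(uncompressed_layer_tar: bytes) -> str:
    # DiffID = SHA256hex(uncompressed layer tar data)
    return sha256_digest(uncompressed_layer_tar)


def chain_ids(diff_ids: list[str]) -> list[str]:
    # ChainID(layer0) = DiffID(layer0)
    # ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))
    chain: list[str] = []
    for index, current in enumerate(diff_ids):
        if index == 0:
            chain.append(current)
        else:
            joined = chain[-1] + " " + current
            chain.append(sha256_digest(joined.encode("ascii")))
    return chain


def image_id(image_config_json: bytes) -> str:
    # image.ID = SHA256hex(imageConfigJSON), hashed byte for byte.
    return sha256_digest(image_config_json)


# Placeholder inputs (hypothetical, NOT real tar archives or a real config).
layers = [b"bytes of layer 0 tar", b"bytes of layer 1 tar"]
ids = [diff_id(layer) for layer in layers]
for layer_diff_id, layer_chain_id in zip(ids, chain_ids(ids)):
    print(layer_diff_id, "->", layer_chain_id)
print(image_id(b'{"placeholder": "config"}'))
```

Nothing in these formulas depends on the machine or the daemon, so identical input bytes always give identical IDs. Whether two separate builds produce identical bytes is another matter: the DiffID covers everything stored in the layer tar, and a tar archive records file metadata such as modification times.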

robrich
    Hi @robrich, I verified your 3rd point. It was simple because imageConfigJSON is a file. On an Ubuntu VM, I ran sha256sum -b longFileName.json and the result matched the ID. The ID is the longFileName itself. For point 1, I got the hash of the layer.tar file inside the layer folder (sha256sum layer.tar). I am still not clear about point 2, though. What is the chain ID? – VivekDev Apr 19 '20 at 15:01