Is there a way to create a clean Debian-based image (I want it for a container, but it could also be for a virtual machine) with a custom selection of packages that would be bit-for-bit identical as long as the installed packages and debconf parameters are the same?

There would be basically two uses for this:

  • An image that specifies the exact versions of the packages it contains could be independently verified (using snapshots, or by rebuilding packages to the extent that Debian has made those builds reproducible).
  • Easy checking of whether any of the packages has a new version, as the image could simply be rebuilt nightly and its checksum would only change once there were actual changes in the packages.

It could be built from a Debian-published base image (e.g. the Docker image debian:stable) plus apt, or with debootstrap (IIRC the base Debian image is built with debootstrap as well), or with any other suitable builder.

Jan Hudec
  • This is a common cause of Docker images not being reproducible: `apt` can pull in bugfixes which could potentially change your image's behaviour, and make it difficult to compare a deployed version with one built on your disk. See for example [Q: truly reproducible docker containers](https://stackoverflow.com/questions/59141851/truly-reproducible-docker-containers) and [Q: how to do deterministic build for docker](https://stackoverflow.com/questions/53288388/how-to-do-deterministic-builds-of-docker-images) – Att Righ Jun 23 '21 at 13:59
  • Duplicate: https://stackoverflow.com/questions/61903495/is-it-possible-to-setup-a-debian-system-in-a-deterministic-manner – Att Righ Jun 23 '21 at 14:22
  • @AttRigh, not a duplicate; the accepted answer in that question is totally unacceptable here, because it only addresses package versions, but none of the other concerns of binary equality like deterministic writing of configs or stable timestamps. – Jan Hudec Jun 23 '21 at 21:23
  • Cool cool. I'm more interested in identical behaviour (which is of course implied by binary equivalence) than in binary equivalence. Probably put the bounty on the wrong ticket. – Att Righ Jun 24 '21 at 13:37
  • @AttRigh Debian is trying to make the packages themselves reproducible: https://wiki.debian.org/ReproducibleBuilds. But then the package manager would also have to behave reproducibly, and I am not sure anybody has ever tried. – Jan Hudec Jun 24 '21 at 13:45
  • I can see the value of that. I've seen a few talks saying how this allows you to verify that binary packages match the source code, and so have faith that they haven't been tampered with. For a number of use cases this isn't important, because you can install a specific released version of a pip package that exists in a cache and get guaranteed identical behaviour. – Att Righ Jun 24 '21 at 14:13
  • @AttRigh … and a Docker image is also a package in some sense, so it also makes sense to want to make it reproducible, to check there are no unexpected influences from the build environment. And a Docker image is created by installing packages (it does not matter whether Debian, Red Hat or Alpine ones), and so is a VM image (e.g. in OVF). – Jan Hudec Jun 24 '21 at 19:31

2 Answers

If you want to guarantee that, build your image once, save it with `docker save` or `docker push` it somewhere, and from then on use that image as the base image.

docker save: https://docs.docker.com/engine/reference/commandline/save/
docker push: https://docs.docker.com/engine/reference/commandline/push/

EDIT: This wouldn't work, see comments below.

Roman Pavelka
  • I don't have any problem keeping the image around, but that completely misses the point. I want to make sure there are no unexpected environmental factors, which means I *want* to build it twice, and then verify they are the same. – Jan Hudec Jun 29 '21 at 13:39
  • Well, then it should be enough to compare the image ID, because that should be the sha256 of the image JSON configuration object, which also contains the sha256 digests of the contents of all layers: https://windsock.io/explaining-docker-image-ids/ Should I update my answer with that? – Roman Pavelka Jun 29 '21 at 15:21
  • It would be enough to compare the image ID *if* the build was deterministic. *But it is not*. All the files will have different timestamps and those are part of the hash. Actually, it might be the largest obstacle in making the build deterministic (the Debian reproducible builds do fake timestamps). Then there is the serialization order of various things, where hash algorithms are, often intentionally, salted. – Jan Hudec Jun 29 '21 at 15:36
  • Crap, you are right, I did not think of that. Would this below help then to fake the time? https://manpages.ubuntu.com/manpages/trusty/man1/datefudge.1.html – Roman Pavelka Jun 29 '21 at 15:44
  • Not really. 1. you can't preload anything to processes in a container (the build is a container too), 2. docker starts the container from a daemon, so you can't preload anything into that either and 3. it does not fudge file timestamps anyway. – Jan Hudec Jun 29 '21 at 15:52
  • I see. My last idea is to manually compute the sha256 of the content of every file in the tree, disregarding the timestamps, and then compare the trees by their content hashes. – Roman Pavelka Jun 29 '21 at 17:11
  • The OCI image checksums are simply checksums of the manifests, and those in turn include checksums of the .tar files with the content, but it might be possible to write a tool that repackages an image in a canonical way, discarding the timestamps in the process. For example, npm packages are now written with constant timestamps in the .tar files, and so are Debian packages under the reproducible builds tooling. – Jan Hudec Jun 30 '21 at 06:52
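
The repackaging idea from the last comments could be sketched roughly as follows. This is only an illustration, not an existing tool: `canonical_digest` is a hypothetical helper name, and it assumes you have already exported the image or layer as a plain tar archive (for example with `docker save`). It rewrites the archive with sorted member order, a fixed mtime and zeroed ownership, then hashes the result, so two archives that differ only in timestamps or ordering give the same digest.

```python
import hashlib
import io
import tarfile


def canonical_digest(tar_bytes: bytes, epoch: int = 0) -> str:
    """Repack a tar archive with normalized metadata and return its SHA-256.

    Hypothetical helper for comparison only: sorting members may place a
    hard link before its target, so the repacked archive is not meant to
    be extracted.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as src, \
            tarfile.open(fileobj=out, mode="w", format=tarfile.GNU_FORMAT) as dst:
        # Sort by name so the original archive order does not matter.
        for member in sorted(src.getmembers(), key=lambda m: m.name):
            data = src.extractfile(member) if member.isfile() else None
            member.mtime = epoch            # discard the build timestamp
            member.uid = member.gid = 0     # discard numeric ownership
            member.uname = member.gname = ""
            dst.addfile(member, data)
    return hashlib.sha256(out.getvalue()).hexdigest()
```

File contents, names, modes and link targets still feed into the digest, so a real change in a package would still show up; only the metadata that varies between otherwise identical builds is flattened.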

You can use mmdebstrap, which is supposed to create reproducible installations by default if the SOURCE_DATE_EPOCH environment variable is set. If it does not, I think that would be considered a bug.

Or you can use debuerreotype.

There's also a wiki page tracking this for other tools in Debian at https://wiki.debian.org/ReproducibleInstalls.
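
To check the reproducibility claim yourself, one option is to build the same tarball twice with the same SOURCE_DATE_EPOCH and compare checksums. The sketch below is an untested assumption about how one might script that: the helper names, package list, file names and mirror are placeholders of mine, and only the `--variant` and `--include` options and the `SUITE TARGET MIRROR` argument order are taken from mmdebstrap's documented usage.

```python
import hashlib
import os
import subprocess


def mmdebstrap_command(suite, packages, target, mirror):
    """Return the argv for one mmdebstrap run (nothing is executed here)."""
    return [
        "mmdebstrap",
        "--variant=minbase",
        # Sorted so the same package set always gives the same command line.
        "--include=" + ",".join(sorted(packages)),
        suite, target, mirror,
    ]


def build_and_hash(argv, target, source_date_epoch):
    """Run the build with a fixed SOURCE_DATE_EPOCH and hash the output file."""
    env = dict(os.environ, SOURCE_DATE_EPOCH=str(source_date_epoch))
    subprocess.run(argv, check=True, env=env)
    with open(target, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def is_reproducible(suite, packages, mirror, source_date_epoch):
    """Build two tarballs (placeholder file names) and compare their digests."""
    digests = []
    for target in ("build1.tar", "build2.tar"):
        argv = mmdebstrap_command(suite, packages, target, mirror)
        digests.append(build_and_hash(argv, target, source_date_epoch))
    return digests[0] == digests[1]
```

For the comparison to mean anything, both runs have to see the same package versions, so the mirror would probably need to be a fixed snapshot.debian.org archive URL rather than a live mirror.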

Guillem Jover