Questions tagged [binary-reproducibility]

Binary reproducibility is the goal or quality of consistently reproducing identical build output given identical source input, or often more specifically the goal of byte-for-byte identical executable files (or identical checksums, hashes or other digests of those files) when built repeatedly, perhaps on different machines or at different times. The process by which this is achieved is often called a deterministic build or reproducible build.
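As a minimal sketch of the digest comparison described above, the following Python snippet hashes two build outputs and reports whether they match. The function names `file_digest` and `builds_match` are hypothetical, and SHA-256 is just one reasonable choice of digest; the description does not prescribe a particular algorithm.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large executables need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def builds_match(a: Path, b: Path) -> bool:
    """True when two build artifacts have identical digests (hypothetical helper)."""
    return file_digest(a) == file_digest(b)
```

Two builds of the same sources, made on different machines or at different times, would count as reproducible in this sense only if `builds_match` returns `True` for every output file.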

For software subject to a requirement to be able to determine whether an arbitrary executable resulted from building a specific set of sources, binary reproducibility provides a straightforward, easily explained answer to that requirement. This requirement is often applied to software that is security-sensitive (e.g. Bitcoin Core, Tor), or used in a heavily regulated market (e.g. avionics, health care equipment, licensed gambling).

A number of tools or elements involved in a build may hamper this goal for a variety of reasons. Inclusion of environmental information such as timestamps, compiler versions, user and computer names, and absolute paths is common, as is inclusion of a random UUID on every run to simplify matching an executable with related files such as detached debug symbols or platform-specific native images. Many compilers produce nondeterministic compiler-generated symbol names, either for constructs explicitly defined but not named in source code, or as an artifact of an implementation detail not surfaced in source code at all. Finally, highly optimizing compilers may employ nondeterministic optimization techniques such as Monte Carlo simulation-guided optimization or profile-guided optimization.
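To illustrate how environmental information such as timestamps and user names can be kept out of an artifact, here is a hedged Python sketch that packs files into a tar archive with fixed metadata and a sorted member order. The helper name `deterministic_tar` is hypothetical. Reading the timestamp from the `SOURCE_DATE_EPOCH` environment variable follows a convention documented by reproducible-builds.org and is an assumption here, not something the description above requires.

```python
import io
import os
import tarfile
from typing import Dict, Optional


def deterministic_tar(files: Dict[str, bytes], mtime: Optional[int] = None) -> bytes:
    """Pack files into an uncompressed tar whose bytes depend only on the inputs."""
    if mtime is None:
        # Assumption: fall back to SOURCE_DATE_EPOCH, or 0 when it is unset.
        mtime = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort names so the member order does not depend on dict or filesystem order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime          # fixed timestamp instead of "now"
            info.mode = 0o644           # fixed permissions
            info.uid = info.gid = 0     # no builder-specific ids
            info.uname = info.gname = ""  # no user or group names
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

The archive is left uncompressed on purpose: compression wrappers can embed their own timestamps, which would need to be pinned separately.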

66 questions
5 votes, 2 answers

Can I specify the module version id (MVID) when building a .NET assembly?

We have some shared assemblies that get built automatically every night. When there are no changes made to the sources, I would expect the assembly binaries to be exactly the same as a previous version. However, there seem to be minor differences…
oɔɯǝɹ
5 votes, 1 answer

Reproducible saveRDS with environments

I am building an R package and using data-raw and data to store a library of pre-defined RxODE models. This works very well. However, the resulting .rda files change at every generation. Some models contain an R environment, and the serialization…
parasietje
5 votes, 2 answers

Reproducible builds in python

I need to ship a compiled version of a python script and be able to prove (using a hash) that the compiled file is indeed the same as the original one. What we use so far is a simple: find . -name "*.py" -print0 | xargs -0 python2 -m py_compile The…
Martin Trigaux
5 votes, 2 answers

How do you verify that 2 copies of a VB 6 executable came from the same code base?

I have a program under version control that has gone through multiple releases. A situation came up today where someone had somehow managed to point to an old copy of the program and thus was encountering bugs that have since been fixed. I'd like…
Tim Visher
4 votes, 2 answers

Reproducible Debian install

Is there a way to create a clean Debian-based image (I want it for a container, but it could also be for a virtual) with custom selection of packages that would be binary exactly the same as long as the installed packages and debconf parameters are…
Jan Hudec
4 votes, 0 answers

How to make GCC create checksum-same builds?

In the company where I work there is a complicated industrial ARM-architecture router project, consisting primarily of many C and C++ apps with a Linux kernel. Currently we are preparing for certification, and the certification organization wants us to send them all…
Dmitriy Vinokurov
4 votes, 0 answers

Truly reproducible Docker containers?

There is a security trend called reproducible builds, which aims for having a way to create bit-exact copies of output binaries so that the user can verify whether the version found on the internet is trustworthy. Is there a similar movement and…
d33tah
3 votes, 1 answer

Reproducibility: Failing to rerun code over time

I fear that a running code could fail in the future. I've seen this with tidyverse functions that were running well but after a time returned an error because they had been Defunct. To give some reproducible example try this piece of code from How…
LulY
3 votes, 2 answers

Reproducible build and binary signing

I'm developing an open source project and I have been working on making the builds reproducible so that my users can compare the checksums of the binaries that I distribute with their own builds (if they were to build the project with/from the…
alexandernst
3 votes, 1 answer

How to make the compilation of python source code reproducible

After installing jsonpickle on my machine (pip install jsonpickle==1.4.1 --no-compile), I have noticed that the compilation of the pandas.py file in the ext subfolder is not always reproducible. In the ext subfolder I executed the following bash…
Hadronymous
3 votes, 0 answers

In Tensorflow, is there a way to set a seed at the session level?

I'm trying to get repeatable results when running a session, but want to change the seed freely between sessions. Something like this: a = tf.random_uniform([1]) #Set seed here to e.g. 123 with tf.Session() as sess: print(sess.run(a)) #Output:…
acester123
3 votes, 0 answers

Yagarto (GCC, Win32) compiles same code differently on different PCs

I am using a Yagarto toolchain on Windows to compile a codebase of about 100K lines of code. We have two development PCs. However, they each build slightly different binaries despite having the same toolchain and building the same source code. I…
M.M
2 votes, 0 answers

keras.Model.save changes binary every time model saved

Why does keras.Model.save() produce different binaries with every run, when, AFAIU, I have taken all the necessary steps for complete reproducibility of the results and even binaries? You can verify this by simply executing the following script…
2 votes, 1 answer

Using -ffile-prefix-map breaks debugging

At $DAYJOB, I am trying to implement reproducible builds to make debugging released software where we no longer have the full debug versions on our build servers easier, using the tips from reproducible-builds.org. Using the…
nafmo
2 votes, 1 answer

Reproducible builds with jlink

A reduced JDK (created with jlink) is part of our application. The set of modules required for the JDK image rarely changes. Unfortunately, generating a new JDK image (with the exact same jlink parameters) results in different output, so clients have to…
palacsint