
I need to ship a compiled version of a Python script and be able to prove (using a hash) that the compiled file does indeed correspond to the original one.

What we use so far is a simple:

find . -name "*.py" -print0 | xargs -0 python2 -m py_compile

The issue is that this is not reproducible: two executions will not give us the same .pyc for the same Python file (I am not sure what the fluctuating factors are). This forces us to always ship the same compiled version, instead of being able to just give the build script to anyone to produce a new compiled version.

Is there a way to achieve that?

Thanks

Martin Trigaux
  • Byte level fluctuations of compilation can be expected. What is wrong with shipping compiled versions? – Andrey Sep 13 '16 at 14:02
  • 1) os specifics 2) exact python version 3) time related fluctuations – Andrey Sep 13 '16 at 14:03
  • We need to store a zip file containing the compiled version on a server/drive, maintain a copy of each version and that kind of headache I want to avoid when using git for the code hosting and a build script. Would be way easier if I could just checkout at specific revision, remake the build and check if it is the same. – Martin Trigaux Sep 13 '16 at 14:06
  • You can store versions on your server, or S3 or any other storage. I think this battle against a compiler is not worth it... – Andrey Sep 13 '16 at 14:17
  • If you're looking to create reproducible zips, even with hashed-source `.pyc`s you've still got to fight zips' inclusion of file permissions, file order, and file modification timestamps. See [github.com/bboe/deterministic_zip](https://github.com/bboe/deterministic_zip) and [_Barriers to deterministic, reproducible zip files_ by Mark Rushakoff (2014)](https://content.pivotal.io/blog/barriers-to-deterministic-reproducible-zip-files) for more details. – Steven Kalt Nov 21 '19 at 20:39

2 Answers


Compiled Python 2 files start with a four-byte magic number followed by a four-byte timestamp (the modification time of the source file, as seen when it was compiled). This probably accounts for the discrepancies you are seeing.

If you omit bytes 5-8 (the timestamp) from the checksumming process, you should see constant checksums for a given version of Python.

The format of the .pyc file is given in this blog post by Ned Batchelder.
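
A minimal sketch of that checksum, assuming the Python 2 layout described above (magic number in bytes 1-4, timestamp in bytes 5-8). The function names and the choice of SHA-256 are illustrative, not part of any standard tool. Later Python versions use a longer header (16 bytes since 3.7), so the skipped range would need adjusting there.

```python
import hashlib


def checksum_ignoring_timestamp(pyc_bytes, skip_start=4, skip_end=8):
    """Hash the contents of a .pyc, leaving out the timestamp field.

    skip_start/skip_end are 0-based offsets; the defaults drop bytes 5-8
    (1-based), which hold the timestamp in the assumed Python 2 layout.
    """
    digest = hashlib.sha256()
    digest.update(pyc_bytes[:skip_start])  # magic number
    digest.update(pyc_bytes[skip_end:])    # marshalled code object
    return digest.hexdigest()


def checksum_pyc_file(path):
    """Convenience wrapper (hypothetical helper): read a .pyc and hash it."""
    with open(path, "rb") as f:
        return checksum_ignoring_timestamp(f.read())
```

Two builds of the same source with the same interpreter should then give the same value from `checksum_pyc_file`, even if their timestamps differ.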

holdenweb

2019 / Python 3.7+ update: since PEP 552, hash-based .pycs are available:

python -m compileall -f --invalidation-mode=checked-hash [file|dir]
# or
export SOURCE_DATE_EPOCH=1    # makes py_compile default to py_compile.PycInvalidationMode.CHECKED_HASH
python -m py_compile [file]

will create .pycs which will not change until their source code changes.
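
The same mode can also be requested from Python code. The following is a rough sketch for Python 3.7+, not a full build script: it compiles one source file twice with `py_compile.compile(..., invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH)`, changes the source's modification time in between, and returns a SHA-256 digest of each result. The helper names and the temporary file names are made up for illustration.

```python
import hashlib
import os
import py_compile
import tempfile


def compile_hashed(source_path, pyc_path):
    """Compile source_path to a hash-based .pyc and return its SHA-256."""
    py_compile.compile(
        source_path,
        cfile=pyc_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def build_twice_and_compare(source_text):
    """Compile the same source twice, touching its mtime in between."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        with open(src, "w") as f:
            f.write(source_text)
        first = compile_hashed(src, os.path.join(tmp, "first.pyc"))
        os.utime(src, (1, 1))  # a different mtime should not matter here
        second = compile_hashed(src, os.path.join(tmp, "second.pyc"))
        return first, second
```

If the two digests match, the header no longer depends on the source timestamp. Note that the .pyc still records the source path passed to the compiler, so builds made from different directories may differ unless the `dfile` argument is used to fix that name.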

Steven Kalt