I am using git archive to generate a file which is later hashed and checked for integrity against a pre-stored hash. However, I have not seen it stated anywhere that git archive is intended to be bit-for-bit reproducible, so I fear that future changes in git itself, tar, or some other internals may suddenly lead to a different archive being produced from the same repository.

Am I right that this is not an intended use of git archive? Or can I rely on it confidently like this?

Álex

1 Answer

This is not an intended feature of git archive. The tar archives it generates have changed in the past to fix bugs. Some people try to rely on this nevertheless, including kernel.org, but their systems have broken when Git was updated. I strongly advise against doing this.

Anything using compression (including gzipped tar archives and zip files) is inherently unreproducible because the compressed output can change between versions of zlib or gzip, as appropriate.
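To make the compression point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: two gzip compression levels stand in for two different compressor versions, the `payload` bytes stand in for a tar stream, and the helper names `archive_digest` and `payload_digest` are hypothetical. It shows that a hash of the compressed file can differ even when the decompressed content is identical.

```python
import gzip
import hashlib


def archive_digest(blob: bytes) -> str:
    """SHA-256 of the raw archive bytes, exactly as they would sit on disk."""
    return hashlib.sha256(blob).hexdigest()


def payload_digest(blob: bytes) -> str:
    """SHA-256 of the decompressed stream, ignoring the gzip layer."""
    return hashlib.sha256(gzip.decompress(blob)).hexdigest()


# Stand-in for a tar stream (assumption: real input would come from git archive).
payload = b"identical file contents\n" * 1000

# Two compression levels play the role of two compressor versions (assumption).
# mtime=0 keeps the gzip timestamp out of the comparison.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

# Compare the hash of the compressed bytes with the hash of the content.
print("archive hashes match:", archive_digest(fast) == archive_digest(best))
print("payload hashes match:", payload_digest(fast) == payload_digest(best))
```

The payload hashes always match here, while the archive hashes should not. Hashing the decompressed stream only removes the compression layer from the picture; as noted above, the tar stream that git archive produces has itself changed between Git versions, so this does not make the archive a stable integrity target.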

bk2204
  • Do you have a reference for the problems experienced by kernel.org? I couldn't readily find one and would like to read about it. – Álex Sep 28 '19 at 14:24
  • I'm unable to find the thread about it on the list right now, but it's related to the patch "archive: honor tar.umask even for pax headers". We did revert that patch, but other patches have come in before and since that have been potentially problematic. – bk2204 Sep 28 '19 at 16:30
  • This I found: https://code.forksand.com/linux/git_git/commit/15c6ef7b06e57746c7d8ec81c0a125df54434b60 Not much but interesting nonetheless, thanks for the precise pointer – Álex Sep 29 '19 at 11:18