21

I can run:

echo "asdf" > testfile
tar czf a.tar.gz testfile
tar czf b.tar.gz testfile
md5sum *.tar.gz

and it turns out that a.tar.gz and b.tar.gz have different md5 hashes. The files really do differ, as diff -u a.tar.gz b.tar.gz confirms.

What additional flags do I need to pass to tar so that it produces identical output every time for the same input?

Barmar
  • The gzip header contains the modification time of the original file. When the input comes from a stream rather than a file, it uses the current time. – Barmar Apr 07 '16 at 00:03
  • @Barmar: Thanks. Do you know how to make this not happen? –  Apr 07 '16 at 00:05
  • I can't think of a good way. I was going to post an answer where you make an uncompressed tarball, copy it with the `-p` option to preserve `mtime`, and then compress each of them. But the problem there is that `gzip` also puts the input filename into the file, and the filenames will be different. – Barmar Apr 07 '16 at 00:28
  • Why don't you compare the checksums of the uncompressed file? – Barmar Apr 07 '16 at 00:28
  • `zcat a.tar.gz | md5sum` and `zcat b.tar.gz | md5sum` – Barmar Apr 07 '16 at 00:29
  • Can you change the accepted answer to Barmar's answer, please? His is correct and should be in first position. – Harry Apr 11 '16 at 05:40

2 Answers

30

tar czf outfile infiles is equivalent to

tar cf - infiles | gzip > outfile

The files differ because gzip puts its input filename and modification time into the compressed file. When the input is a pipe, it uses an empty string as the filename and the current time as the modification time.
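To see the timestamp field directly, here is a minimal sketch that dumps it from the gzip header. It assumes the layout from the gzip file format (RFC 1952), where a 4-byte MTIME field sits at byte offset 4; that detail and the file names default.gz and noname.gz are my additions for illustration. What the default run prints may vary with the gzip version, but the --no-name run should show all zeros.

```shell
#!/bin/sh
# Sketch: inspect the 4-byte MTIME field at offset 4 of the gzip header.
# The file names are arbitrary placeholders, written to a scratch directory.
workdir=$(mktemp -d)

printf 'asdf\n' | gzip > "$workdir/default.gz"
printf 'asdf\n' | gzip --no-name > "$workdir/noname.gz"

# Print bytes 4..7 of each file as hex digits, with all whitespace stripped.
mtime_default=$(od -A n -t x1 -j 4 -N 4 "$workdir/default.gz" | tr -d '[:space:]')
mtime_noname=$(od -A n -t x1 -j 4 -N 4 "$workdir/noname.gz" | tr -d '[:space:]')

echo "default:   $mtime_default"
echo "--no-name: $mtime_noname"
```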

But it also has a --no-name option, which tells it not to put the name and timestamp into the file. So if you write the expanded command explicitly, instead of using the -z option to tar, you can make use of this option.

tar cf - testfile | gzip --no-name > a.tar.gz
tar cf - testfile | gzip --no-name > b.tar.gz

I tested this on OS X 10.6.8 and it works.
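If you want to check this on your own system, something like the following should work. It is only a sketch: the scratch directory and the one-second pause are my additions, the pause being there so that an embedded timestamp would show up as a difference. It also assumes testfile itself is not modified between the two runs, since tar records the file's own mtime inside the archive.

```shell
#!/bin/sh
# Sketch: build the same archive twice with gzip --no-name and compare the results.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
echo "asdf" > testfile

tar cf - testfile | gzip --no-name > a.tar.gz
sleep 1   # make sure the wall-clock second changes between the two runs
tar cf - testfile | gzip --no-name > b.tar.gz

# cmp -s is silent and exits 0 only when the files are byte-for-byte equal.
if cmp -s a.tar.gz b.tar.gz; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

Running md5sum *.tar.gz in that directory afterwards should then show the same hash for both files.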

Barmar
  • Thanks! This is perfect. –  Apr 07 '16 at 00:53
  • Should I delete my post so yours goes to the top? – Harry Apr 08 '16 at 16:47
  • That's fine with me. – Barmar Apr 08 '16 at 16:56
  • I just tried to delete it, but because it has the accepted answer flag I wasn't able to. I'm going to ask the OP to accept yours instead. – Harry Apr 11 '16 at 05:39
  • @Harry There are a bunch of MSE questions about what to do in situations like this, e.g. http://meta.stackexchange.com/questions/53235/what-happens-when-an-accepted-answer-is-wrong-but-the-op-is-gone. The best recommendation seems to be to downvote the wrong answer so the right one floats to the top, so I've downvoted yours. – Barmar Apr 12 '16 at 16:36
  • I've flagged mine for moderator attention, i.e. to either delete it or remove the accepted flag. – Harry Apr 12 '16 at 17:09
  • I just noticed that. @ElizabethLin could though. – Harry Apr 12 '16 at 17:11
  • I've accepted it, sorry, haven't been at my computer for a while. –  Apr 13 '16 at 01:29
  • @ElizabethLin Thanks. – Harry Apr 13 '16 at 04:30
3

For macOS:

In man tar, the --options section lists a !timestamp option, which excludes the timestamp from the gzip archive. Usage:

tar --options '!timestamp' -cvzf archive.tgz filename

It will produce the same md5 sum for the same files with the same names.

JerryCauser