
I thought the following commands were equivalent, but they produce different checksums:

tar -cvzf ... and tar -cvf ...; gzip ... do not produce the same output.

The sha1sum differs.

What would be the gzip command that would perfectly match the tar -cvzf behavior?

Dima Chubarov
Olivier

1 Answer


The difference in the output files is probably NOT due to the compression method: by default GNU tar uses the standard GZIP deflate mode. One reason for the difference lies in the header format of a GZIP compressed file.

The structure of the first 8 bytes of the GZIP header is as follows:

      OFFSET  SIZE  VALUE    COMMENT
        0       1    0x1F    First "magic" id
        1       1    0x8B    Second "magic" id
        2       1    CM      Compression method
        3       1    FLAGS   8-bit flag register
        4       4    MTIME   Object modification time

The problem is the MTIME field. For data that comes from a pipe, this is set to the current Unix time (seconds since Jan 1, 1970). Therefore two otherwise identical compressed archives created at least one second apart will differ.
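To look at these header bytes directly, a minimal sketch along the following lines may help. It assumes a POSIX `od` and a `gzip` that reads from standard input; the sample input string is arbitrary. The first three bytes should be the two magic ids and the compression method, and bytes 5 to 8 are the MTIME field, whose value depends on when (and with which gzip version) the command is run.

```shell
# Compress a few bytes from a pipe and dump the first 8 header bytes in hex.
# Layout: 1f 8b (magic), CM, FLAGS, then 4 bytes of MTIME (least significant byte first).
hdr=$(printf 'hello\n' | gzip -c | od -An -tx1 -N8)
echo "$hdr"
```

Running it twice a few seconds apart and comparing the last four bytes shows whether the local gzip stamps piped input with the current time.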

Try running tar -cvzf twice on the same data and compare the results. The results will differ in the 5th byte (offset 4), the lowest byte of the timestamp value, which is stored least significant byte first.

$ tar czvf test1.tgz tmp/ ; sleep 2 ; \
  tar czvf test2.tgz tmp/ ; md5sum test1.tgz test2.tgz
tmp/
tmp/test
tmp/
tmp/test
23d46f62dd4a9a0851279df7fe15842e  test1.tgz
c8ae65026a5f771c63acf87a18f7379c  test2.tgz
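One way to avoid the timestamp difference, sketched here under the assumption that the tar stream itself is byte-identical between runs, is gzip's `-n` (`--no-name`) option, which leaves the original file name and timestamp out of the header. The scratch directory and the file names `test.tar`, `a.tgz` and `b.tgz` are hypothetical and used only for illustration.

```shell
set -eu

# Hypothetical scratch area with a small tree to archive.
work=$(mktemp -d)
mkdir "$work/tmp"
echo "hello" > "$work/tmp/test"
cd "$work"

# Build the tar stream once so both gzip runs see identical input.
tar -cf test.tar tmp/

# -n omits the original name and timestamp, so the MTIME field is zero.
gzip -n -c test.tar > a.tgz
sleep 2
gzip -n -c test.tar > b.tgz

# cmp exits non-zero (and the script stops) if the two files differ.
cmp a.tgz b.tgz && echo "identical"
```

The same option can be used in a pipeline, for example `tar -cf - tmp/ | gzip -n > out.tgz`. This makes repeated runs agree with each other; it is not expected to match the output of `tar -cvzf` byte for byte, since that output may carry a non-zero timestamp.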
Dima Chubarov