I think that the problem is likely to be the Gzip file header.
The Gzip format has provision for including a file name and a file timestamp in the file header. (I see you are using the -n option when uncompressing and recompressing ... which is probably correct here.)
The Gzip format also includes an "operating system id" in the header. This is supposed to identify the source file system type; e.g. 0 for FAT, 3 for UNIX, and so on.
Either of these could lead to differences in the Gzip files and hence different hashes.
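To see these fields directly, here is a minimal Java sketch that decodes the fixed part of the header of each file given on the command line. It prints the flags, MTIME, XFL and OS bytes, plus the stored file name if there is one. The class and method names are just for illustration, and it only looks at the first gzip member.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GzipHeaderDump {
    // Decodes the fixed 10-byte member header from RFC 1952 section 2.3,
    // plus the optional FNAME field.
    static String describe(byte[] data) {
        if (data.length < 10 || (data[0] & 0xff) != 0x1f || (data[1] & 0xff) != 0x8b) {
            throw new IllegalArgumentException("not a gzip stream");
        }
        int flg = data[3] & 0xff;
        // MTIME is a 32-bit little-endian Unix timestamp; 0 means "no timestamp".
        long mtime = (data[4] & 0xffL) | (data[5] & 0xffL) << 8
                   | (data[6] & 0xffL) << 16 | (data[7] & 0xffL) << 24;
        int xfl = data[8] & 0xff;
        int os = data[9] & 0xff; // e.g. 0 = FAT, 3 = Unix
        String name = "(none)";
        int pos = 10;
        if ((flg & 0x04) != 0) { // FEXTRA: 2-byte little-endian length, then the extra data
            int xlen = (data[pos] & 0xff) | (data[pos + 1] & 0xff) << 8;
            pos += 2 + xlen;
        }
        if ((flg & 0x08) != 0) { // FNAME: zero-terminated ISO 8859-1 file name
            int end = pos;
            while (end < data.length && data[end] != 0) end++;
            name = new String(data, pos, end - pos, StandardCharsets.ISO_8859_1);
        }
        return String.format("FLG=0x%02x MTIME=%d XFL=%d OS=%d NAME=%s", flg, mtime, xfl, os, name);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + describe(Files.readAllBytes(Path.of(arg))));
        }
    }
}
```

Running it on the original and the recompressed file side by side should show whether MTIME, NAME or OS is the field that differs.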
If I were going to solve this myself, I would start by using cmp to see where the compressed files first differ, and then od to identify what the differences are. Refer to the Gzip file format spec to figure out what the differences mean:
- RFC 1952 - GZIP file format specification version 4.3
- Wikipedia's gzip page.
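If cmp and od aren't convenient (on Windows, say), the same two steps can be roughly approximated in Java. This is a sketch with hypothetical names: it uses Arrays.mismatch to find the first differing byte, then hex-dumps a few bytes from that point in each file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FirstDiff {
    // Offset of the first differing byte, or -1 if identical.
    // If one file is a prefix of the other, this is the shorter length.
    static int firstDifference(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    // Space-separated hex of up to 'count' bytes starting at 'from'.
    static String hex(byte[] data, int from, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < Math.min(data.length, from + count); i++) {
            sb.append(String.format("%02x ", data[i] & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] a = Files.readAllBytes(Path.of(args[0]));
        byte[] b = Files.readAllBytes(Path.of(args[1]));
        int diff = firstDifference(a, b);
        if (diff < 0) {
            System.out.println("files are identical");
            return;
        }
        System.out.println("first difference at byte offset " + diff);
        System.out.println(args[0] + ": " + hex(a, diff, 16));
        System.out.println(args[1] + ": " + hex(b, diff, 16));
    }
}
```

Unlike cmp, which numbers bytes from 1, this prints a 0-based offset, which lines up directly with the field offsets in RFC 1952: MTIME is bytes 4 to 7 and the OS id is byte 9. So a first difference at offset 9 would point straight at the OS id.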
How can I get it to match the original SHA using gzip and gunzip?
Assuming that the difference is the OS id, I don't think there is a practical way to solve this with the gzip and gunzip commands.
I looked at the source code for GZIPOutputStream in Java 11, and it is not promising.
- It is hard-wiring the timestamp to zero.
- It is hard-wiring the OS identifier to zero (which is supposed to mean FAT).
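To check what your own JDK writes, you can compress a few bytes and dump the 10-byte header. This is a small sketch with a hypothetical class name. On Java 11 I would expect the four MTIME bytes and the OS byte all to be zero, but I believe newer JDKs changed the OS value, so run it on the JDK you actually use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class JdkGzipHeader {
    // Compresses 'input' with the JDK's GZIPOutputStream and returns the raw gzip bytes.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } // closing the stream writes the CRC-32/length trailer
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] out = gzip("hello, world\n".getBytes(StandardCharsets.UTF_8));
        // Header layout: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(String.format("%02x ", out[i] & 0xff));
        }
        System.out.println(sb.toString().trim());
        System.out.println("OS byte = " + (out[9] & 0xff));
    }
}
```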
The hard-wiring is in a private method, so it would be next to impossible to "fix" by subclassing or reflection. You could copy the code and fix it that way, but then you would have to maintain your variant GZIPOutputStream class indefinitely.
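For what it's worth, the "copy the code" route does not need much code if you only compress in-memory data. Below is a minimal sketch, not GZIPOutputStream's actual code. It uses a hypothetical writeGzip helper that writes three things: the RFC 1952 header with a caller-chosen MTIME and OS byte, raw deflate data from java.util.zip.Deflater (with nowrap set to true), and the CRC-32/length trailer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

public class GzipWithOs {
    // Hypothetical helper: writes one gzip member with caller-chosen MTIME and OS bytes.
    static void writeGzip(byte[] input, OutputStream out, int level, long mtime, int os)
            throws IOException {
        // XFL: 2 = max compression, 4 = fastest, otherwise 0.
        int xfl = level == Deflater.BEST_COMPRESSION ? 2 : level == Deflater.BEST_SPEED ? 4 : 0;
        // Header: magic, CM = 8 (deflate), FLG = 0 (no name/comment/extra), MTIME, XFL, OS.
        out.write(new byte[] {
            0x1f, (byte) 0x8b, 8, 0,
            (byte) mtime, (byte) (mtime >> 8), (byte) (mtime >> 16), (byte) (mtime >> 24),
            (byte) xfl, (byte) os
        });

        // nowrap = true gives raw deflate data with no zlib header/trailer, as gzip needs.
        Deflater deflater = new Deflater(level, true);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
        } finally {
            deflater.end();
        }

        // Trailer: CRC-32 of the uncompressed data, then its length mod 2^32, little-endian.
        CRC32 crc = new CRC32();
        crc.update(input);
        writeIntLE(out, crc.getValue());
        writeIntLE(out, input.length & 0xffffffffL);
    }

    private static void writeIntLE(OutputStream out, long v) throws IOException {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write((int) ((v >> shift) & 0xff));
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] data = "hello, world\n".getBytes(StandardCharsets.UTF_8);
        writeGzip(data, bos, Deflater.DEFAULT_COMPRESSION, 0, 3); // 3 = Unix
        System.out.println(bos.size() + " bytes, OS byte = " + (bos.toByteArray()[9] & 0xff));
    }
}
```

This only controls the header. Whether the compressed bytes then match what the gzip command produces also depends on both sides generating identical deflate data at the same compression level, which I have not checked.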
(I would be looking at changing the application ... or whatever ... so that I didn't need the checksums to be identical. You haven't said why you are doing this. If it is for testing purposes only, try looking for a different way to implement the tests.)