I'm trying to hash a gzipped string in Python, and the hash needs to be identical to the one produced in Java. However, Python's gzip implementation seems to produce different output from Java's GZIPOutputStream.
Python gzip:
import gzip
import hashlib
gzip_bytes = gzip.compress(bytes('test', 'utf-8'))
gzip_hex = gzip_bytes.hex().upper()
md5 = hashlib.md5(gzip_bytes).hexdigest().upper()
>>> gzip_hex
'1F8B0800678B186002FF2B492D2E01000C7E7FD804000000'
>>> md5
'C4C763E9A0143D36F52306CF4CCC84B8'
Java GZIPOutputStream:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.GZIPOutputStream;

public class HelloWorld {
    private static final char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();

    // Convert a byte array to an uppercase hex string.
    public static String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for (int j = 0; j < bytes.length; j++) {
            int v = bytes[j] & 0xFF;
            hexChars[j * 2] = HEX_ARRAY[v >>> 4];
            hexChars[j * 2 + 1] = HEX_ARRAY[v & 0x0F];
        }
        return new String(hexChars);
    }

    // Return the MD5 digest of the input as an uppercase hex string.
    public static String md5(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] thedigest = md.digest(bytes);
            return bytesToHex(thedigest);
        } catch (NoSuchAlgorithmException e) {
            // Rethrow instead of silently returning an empty string.
            throw new RuntimeException("MD5 Failed", e);
        }
    }

    public static void main(String[] args) {
        String string = "test";
        // Encode explicitly as UTF-8 to match bytes('test', 'utf-8') in the Python version.
        final byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
        try {
            final ByteArrayOutputStream bos = new ByteArrayOutputStream();
            final GZIPOutputStream gout = new GZIPOutputStream(bos);
            gout.write(bytes);
            gout.close();  // close() finishes the stream and writes the gzip trailer
            final byte[] encoded = bos.toByteArray();
            System.out.println("gzip: " + bytesToHex(encoded));
            System.out.println("md5: " + md5(encoded));
        } catch (IOException e) {
            throw new RuntimeException("Failed", e);
        }
    }
}
Prints:
gzip: 1F8B08000000000000002B492D2E01000C7E7FD804000000
md5: 1ED3B12D0249E2565B01B146026C389D
So the two gzip outputs are nearly identical: they differ only in six header bytes (offsets 4 to 9), while the compressed data and the trailer match.
1F8B0800678B186002FF2B492D2E01000C7E7FD804000000
1F8B08000000000000002B492D2E01000C7E7FD804000000
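To make the difference easier to see, here is a minimal sketch that splits the two hex strings above into the fixed 10-byte gzip header fields. The field layout follows RFC 1952; parse_gzip_header is a hypothetical helper name introduced only for this illustration, not part of any library.

```python
import struct

def parse_gzip_header(hex_string):
    """Split the fixed 10-byte gzip header (RFC 1952) into named fields."""
    raw = bytes.fromhex(hex_string)
    # "<" = little-endian, no padding: 2-byte magic, CM, FLG, 4-byte MTIME, XFL, OS
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", raw[:10])
    return {
        "magic": magic.hex().upper(),
        "method": method,
        "flags": flags,
        "mtime": mtime,
        "xfl": xfl,
        "os": os_id,
        "rest": raw[10:].hex().upper(),  # deflate data + CRC32 + size trailer
    }

python_out = parse_gzip_header("1F8B0800678B186002FF2B492D2E01000C7E7FD804000000")
java_out = parse_gzip_header("1F8B08000000000000002B492D2E01000C7E7FD804000000")

# Collect the names of the fields whose values differ between the two outputs.
differing = [name for name in python_out if python_out[name] != java_out[name]]
print(differing)  # ['mtime', 'xfl', 'os']
```

In other words, the Python output carries a timestamp, an extra-flags byte of 0x02 and an OS byte of 0xFF, where the Java output has zeros in all three fields.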
Python's gzip.compress() function accepts a compresslevel argument in the range 0-9. I tried all of them, but none gives the desired result.
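For reference, the check over all levels can be sketched roughly as follows. JAVA_HEX is the Java output quoted above, and try_all_levels is a hypothetical helper name used only here. The printed header bytes depend on the current time and on the Python version, so no expected output is shown.

```python
import gzip

# Output of the Java program above for the input "test".
JAVA_HEX = "1F8B08000000000000002B492D2E01000C7E7FD804000000"

def try_all_levels(data):
    """Return {level: hex} for every compresslevel that gzip.compress() accepts."""
    return {
        level: gzip.compress(data, compresslevel=level).hex().upper()
        for level in range(10)
    }

results = try_all_levels(b"test")
for level, hex_out in results.items():
    # Hex characters 8..20 correspond to header bytes 4..9 (MTIME, XFL, OS).
    print(level, hex_out[8:20], hex_out == JAVA_HEX)
```

With the default arguments, none of the levels reproduces the Java bytes, because the timestamp in the header is never zero.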
Is there any way to get the same result as Java's GZIPOutputStream in Python?