I run this code

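// Requires: using System; using System.IO; using System.IO.Compression; using System.Text;
var data = new MemoryStream();
byte[] chunk = Encoding.Unicode.GetBytes("0123456789"); // encode once, reuse for every write
for (int i = 0; i < 1000; i++)
    data.Write(chunk, 0, chunk.Length);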

using (MemoryStream fileStream = new MemoryStream(data.ToArray()))
    using (MemoryStream stream = new MemoryStream())
    {
        using (GZipStream destination = new GZipStream(stream, CompressionMode.Compress, true ))
            fileStream.CopyTo(destination);

        stream.Seek(0, SeekOrigin.Begin);
        var ret = stream.ToArray();
        Console.WriteLine(Convert.ToBase64String(ret));
    }

in .NET Core and .NET Framework 4.5.2, but the results are not equal.

.NET Core 3.1

[0] [byte]:31
[1] [byte]:139
[2] [byte]:8
[3] [byte]:0
[4] [byte]:0
[5] [byte]:0
[6] [byte]:0
[7] [byte]:0
[8] [byte]:0
[9] [byte]:10
[10] [byte]:237
[11] [byte]:214
[12] [byte]:177
[13] [byte]:17
[14] [byte]:128
[15] [byte]:48
[16] [byte]:12
[17] [byte]:4
[18] [byte]:193
[19] [byte]:43
[20] [byte]:201
[21] [byte]:96
[22] [byte]:48
[23] [byte]:208
[24] [byte]:127
[25] [byte]:99
[26] [byte]:238
[27] [byte]:130
[28] [byte]:104
[29] [byte]:83
[30] [byte]:133
...
[168] [byte]:32
[169] [byte]:78
[170] [byte]:0
[171] [byte]:0

BAS64: H4sIAAAAAAAACu3WsRGAMAwEwSvJYDDQf2PugmhThT/Sa0dHZ7Oru9XT29cwk4E9cAv6QCf6C34jHzASEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhEzIhE/Z3BhvvZdKzIE4AAA==

.NET 4.5.2

[0] [byte]:31   
[1] [byte]:139  
[2] [byte]:8    
[3] [byte]:0    
[4] [byte]:0    
[5] [byte]:0    
[6] [byte]:0    
[7] [byte]:0    
[8] [byte]:4    
[9] [byte]:0    
[10] [byte]:237 
[11] [byte]:200 
[12] [byte]:65  
[13] [byte]:1   
[14] [byte]:128 
[15] [byte]:32  
[16] [byte]:0   
[17] [byte]:0   
[18] [byte]:177 
[19] [byte]:139 
[20] [byte]:132 
[21] [byte]:162 
[22] [byte]:168 
[23] [byte]:253 
[24] [byte]:139 
[25] [byte]:209 
[26] [byte]:194 
[27] [byte]:215 
[28] [byte]:246 
[29] [byte]:220 
[30] [byte]:232
...
[95] [byte]:32  
[96] [byte]:78  
[97] [byte]:0   
[98] [byte]:0   

BASE64: H4sIAAAAAAAEAO3IQQGAIAAAsYuEoqj9i9HC1/bc6OhsdnW3enr7Gs4555xzzjnnnHPOOeecc84555xzzjnnnHPOOeecc84555xzzjnnnHPOOeecc845536/De9l0rMgTgAA

Why is there such a difference in bytes 8 and 9?

Jones
  • Probably a different setting/value in the zip header - GZip apparently has: a 10-byte header, containing a magic number (1f 8b), the compression method (08 for DEFLATE), 1 byte of header flags, a 4-byte timestamp, compression flags and the operating system ID. Sounds like it's the compression flags and/or the OS ID. – Charleh Mar 04 '20 at 16:50
  • Why the question? If the decompressed data is the same and the resulting compressed array isn't larger than it was, you shouldn't have any issues. Were you using the compressed data as some kind of key perhaps? – Panagiotis Kanavos Mar 04 '20 at 16:52
  • There are a variety of ways that the compressed result may be different, whether because of different parameters used for compression or because the compression implementation itself is different. As long as the compressed data complies with the compression algorithm specification and can be decompressed using any other compliant implementation, this is completely fine and expected. See marked duplicate. – Peter Duniho Mar 04 '20 at 20:13
  • @PanagiotisKanavos I changed the code to show the problem. Now the size is different and so is the compressed data. Decompressing the data does not work on another system. I only compile the code in .NET Core. – Jones Mar 04 '20 at 20:20

2 Answers

GZIP has its own header, and those bytes fall within it.

Offsets 8 and 9 in the gzip header are two bytes to store extra flags, and the OS type.

For deflate, an extra-flags value of 4 means the compressor used its fastest algorithm (lowest compression), while 2 means maximum compression (slowest algorithm). This is purely informational; it doesn't change how you decompress the deflate data.

For the OS type, "10" is TOPS-20, and "0" is FAT filesystem. Not sure why MS use that value, I think zlib for example defaults to 255 ("unknown"). I don't believe this field is used for any practical purpose.

So this is just an informational change in the header, which can come from changing the implementation or the parameters passed to it.
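
To make the header layout described above concrete, here is a minimal sketch that reads offsets 8 and 9 from the first ten bytes of each output listed in the question. The class and method names (GzipHeaderInfo, DescribeXfl, DescribeOs, Describe) are made up for illustration, and the value tables cover only a few of the codes defined in RFC 1952.

```csharp
using System;

static class GzipHeaderInfo
{
    // Meaning of the XFL byte (offset 8) when the compression method is deflate.
    public static string DescribeXfl(byte xfl)
    {
        switch (xfl)
        {
            case 2: return "maximum compression, slowest algorithm";
            case 4: return "fastest algorithm";
            default: return "no hint defined for deflate";
        }
    }

    // Meaning of the OS byte (offset 9); only a subset of the RFC 1952 codes is listed.
    public static string DescribeOs(byte os)
    {
        switch (os)
        {
            case 0: return "FAT filesystem";
            case 3: return "Unix";
            case 10: return "TOPS-20";
            case 11: return "NTFS filesystem";
            case 255: return "unknown";
            default: return "other";
        }
    }

    // Validates the magic number and summarizes header bytes 8 and 9.
    public static string Describe(byte[] gzip)
    {
        if (gzip == null || gzip.Length < 10)
            throw new ArgumentException("A gzip stream starts with a 10-byte header.");
        if (gzip[0] != 0x1f || gzip[1] != 0x8b)
            throw new ArgumentException("Missing gzip magic number 1f 8b.");

        return "XFL=" + gzip[8] + " (" + DescribeXfl(gzip[8]) + "), "
             + "OS=" + gzip[9] + " (" + DescribeOs(gzip[9]) + ")";
    }

    static void Main()
    {
        // First ten bytes of each output, copied from the listings in the question.
        byte[] netCore = { 31, 139, 8, 0, 0, 0, 0, 0, 0, 10 };
        byte[] netFramework = { 31, 139, 8, 0, 0, 0, 0, 0, 4, 0 };

        Console.WriteLine(".NET Core 3.1: " + Describe(netCore));
        Console.WriteLine(".NET 4.5.2:    " + Describe(netFramework));
    }
}
```

Neither byte is needed to decompress the deflate payload that follows the header.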

Fire Lancer

Starting with .NET Framework 4.5, the compression implementation behind DeflateStream (which GZipStream builds on) changed from a managed implementation to the native zlib library. The difference you're seeing is likely related to implementation differences like this.

Source.

It looks like you can force .NET Framework to use the legacy managed implementation using the compatibility switch NetFx45_LegacyManagedDeflateStream.

That said, GZip does not guarantee that two implementations will produce exactly the same bytes when compressing the same data. You should not rely on this.
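
Rather than comparing compressed bytes across runtimes, a more robust check is to compare the data after decompression. The following is a minimal sketch of such a round-trip test; the GzipRoundTrip class and its Compress and Decompress helpers are hypothetical names introduced here, not part of the original code.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

static class GzipRoundTrip
{
    // Compresses a buffer with GZipStream and returns the gzip bytes.
    public static byte[] Compress(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the result so the trailer is flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
                gzip.Write(input, 0, input.Length);
            return output.ToArray();
        }
    }

    // Decompresses gzip bytes back into the original buffer.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Same payload as in the question: "0123456789" repeated 1000 times, UTF-16 encoded.
        byte[] original = Encoding.Unicode.GetBytes(
            string.Concat(Enumerable.Repeat("0123456789", 1000)));

        byte[] restored = Decompress(Compress(original));

        // Compare the decompressed payload, not the compressed bytes.
        Console.WriteLine(original.SequenceEqual(restored));
    }
}
```

To test interoperability, the same Decompress helper could be fed bytes produced on the other runtime; what matters is that the restored payload matches, not that the compressed arrays do.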

canton7