I'm adding compression to my project to speed up 3G data communication from an Android app to an ASP.NET C# server.

The methods I've researched, written, and tested work. However, there's added white space after compression, and the compressed outputs of the two platforms differ as well. This really puzzles me.

Is this caused by different implementations of the GZIP classes in Java and ASP.NET C#? Is it something I should be concerned about, or do I just move on and call .Trim() and .trim() after decompressing?


Java, compressing "Mary had a little lamb" gives:

Compressed data length: 42
Base64 Compressed String: H4sIAAAAAAAAAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA

protected static byte[] GZIPCompress(byte[] data) {
    try {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        GZIPOutputStream gZIPOutputStream = new GZIPOutputStream(byteArrayOutputStream);

        gZIPOutputStream.write(data);
        gZIPOutputStream.close();

        return byteArrayOutputStream.toByteArray();
    } catch(IOException e) {
        Log.i("output", "GZIPCompress Error: " + e.getMessage());
        return null;
    }
}
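For reference, here is a minimal round-trip sketch in plain Java (8 or later). The class name `GzipRoundTrip` and the `decompress` helper are hypothetical and not part of the code above. The assumption it illustrates: gzip restores the input byte for byte, so if trailing white space shows up after decompression, one possible cause is turning a partly filled read buffer into a string rather than the compression itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Same approach as GZIPCompress above, but rethrows instead of returning null.
    public static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return out.toByteArray();
    }

    // Hypothetical counterpart: copies only the bytes each read() actually returned.
    public static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n); // write n bytes, not the whole buffer
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "Mary had a little lamb".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.println(restored.length); // prints 22
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

If the restored array has the same length as the original, there is nothing left to trim.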


ASP.NET C#, compressing "Mary had a little lamb" gives:

Compressed data length: 137
Base64 Compressed String: H4sIAAAAAAAEAO29B2AcSZYlJi9tynt/SvVK1+B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee++999577733ujudTif33/8/XGZkAWz2zkrayZ4hgKrIHz9+fB8/Ir7I6ut0ns3SLC2Lti3ztMwWk/8Hesi6MhYAAAA=

public static byte[] GZIPCompress(byte[] data)
{
    using (MemoryStream memoryStream = new MemoryStream())
    {
        using (GZipStream gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
        {
            gZipStream.Write(data, 0, data.Length);
        }

        return memoryStream.ToArray();
    }
}
Uknight
  • Your code shows you compressing *bytes*, but you've given the source as a *string* - how are you getting the bytes from the string? (When I use UTF-8, I get 42 bytes in .NET.) – Jon Skeet Jun 28 '13 at 05:52
  • What version of .NET are you using? – Jon Skeet Jun 28 '13 at 06:03
  • Just to answer here as well in case it's misleading. I didn't want to add too much code into the page to detract from the question. I'm using String.getBytes("UTF-8") and Encoding.UTF8.GetBytes() – Uknight Jun 28 '13 at 06:18
  • Even in .NET 4.5 that class has bugs. Use DotNetZip instead. – Mark Adler Jun 28 '13 at 14:31
  • @MarkAdler Thanks for the suggestion, I'll take a look at it at a later time. I would prefer to stick to native libraries though, as long as it still gets the job done reasonably well. – Uknight Jul 19 '13 at 08:40
  • Alas, .NET does not get the job done reasonably well. – Mark Adler Jul 19 '13 at 14:38

1 Answer

I get 42 bytes on .NET as well. I suspect you're using an old version of .NET which had a flaw in its compression scheme.

Here's my test app using your code:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var uncompressed = Encoding.UTF8.GetBytes("Mary had a little lamb");
        var compressed = GZIPCompress(uncompressed);
        Console.WriteLine(compressed.Length);
        Console.WriteLine(Convert.ToBase64String(compressed));
    }

    static byte[] GZIPCompress(byte[] data)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var gZipStream = new GZipStream(memoryStream, 
                                                   CompressionMode.Compress))
            {
                gZipStream.Write(data, 0, data.Length);
            }

            return memoryStream.ToArray();
        }
    }
}

Results:

42
H4sIAAAAAAAEAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA

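This is the same as the Java data apart from one byte in the gzip header (the `E` in `H4sIAAAAAAAEAP...`, where the Java output has `A`); the compressed payload is identical.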

I'm using .NET 4.5. I suggest you try running the above code on your machine, and compare the results.

I've just decompressed the base64 data you provided, and it is a valid "compressed" form of "Mary had a little lamb", with 22 bytes in the uncompressed data. That surprises me... and reinforces my theory that it's a framework version difference.
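That kind of check can be sketched in Java 8 or later as follows. The class name `GzipBase64Check` and the helper `gunzipBase64` are made up for illustration, and `java.util.Base64` is assumed to be available (on older Android versions `android.util.Base64` would be the substitute). The three strings are copied from this page: the Java output, the .NET 4.5 output, and the 137-byte output from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

public class GzipBase64Check {
    // Hypothetical helper: Base64-decode, gunzip, and read the result as UTF-8 text.
    public static String gunzipBase64(String base64) {
        byte[] compressed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] samples = {
            // Java output (42 bytes)
            "H4sIAAAAAAAAAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA",
            // .NET 4.5 output (42 bytes)
            "H4sIAAAAAAAEAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA",
            // Output posted in the question (137 bytes)
            "H4sIAAAAAAAEAO29B2AcSZYlJi9tynt/SvVK1+B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee++999577733ujudTif33/8/XGZkAWz2zkrayZ4hgKrIHz9+fB8/Ir7I6ut0ns3SLC2Lti3ztMwWk/8Hesi6MhYAAAA="
        };
        for (String sample : samples) {
            int compressedLength = Base64.getDecoder().decode(sample).length;
            String text = gunzipBase64(sample);
            // Report the compressed size next to the recovered text and its byte count.
            System.out.println(compressedLength + " -> "
                + text.getBytes(StandardCharsets.UTF_8).length + " bytes: " + text);
        }
    }
}
```

If every sample decodes to the same 22-byte sentence, the streams differ only in how efficiently they were encoded, not in what they contain.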

EDIT: Okay, this is definitely a framework version difference. If I compile with the .NET 3.5 compiler, then use an app.config which forces it to run with that version of the framework, I see 137 bytes as well. Given comments, it looks like this was only fixed in .NET 4.5.

Jon Skeet
  • Thank you for your response, my project is using Framework 3.5. I'll look into updating the framework then. Thanks. – Uknight Jun 28 '13 at 06:14
  • @Uknight: It would be worth checking that that really is the problem first... I'll see if I can force the same app to run under 3.5 and try to reproduce your results. – Jon Skeet Jun 28 '13 at 06:17
  • @Uknight: I've just confirmed it. I get 137 bytes under .NET 3.5 too. – Jon Skeet Jun 28 '13 at 06:20
  • I ran your exact code in a .net 4 program. 141 bytes. 0.o -- H4sIAAAAAAAEAOy9B2AcSZYlJi9tynt/SvVK1+B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee++999577733ujudTif33/8/XGZkAWz2zkrayZ4hgKrIHz9+fB8/Ir7I6ut0ns3SLC2Lti3ztMwWk/8nAAD//3rIujIWAAAA – mcmonkey4eva Jun 28 '13 at 22:35
  • @mcmonkey4eva Well, Jon Skeet mentioned that it's working properly in 4.5... It's good for others who visit this question to know that 3.5 and 4.0 have some slight issues with this. – Uknight Jul 19 '13 at 08:37
  • @Uknight: Yup, but the previous comment confirms that it was still broken in 4.0 - I couldn't tell that before. I've edited the answer accordingly. – Jon Skeet Jul 19 '13 at 08:38
  • `Built-in GZip compression support for WCF HTTP/TCP: With this new compression, we expect up to a 5x compression ratio.` http://msdn.microsoft.com/en-us/magazine/hh882452.aspx – JP Hellemons Jul 19 '13 at 08:42
  • @JPHellemons: But that sounds like "using GZIP" is new in 4.5 - it doesn't specifically talk about the GZIP compression ratio itself improving. – Jon Skeet Jul 19 '13 at 08:44