
So, I'm trying to make a program which turns a computer into a proxy using this. It all works well, except for gzip/deflate pages.

Whenever I try to uncompress, I get an InvalidDataException stating the magic number in the GzipHeader is incorrect.

I use this function:

    private byte[] GZipUncompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        {
            input.Seek(0, SeekOrigin.Begin);

            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                output.Seek(0, SeekOrigin.Begin);
                gzip.CopyTo(output);

                return output.ToArray();
            }
        }
    }

to decompress data. The error:

[Screenshot of the error (source: gyazo.com)]

Any help would be appreciated.

EDIT: I seem to have gotten somewhere!

As usr suggested, I should write an HTTP parser to get the body and decompress that.

Before parsing: http://pastebin.com/Cb0E8WtT

After parsing: http://pastebin.com/k9e8wMvr

This is the method I use to get to the body:

    private byte[] HTTParse(byte[] data)
    {
        string http = ascii.GetString(data);
        char[] lineBreak = crlf.ToCharArray();
        string[] parts = http.Split(lineBreak);

        List<byte> res = new List<byte>();

        for (int i = 1; i < parts.Length; i++)
        {
            if (i % 2 == 0)
            {
                Regex r = new Regex(@"(.)*: (.)*");
                Regex htt = new Regex(@"HTT(.)*/(.)*.(.)* \d{1,50} (.)*");
                if (!r.IsMatch(parts[i]) && !htt.IsMatch(parts[i]))
                {
                    //Console.WriteLine("[TEST] " + parts[i]);
                    res.AddRange(ascii.GetBytes(parts[i]));
                    res.AddRange(ascii.GetBytes("\r\n"));
                }
                
            }
        }
        return res.ToArray();
    }

However, I still get an error saying "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."

EDIT (2): After copying an answer from here, I have managed to successfully uncompress the body.

The new problem: Firefox.

[Screenshot of the Firefox error (source: gyazo.com)]

I'm now unsure whether I even needed to uncompress gzip pages.

Where have I gone wrong now?

Adam M
  • Probably, the data is not gzipped. Look at the bytes. What do they look like? – usr Jan 25 '14 at 09:40
  • As a side note: you don't need to "rewind" a fresh `MemoryStream`; the `Seek(0, SeekOrigin.Begin)` calls are redundant. – Max Yakimets Jan 25 '14 at 09:59
  • @usr: http://pastebin.com/Cb0E8WtT this is the request of whatismyip.org, it looks like I need to separate the response from the headers. Any ideas how I could do that? – Adam M Jan 25 '14 at 15:02
  • Use the standard HTTP library to download the response. Use AutomaticDecompression. – usr Jan 25 '14 at 15:05
  • @usr: I would like to keep it this way because it could make it easier for implementing SOCKS. – Adam M Jan 25 '14 at 15:13
  • Then you should probably look for an HTTP parser, or write one. – usr Jan 25 '14 at 15:18
  • @usr: I have edited the main post, hopefully should give a lot better insight. – Adam M Jan 25 '14 at 16:05
  • Ok, you made a parsing error, that's for sure. I have no idea what that error is but I recommend you look at what your function calculates and whether you think it is correct or not. – usr Jan 25 '14 at 16:47
  • @usr: Main post is updated with a near solution. Thanks for helping. – Adam M Jan 25 '14 at 17:05
  • Use Fiddler to inspect what your program sent. It was obviously wrong. Once you know the data it sent, find out why it did that. – usr Jan 25 '14 at 17:12
  • @usr: Firebug gets this response when proxied - gyazo.com/996f98a7d38e1cf9e61097be5adc182c – Adam M Jan 25 '14 at 17:18
  • Use Fiddler, because it shows you the raw data on the wire. You are not obeying the HTTP protocol, so Firebug/fox might not show you useful information. – usr Jan 25 '14 at 17:38
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/46081/discussion-between-adam-m-and-usr) – Adam M Jan 25 '14 at 17:53
  • @usr: please see the chat I made, having problems with Fiddler – Adam M Jan 25 '14 at 18:14

2 Answers


You said that you use this code for gzip/deflate. But deflate is not the same as gzip; in particular, it has no magic header like gzip does. Deflate is defined in RFC 1951 and gzip in RFC 1952. In addition, browsers like Firefox and Chrome (but not Internet Explorer) also accept deflate data inside the zlib wrapper defined in RFC 1950. So before you decompress the body, you must first check the "Content-Encoding" header to see which compression is used.
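A minimal sketch of that check, assuming the body has already been separated from the headers and the "Content-Encoding" value has been extracted. `BodyDecoder`, `Decode`, `LooksLikeZlib`, and `ReadAll` are hypothetical names, not part of the question's code. The zlib test is only a heuristic based on the two-byte header layout from RFC 1950, so treat it as a starting point rather than a complete implementation:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BodyDecoder
{
    // Hypothetical helper: picks a decompressor from the Content-Encoding value.
    public static byte[] Decode(byte[] body, string contentEncoding)
    {
        string encoding = (contentEncoding ?? "").Trim().ToLowerInvariant();

        if (encoding == "gzip" || encoding == "x-gzip")
        {
            using (var input = new MemoryStream(body))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            {
                return ReadAll(gzip);
            }
        }

        if (encoding == "deflate")
        {
            // DeflateStream expects a raw DEFLATE stream, so skip the
            // two-byte zlib header when one appears to be present.
            int offset = LooksLikeZlib(body) ? 2 : 0;
            using (var input = new MemoryStream(body, offset, body.Length - offset))
            using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
            {
                return ReadAll(deflate);
            }
        }

        // No (or unknown) encoding: hand the body back untouched.
        return body;
    }

    // Heuristic zlib header check: compression method 8 and
    // (CMF * 256 + FLG) divisible by 31.
    private static bool LooksLikeZlib(byte[] data)
    {
        return data.Length >= 2
            && (data[0] & 0x0F) == 8
            && ((data[0] << 8) | data[1]) % 31 == 0;
    }

    private static byte[] ReadAll(Stream source)
    {
        using (var output = new MemoryStream())
        {
            source.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Calling `BodyDecoder.Decode(body, "gzip")` on a body that is not actually gzip data will still throw the same `InvalidDataException`, which is a useful signal that the header/body split is wrong rather than the decompression.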

Steffen Ullrich

It turns out I never even needed to unzip the compressed data.

However, as per the solution:

I separated the body with the help of this, and attempted to unzip that. What I hadn't realised was that I was sending around 500 blank bytes, which generated a bad request (with the HTML amongst the compressed data), so I couldn't unzip it anyway.
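For anyone hitting the same problem, here is a rough sketch of separating the body at the byte level instead of going through strings. It assumes the blank bytes came from the unused tail of a fixed-size receive buffer (a guess, not something confirmed above), so it only looks at the first `count` bytes. `HttpMessage` and `TrySplit` are hypothetical names, and the sketch does not handle chunked transfer encoding or bodies that arrive across several reads:

```csharp
using System;
using System.Text;

public static class HttpMessage
{
    // Hypothetical helper. `count` is the number of bytes actually read
    // into `buffer`; anything past it is unused padding and is ignored.
    // Returns false until the blank line that ends the headers has arrived.
    public static bool TrySplit(byte[] buffer, int count, out string headers, out byte[] body)
    {
        headers = null;
        body = null;

        for (int i = 0; i + 3 < count; i++)
        {
            if (buffer[i] == '\r' && buffer[i + 1] == '\n' &&
                buffer[i + 2] == '\r' && buffer[i + 3] == '\n')
            {
                // Headers are ASCII text; the body is copied as raw bytes
                // so compressed data is never passed through a string.
                headers = Encoding.ASCII.GetString(buffer, 0, i);
                int bodyStart = i + 4;
                body = new byte[count - bodyStart];
                Buffer.BlockCopy(buffer, bodyStart, body, 0, body.Length);
                return true;
            }
        }
        return false;
    }
}
```

Keeping the body as bytes matters because round-tripping compressed data through an ASCII string replaces every byte above 0x7F, which corrupts the gzip stream before it ever reaches the decompressor.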

Adam M