JAVA not in gzip format error

Question

I have an InputStream and tried to process it, but it gave me the error "not in gzip format", even though the response is declared as gzip ("Content-Encoding: gzip").

protected String readResponse(InputStream is) throws IOException {
StringBuffer string;
int b;
byte[] buffer;
String eol, s = null;
GZIPInputStream gis;
int read;
int index;


eol = new String(new byte[] {(byte)0, (byte)0, (byte)-1, (byte)-1});
buffer = new byte[1];
string = new StringBuffer();
while ( (b = is.read()) > 0 ) {
  buffer[0] = (byte)b;
  s = new String(buffer);
  string.append(s);
  index = string.indexOf(eol);
  if ( index > 0 && index == string.length() - 4 ) {
    break;
  }

}

System.out.println(string);

gis = new GZIPInputStream(is); // << here I got the error
buffer = new byte[1024]; 

while ( (read = gis.read(buffer)) > 0 ) {
  string.append(new String(buffer, 0, read));
}
return string.toString();

}

any thoughts? thanks

I don't know what you are trying to achieve with that code but here's a hint: _don't use `String` for binary data_ — fge, Jan 23 '15 at 06:52
Can you post the file, or the first 100 or so bytes as a hexdump? — Adam, Jan 23 '15 at 06:53
This is not the complete code - the way you pasted it here it can't compile. Please post the entire class! — Nir Alfasi, Jan 23 '15 at 06:55
Aside from anything else, your approach to character conversion assumes ISO-8859-1, and is very inefficient. If you're looking for a particular *byte* pattern, I suggest you do that just by looking at *bytes*... and then convert the binary data to text in more conventional ways, specifying an encoding. — Jon Skeet, Jan 23 '15 at 06:57
Why are you reading from `is` before applying GZIPInputStream? Does the file start with something before the gzipped part? — ikettu, Jan 23 '15 at 07:00
@ikettu I did apply the GZIPInputStream before reading it and it didn't work, check the code now, the System.out.println(string); will print the header info such as Content-Encoding: gzip — abdulla-alajmi, Jan 23 '15 at 17:33
HTTP is a protocol. It's very likely you receive this over a socket. - The HTTP header is terminated by an empty line, so I don't understand this 0/0/0xFF/0xFF parsing. — laune, Jan 23 '15 at 18:03
@laune if I run my code it will print this http://i.imgur.com/6FGodJE.png — abdulla-alajmi, Jan 23 '15 at 19:14
OK, so this is a HTTP header. The normal procedure is to read *lines* until you encounter the empty line (after Content-Type) and then you switch to gzip reading. Or, you read from then on, until you encounter end-of-file, store the bytes (memory or file) and read gzip from this intermediary storage. — laune, Jan 23 '15 at 19:59
@laune exactly I was reading until (content-type) then switch to gzip but now after the server changed I couldn't do it!, and there is (f) after the 2 empty lines! I don't know what is that for — abdulla-alajmi, Jan 23 '15 at 21:26
@laune if I change {while ( (b = is.read()) > 0 )} to {while ( (b = is.read()) > -1 )} it will show me this http://i.imgur.com/IbYTe1Y.png — abdulla-alajmi, Jan 23 '15 at 21:35
@user2564147 I got around to composing some test data and coding all the steps to read an HTTP header followed by gzip data. Using an intermediary file to store the zipped data seemed safest, but it could be done by reading from a memory cache, too. — laune, Jan 24 '15 at 10:51

fge · Accepted Answer · 2015-01-23T07:18:00.320

1

Seeing this line:

eol = new String(new byte[] {(byte)0, (byte)0, (byte)-1, (byte)-1});

is enough to arrive at a conclusion: you are doomed from the start.

DO NOT USE STRING FOR BINARY DATA.

Bytes and chars have no relationship to one another; what you are doing here is roughly equivalent to the following:

final CharsetDecoder decoder = Charset.defaultCharset()
    .newDecoder().onMalformedInput(CodingErrorAction.REPLACE);
final ByteBuffer buf = ByteBuffer.wrap(new byte[]{...});
final CharBuffer cbuf = decoder.decode(buf);
final String eol = new String(cbuf.array());

Note the REPLACE action. Any unmappable byte sequence will trigger the decoder to output the Unicode replacement character, U+FFFD (looks familiar, right?).

Now try and put REPORT instead.
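As a minimal sketch of the difference between the two actions, the following decodes the question's four marker bytes once with REPLACE and once with REPORT. It assumes UTF-8 rather than the platform default charset, and the class and method names (`DecodeDemo`, `decode`) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decodes raw bytes as UTF-8 (an assumption; the original code used the
    // platform default charset) with the given policy for bad input.
    static String decode(byte[] raw, CodingErrorAction onError)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(onError)
                .onUnmappableCharacter(onError)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    public static void main(String[] args) {
        byte[] marker = {(byte) 0, (byte) 0, (byte) -1, (byte) -1};
        try {
            // REPLACE silently substitutes U+FFFD for each undecodable byte.
            String replaced = decode(marker, CodingErrorAction.REPLACE);
            for (char ch : replaced.toCharArray()) {
                System.out.printf("U+%04X ", (int) ch);
            }
            System.out.println();

            // REPORT refuses to guess and throws instead.
            decode(marker, CodingErrorAction.REPORT);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("REPORT rejected the bytes: " + e);
        }
    }
}
```

With REPLACE the original 0xFF bytes are gone from the resulting String, so searching for them in a String built this way cannot be relied on.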

What is more, you use the default charset... Which differs from platform to platform.

Your code should really just read the input stream and return a byte array. Use a ByteArrayOutputStream.

And if you want to write to a file directly, it's easy: use Files.copy().
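A rough sketch of the Files.copy() route, for dumping the stream untouched so it can be inspected afterwards. The class name `RawDump`, the helper `dump`, and the temp-file target are assumptions for illustration; the sample bytes in `main` stand in for a real response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RawDump {
    // Copies every remaining byte of the stream to target without
    // interpreting it; returns the number of bytes written.
    static long dump(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target; any writable path would do.
        Path target = Files.createTempFile("response", ".bin");
        byte[] sample = {0x1f, (byte) 0x8b, 8}; // stand-in for a real stream
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(dump(in, target) + " bytes written to " + target);
        }
    }
}
```

The resulting file can then be examined with a hex dump tool to see what the server actually sent.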

Anyway, fixed that for you:

// Note: return type is byte[]
protected byte[] readResponse(final InputStream in)
    throws IOException
{
    try (
        final InputStream gzin = new GZIPInputStream(in);
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
    ) {
        final byte[] buf = new byte[4096];
        int bytesRead;
        while ((bytesRead = gzin.read(buf)) != -1)
            out.write(buf, 0, bytesRead);

        return out.toByteArray();
    }
}

edited Jan 23 '15 at 07:18

answered Jan 23 '15 at 07:02

fge


the code was working but the server changed then it didn't work anymore! but it was working!!! – abdulla-alajmi Jan 23 '15 at 07:07
It was working by pure _luck_. Don't rely on luck. Code which only works "from time to time" doesn't work. – fge Jan 23 '15 at 07:08
still getting the same error here final InputStream gzin = new GZIPInputStream(is); – abdulla-alajmi Jan 23 '15 at 07:25
Then the stream really isn't gzip encoded. Did you just try and read it plainly? – fge Jan 23 '15 at 07:36
look at my code, if I insert a system.out.println(string); after the loop it will print many correct information like: session id and all others and also will print this line (Content-Encoding: gzip) which means it's a gzip – abdulla-alajmi Jan 23 '15 at 17:28
hi fge, what do you think the problem? – abdulla-alajmi Jan 24 '15 at 03:14
Now I'd say all you can try is download the content as raw and analyzing whether it is really a gzipped stream.... – fge Jan 24 '15 at 10:02
how can I do that? see laune code, I downloaded the .gz file but couldn't uncompressed it "not in gzip format" – abdulla-alajmi Jan 24 '15 at 15:09
Download it plain jane, then analyze at the command line. Use `Files.copy()`. – fge Jan 24 '15 at 15:25

Adam · Answer 2 · 2015-01-23T07:11:36.320

0

The problem could be you're advancing the file pointer in the input stream before you pass it to GZIPInputStream. GZIPInputStream expects the first few bytes to be a standard header.

Try moving new GZIPInputStream(is); before your while loop

edited Jan 23 '15 at 07:11

answered Jan 23 '15 at 06:54

Adam


score 0 · Answer 3 · answered Jan 23 '15 at 17:54

0

There are many things wrong in your code, but let's try anyway. So you have an ASCII header, and after that there should be a gzipped part? A gzip file always starts with two ID bytes, which have the fixed values 'ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213)'. Can you find those in your input stream? That is where you should start the gzip stream.
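One way to check for the two ID bytes without consuming them is to peek through a PushbackInputStream. The sketch below is only an illustration: `GzipMagic` and `startsWithGzipMagic` are made-up names, and the stream must be created with a pushback buffer of at least 2 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

final class GzipMagic {
    // Peeks at the next two bytes and pushes them back, so the stream is
    // left exactly where it was. Requires a pushback buffer of size >= 2.
    static boolean startsWithGzipMagic(PushbackInputStream in) throws IOException {
        byte[] head = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = in.read(head, n, 2 - n);
            if (r < 0) {
                break; // stream ended before two bytes were available
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n);
        }
        return n == 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {0x1f, (byte) 0x8b, 8, 0}; // stand-in for real data
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(sample), 2);
        System.out.println("gzip magic present: " + startsWithGzipMagic(in));
    }
}
```

If the check succeeds, the same PushbackInputStream can be handed to GZIPInputStream, since the ID bytes are still unread.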


ikettu


thank you for your help, check this pic http://i.imgur.com/6FGodJE.png how can I detect the gzip file? – abdulla-alajmi Jan 23 '15 at 19:16
@ikettu Once you have seen the magic, you are already past the point where to start reading a gzipped stream. How do you propose to cook this? And why isn't it simpler to monitor the end of the HTTP header? – laune Jan 24 '15 at 10:53
One could use pushback stream to peek in advance. But just checking the bytecodes while testing would hint you what is actual bytes to mark the end of the headers. – ikettu Jan 24 '15 at 11:13

score 0 · Answer 4 · answered Jan 24 '15 at 10:50

I have tested this with a file composed from a few header lines, followed by an empty line, and an appended gzipped text file. The latter is written, unexpanded, to x.gz and unzipped and read from there, assuming that it is a text file. (If it is a binary file, a BufferedReader is pointless.)

try-with-resources and catch blocks should be added, but that's just a technicality.

InputStream is = ...;
StringBuilder lsb = new StringBuilder();
int c = -1;
while( (c = is.read()) != -1 ){
    if( c == '\n' ){
        String line = lsb.toString();
        if( line.matches( "\\s*" ) ){
            break;
        }
        System.out.println( line );
        lsb.delete( 0, lsb.length() );
    } else {
        lsb.append( (char)c );
    }
}
byte[] buffer = new byte[1024];
int nRead = 0;
OutputStream os = new FileOutputStream( "x.gz" );
while ( (nRead = is.read(buffer, 0, buffer.length )) > 0 ) {
    os.write( buffer, 0, nRead );
}
os.close();
is.close();

InputStream gis = new GZIPInputStream( new FileInputStream( "x.gz" ) );
InputStreamReader isr = new InputStreamReader( gis );
BufferedReader br = new BufferedReader(isr);
String line;
while( (line = br.readLine()) != null ){
    System.out.println("line: " + line );
}
br.close();
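The same steps can be sketched without the intermediary file, decompressing straight from the stream once the header has been consumed. This is a variant under stated assumptions, not the tested code above: the names `HeaderThenGzip`, `skipHeader`, `readBody`, and `sampleResponse` are hypothetical, only a truly empty line (ignoring a trailing CR) ends the header, and the gzip data is assumed to follow the blank line directly with no chunked transfer encoding in between.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class HeaderThenGzip {
    // Reads byte by byte up to and including the first empty line, so no
    // byte of the body is consumed by a buffering reader.
    static void skipHeader(InputStream in) throws IOException {
        int lineLength = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                if (lineLength == 0) {
                    return; // empty line: header is finished
                }
                lineLength = 0;
            } else if (c != '\r') {
                lineLength++;
            }
        }
        throw new EOFException("no blank line found after the header");
    }

    // Skips the header, then gunzips everything that follows into memory.
    static byte[] readBody(InputStream in) throws IOException {
        skipHeader(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream gzin = new GZIPInputStream(in)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzin.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Builds test data: a few header lines, an empty line, gzipped text.
    static byte[] sampleResponse(String body) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        try (GZIPOutputStream gz = new GZIPOutputStream(wire)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return wire.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream(sampleResponse("hello"));
        System.out.println(new String(readBody(is), StandardCharsets.UTF_8));
    }
}
```

If GZIPInputStream still reports "not in gzip format" at this point, the bytes after the blank line do not begin with the gzip ID bytes, and saving them for inspection is the next step.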

I found the gzip file on my computer and tried to gunzip it but it says "not in gzip format", the server said it's a gzip file but it's not! — abdulla-alajmi, Jan 24 '15 at 14:37
If you have used the code I've posted I don't see how this could have failed. - Anyway, you have an exact copy of whatever was sent after the empty line terminating the header - maybe an analysis of these bytes can tell you what it really is. Of course, I can't access the actual site... Not sure whether you can drop-box that dubious file you have. -- I used gzip to create the appendix for the file starting with the headers, and my code snippet runs fine with that. -- At least: the header files should terminate cleanly with the empty line, not showing any gibberish after that line - is that so? — laune, Jan 24 '15 at 20:16
