0

I have an InputStream and tried to process it, but I get the error "Not in GZIP format", even though the response says it is gzipped ("Content-Encoding: gzip").

protected String readResponse(InputStream is) throws IOException {
    StringBuffer string;
    int b;
    byte[] buffer;
    String eol, s = null;
    GZIPInputStream gis;
    int read;
    int index;

    eol = new String(new byte[] {(byte)0, (byte)0, (byte)-1, (byte)-1});
    buffer = new byte[1];
    string = new StringBuffer();
    while ( (b = is.read()) > 0 ) {
        buffer[0] = (byte)b;
        s = new String(buffer);
        string.append(s);
        index = string.indexOf(eol);
        if ( index > 0 && index == string.length() - 4 ) {
            break;
        }
    }

    System.out.println(string);

    gis = new GZIPInputStream(is); // <-- this is where I get the error
    buffer = new byte[1024];

    while ( (read = gis.read(buffer)) > 0 ) {
        string.append(new String(buffer, 0, read));
    }
    return string.toString();
}

Any thoughts? Thanks.

abdulla-alajmi
  • 2
    So you're saying that Java is lying to you? – Kayaman Jan 23 '15 at 06:52
  • 2
    I don't know what you are trying to achieve with that code but here's a hint: _don't use `String` for binary data_ – fge Jan 23 '15 at 06:52
  • Can you post the file, or the first 100 or so bytes as a hexdump? – Adam Jan 23 '15 at 06:53
  • This is not the complete code - the way you pasted it here it can't compile. Please post the entire class! – Nir Alfasi Jan 23 '15 at 06:55
  • Aside from anything else, your approach to character conversion assumes ISO-8859-1, and is very inefficient. If you're looking for a particular *byte* pattern, I suggest you do that just by looking at *bytes*... and then convert the binary data to text in more conventional ways, specifying an encoding. – Jon Skeet Jan 23 '15 at 06:57
  • Why are you reading from `is` before applying GZIPInputStream? Does the file start with something before the gzipped part? – ikettu Jan 23 '15 at 07:00
  • @Kayaman I think so Lol – abdulla-alajmi Jan 23 '15 at 07:01
  • This isn't HTTP, is it? – laune Jan 23 '15 at 08:44
  • @ikettu I did apply the GZIPInputStream before reading it and it didn't work, check the code now, the System.out.println(string); will print the header info such as Content-Encoding: gzip – abdulla-alajmi Jan 23 '15 at 17:33
  • @laune it's a socket – abdulla-alajmi Jan 23 '15 at 17:33
  • HTTP is a protocol. It's very likely you receive this over a socket. - The HTTP header is terminated by an empty line, so I don't understand this 0/0/0xFF/0xFF parsing. – laune Jan 23 '15 at 18:03
  • @laune if I run my code it will print this http://i.imgur.com/6FGodJE.png – abdulla-alajmi Jan 23 '15 at 19:14
  • OK, so this is an HTTP header. The normal procedure is to read *lines* until you encounter the empty line (after Content-Type) and then you switch to gzip reading. Or, you read from then on, until you encounter end-of-file, store the bytes (memory or file) and read gzip from this intermediary storage. – laune Jan 23 '15 at 19:59
  • @laune exactly, I was reading until (content-type) and then switching to gzip, but now that the server has changed I can't do it! And there is an (f) after the 2 empty lines! I don't know what that is for – abdulla-alajmi Jan 23 '15 at 21:26
  • @laune if I change {while ( (b = is.read()) > 0 )} to {while ( (b = is.read()) > -1 )} it will show me this http://i.imgur.com/IbYTe1Y.png – abdulla-alajmi Jan 23 '15 at 21:35
  • @user2564147 I got around to composing some test data and coding all the steps to read an HTTP header followed by gzip data. Using an intermediary file to store the zipped data seemed safest, but it could be done by reading from a memory cache, too. – laune Jan 24 '15 at 10:51

4 Answers

1

Seeing this line:

eol = new String(new byte[] {(byte)0, (byte)0, (byte)-1, (byte)-1});

is enough to arrive at a conclusion: you are doomed from the start.

DO NOT USE STRING FOR BINARY DATA.

Bytes and chars have no relationship to one another; what you are doing here is roughly equivalent to the following:

final CharsetDecoder decoder = Charset.defaultCharset()
    .newDecoder().onMalformedInput(CodingErrorAction.REPLACE);
final ByteBuffer buf = ByteBuffer.wrap(new byte[]{...});
final CharBuffer cbuf = decoder.decode(buf);
final String eol = new String(cbuf.array());

Note the REPLACE action. Any unmappable byte sequence will trigger the decoder to output the Unicode replacement character, U+FFFD (looks familiar, right?).

Now try and put REPORT instead.
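A small sketch of that experiment, assuming UTF-8 stands in for the default charset (the class and method names here are made up for illustration). It round-trips the four marker bytes through a String and then tries the same bytes with a strict decoder:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class BinaryAsStringDemo {
    // The four bytes the question uses as its end marker.
    static final byte[] MARKER = {0, 0, (byte) 0xFF, (byte) 0xFF};

    // Decodes the bytes to a String and encodes them again.
    // UTF-8 is an assumption; the question's code uses the platform default.
    static byte[] roundTrip(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Returns true only if a strict (REPORT) decoder accepts the bytes.
    static boolean decodesStrictly(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] back = roundTrip(MARKER);
        System.out.println("bytes in: " + MARKER.length + ", bytes out: " + back.length);
        System.out.println("strict decode ok: " + decodesStrictly(MARKER));
    }
}
```

If the bytes coming back differ from the bytes going in, the String was never a faithful copy of the binary data, which is why searching it for a byte pattern is unreliable.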

What is more, you use the default charset, which differs from platform to platform.

Your code should really just read the input stream and return a byte array. Use a ByteArrayOutputStream.

And if you want to write to a file directly, it's easy: use Files.copy().
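Both options can be sketched roughly as follows. The class name, method names and temp-file prefix are placeholders chosen for this example, not anything from the question:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RawBytesDemo {
    // Drains the stream into memory without any charset conversion.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Writes the stream straight to a file and returns the number of bytes copied.
    static long saveTo(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {0, 0, (byte) 0xFF, (byte) 0xFF};
        Path tmp = Files.createTempFile("response", ".bin");
        long copied = saveTo(new ByteArrayInputStream(sample), tmp);
        System.out.println("copied " + copied + " bytes to " + tmp);
        Files.delete(tmp);
    }
}
```

REPLACE_EXISTING is used here only because the temp file already exists when Files.copy() runs; without it the copy would fail on an existing target.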

Anyway, fixed that for you:

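import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Note: the return type is byte[].
// The stream must be positioned at the first byte of the gzip data.
protected byte[] readResponse(final InputStream in)
    throws IOException
{
    try (
        final InputStream gzin = new GZIPInputStream(in);
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
    ) {
        final byte[] buf = new byte[4096];
        int bytesRead;
        // Copy the decompressed bytes as-is; no charset conversion involved
        while ((bytesRead = gzin.read(buf)) != -1)
            out.write(buf, 0, bytesRead);

        return out.toByteArray();
    }
}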
fge
0

The problem could be you're advancing the file pointer in the input stream before you pass it to GZIPInputStream. GZIPInputStream expects the first few bytes to be a standard header.

Try moving new GZIPInputStream(is); before your while loop
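To illustrate why the position matters, here is a minimal sketch (the class and helper names are invented for the example). GZIPInputStream reads and checks the gzip header in its constructor, so it only succeeds if the very next byte in the stream is the first byte of that header:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

final class GzipHeaderPositionDemo {
    // Compresses a byte array in memory so the demo has real gzip data.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        }
        return bos.toByteArray();
    }

    // Consumes `skip` bytes first, then reports whether GZIPInputStream
    // still accepts what is left of the stream.
    static boolean opensAfterSkipping(byte[] data, int skip) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        for (int i = 0; i < skip; i++) {
            in.read();
        }
        try (GZIPInputStream gis = new GZIPInputStream(in)) {
            return true;
        } catch (ZipException e) {
            // The header check in the constructor failed
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println("from the start: " + opensAfterSkipping(data, 0));
        System.out.println("one byte late:  " + opensAfterSkipping(data, 1));
    }
}
```

The same check fails if the stream is positioned too early, for example while header text still precedes the compressed data.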

Adam
0

There are so many things wrong in your code... But let's try anyway. So you have an ASCII header, and after that there should be a gzipped part? A gzip file always starts with two ID bytes. These have the fixed values 'ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213)'. Can you find those in your input stream? That is where you should start the gzip stream.
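One way to do that without losing the ID bytes is a PushbackInputStream. The following is only a sketch: the class and method names are made up, and it assumes the byte pair 0x1f 0x8b does not occur by chance in whatever precedes the gzip data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipMagicScanner {
    static final int ID1 = 0x1f;
    static final int ID2 = 0x8b;

    // Consumes bytes until ID1 ID2 is seen, pushes both back, and returns a
    // stream positioned at ID1. Returns null if the pair never shows up.
    static InputStream seekToGzip(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        int prev = -1;
        int cur;
        while ((cur = in.read()) != -1) {
            if (prev == ID1 && cur == ID2) {
                in.unread(ID2); // pushed back in reverse order,
                in.unread(ID1); // so ID1 is the next byte read
                return in;
            }
            prev = cur;
        }
        return null;
    }

    // Skips everything before the ID bytes and decompresses the rest.
    static byte[] gunzipAfterMagic(InputStream raw) throws IOException {
        InputStream positioned = seekToGzip(raw);
        if (positioned == null) {
            throw new IOException("no gzip ID bytes found");
        }
        try (GZIPInputStream gis = new GZIPInputStream(positioned)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake response: an ASCII header, an empty line, then gzip data
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        response.write("Content-Encoding: gzip\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        try (GZIPOutputStream gz = new GZIPOutputStream(response)) {
            gz.write("hello gzip".getBytes(StandardCharsets.US_ASCII));
        }
        byte[] plain = gunzipAfterMagic(new ByteArrayInputStream(response.toByteArray()));
        System.out.println(new String(plain, StandardCharsets.US_ASCII));
    }
}
```

Scanning for the ID bytes is a heuristic. If the format of the header is known, stopping at its documented end is the more robust choice.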

ikettu
  • thank you for your help, check this pic http://i.imgur.com/6FGodJE.png how can I detect the gzip file? – abdulla-alajmi Jan 23 '15 at 19:16
  • @ikettu Once you have seen the magic, you are already past the point where to start reading a gzipped stream. How do you propose to cook this? And why isn't it simpler to monitor the end of the HTTP header? – laune Jan 24 '15 at 10:53
  • One could use a pushback stream to peek ahead. But just inspecting the byte values while testing would show you which bytes actually mark the end of the headers. – ikettu Jan 24 '15 at 11:13
0

I have tested this with a file composed from a few header lines, followed by an empty line, and an appended gzipped text file. The latter is written, unexpanded, to x.gz and unzipped and read from there, assuming that it is a text file. (If it is a binary file, a BufferedReader is pointless.)

try-with-resources and catch blocks should be added, but that's just a technicality.

InputStream is = ...;
StringBuilder lsb = new StringBuilder();
int c = -1;
while( (c = is.read()) != -1 ){
    if( c == '\n' ){
        String line = lsb.toString();
        if( line.matches( "\\s*" ) ){
            break;
        }
        System.out.println( line );
        lsb.delete( 0, lsb.length() );
    } else {
        lsb.append( (char)c );
    }
}
byte[] buffer = new byte[1024];
int nRead = 0;
OutputStream os = new FileOutputStream( "x.gz" );
while ( (nRead = is.read(buffer, 0, buffer.length )) > 0 ) {
    os.write( buffer, 0, nRead );
}
os.close();
is.close();

InputStream gis = new GZIPInputStream( new FileInputStream( "x.gz" ) );
InputStreamReader isr = new InputStreamReader( gis );
BufferedReader br = new BufferedReader(isr);
String line;
while( (line = br.readLine()) != null ){
    System.out.println("line: " + line );
}
br.close();
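For comparison, a rough in-memory variant that skips the intermediary file. This is a sketch under assumptions, not a drop-in replacement: the class and method names are invented, the header is taken to end at the first empty line, and the gzip data is assumed to start immediately after it (so no chunked transfer encoding, where a chunk-size line would come first).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class HeaderThenGzip {
    // Reads header lines one byte at a time, up to and including the empty
    // line, so no byte of the body is consumed by accident.
    static List<String> readHeaders(InputStream in) throws IOException {
        List<String> headers = new ArrayList<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                // trim() also drops the '\r' of a CRLF line ending
                String s = new String(line.toByteArray(), StandardCharsets.ISO_8859_1).trim();
                if (s.isEmpty()) {
                    break;
                }
                headers.add(s);
                line.reset();
            } else {
                line.write(c);
            }
        }
        return headers;
    }

    // Decompresses the rest of the stream and decodes it with an explicit charset.
    static String readGzipText(InputStream in, Charset cs) throws IOException {
        try (GZIPInputStream gis = new GZIPInputStream(in)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = gis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), cs);
        }
    }

    // Test helper: builds header text followed by gzip-compressed body bytes.
    static byte[] fakeResponse(String headerText, String body) throws IOException {
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        response.write(headerText.getBytes(StandardCharsets.ISO_8859_1));
        try (GZIPOutputStream gz = new GZIPOutputStream(response)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return response.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = fakeResponse(
            "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n", "line one\nline two\n");
        InputStream is = new ByteArrayInputStream(data);
        for (String h : readHeaders(is)) {
            System.out.println(h);
        }
        System.out.print(readGzipText(is, StandardCharsets.UTF_8));
    }
}
```

The header loop deliberately avoids BufferedReader, since a buffering reader may read past the empty line and swallow the first bytes of the gzip data.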
laune
  • I found the gzip file on my computer and tried to gunzip it but it says "not in gzip format", the server said it's a gzip file but it's not! – abdulla-alajmi Jan 24 '15 at 14:37
  • If you have used the code I've posted I don't see how this could have failed. - Anyway, you have an exact copy of whatever was sent after the empty line terminating the header - maybe an analysis of these bytes can tell you what it really is. Of course, I can't access the actual site... Not sure whether you can drop-box that dubious file you have. -- I used gzip to create the appendix for the file starting with the headers, and my code snippet runs fine with that. -- At least: the header files should terminate cleanly with the empty line, not showing any gibberish after that line - is that so? – laune Jan 24 '15 at 20:16