
In the following line, when instream is a GZIPInputStream, I found that the values of c are totally random, either greater or less than 1024. But when instream is a FileInputStream, the returned value is always 1024.

int c;
while ((c = instream.read(buffer, offset, 1024)) != -1)
    System.out.println("Bytes read: " + c);

The input source file size is much more than 1024 bytes. Why is the returned value of GZIPInputStream unpredictable? Shouldn't it always read up to the said value 1024? Thanks!

  • Perhaps there is some chunking, but are you sure it returns more than 1024? That might cause an exception if buffer is not big enough. – Miserable Variable Mar 04 '12 at 02:23

2 Answers


It's just an artifact of compression. A compressed block in a GZIP stream is variable in size, and its contents typically cannot be read until the entire block has been decompressed.

You are reading blocks:

0           1024           2048           3072           4096...

But if the compressed blocks' boundaries look like this:

0       892     1201        2104         2924 ...

You're going to get a first read of 892 bytes, then 309 (1201-892), then 903 (2104-1201), etc. This is a slight over-simplification, but not much.

As Miserable Variable commented above, the read should never return MORE than 1024, since that would imply a buffer overrun.
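
If you want to observe the read sizes yourself, here is a minimal sketch that gzips some data in memory and records what each `read(buffer, 0, 1024)` call returns. The class name `GzipReadSizes`, the helpers `gzip` and `readSizes`, and the seeded random test data are all made up for illustration; the exact sizes you see depend on the data and the JDK, so none are listed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipReadSizes {

    /** Gzips the given bytes in memory so the demo needs no input file. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(plain);
        }
        return bytes.toByteArray();
    }

    /** Returns the result of every read(buffer, 0, chunk) call until end of stream. */
    static List<Integer> readSizes(InputStream in, int chunk) throws IOException {
        byte[] buffer = new byte[chunk];
        List<Integer> sizes = new ArrayList<>();
        int c;
        while ((c = in.read(buffer, 0, chunk)) != -1) {
            sizes.add(c);
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test data: 20,000 arbitrary bytes (any payload would do).
        byte[] plain = new byte[20_000];
        new Random(42).nextBytes(plain);

        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(plain)))) {
            // Each entry is at most 1024; many are typically smaller.
            System.out.println("Bytes read per call: " + readSizes(in, 1024));
        }
    }
}
```

Whatever the individual sizes are, they never exceed the requested length and they add up to the size of the decompressed data.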

brettw
  • Is there any way to ask the GZIPInputStream to read past its block boundary? Right now, in my program, it reads only up to 511 bytes on the first read. I'm trying to read bytes 0 to 1024 on the first read, then 1025 to 2048 on the second read, etc. Thank you! – dragon525 Mar 04 '12 at 04:08
  • It may or may not work, but you could try wrapping the GZIPInputStream in a BufferedInputStream. The default buffer in the BufferedInputStream is 8192 bytes. The BufferedInputStream should 'read-ahead', and therefore 1024 bytes of decompressed data is likely to always be available to your read() call. – brettw Mar 04 '12 at 05:37
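
An alternative that does not rely on read-ahead is to keep calling `read` until the requested number of bytes has arrived or the stream ends. The following is only a sketch; the class name `FullReads` and the helper `readFully` are hypothetical names, not part of the question's code.

```java
import java.io.IOException;
import java.io.InputStream;

class FullReads {

    /**
     * Keeps reading until len bytes have been stored in buffer or the stream ends.
     * Returns the number of bytes actually read, which is less than len only
     * when the end of the stream was reached.
     */
    static int readFully(InputStream in, byte[] buffer, int offset, int len) throws IOException {
        int total = 0;
        while (total < len) {
            // Ask only for what is still missing, placed right after what we already have.
            int c = in.read(buffer, offset + total, len - total);
            if (c == -1) {
                break; // end of stream: return the partial chunk
            }
            total += c;
        }
        return total;
    }
}
```

With this helper, a loop such as `while ((c = FullReads.readFully(instream, buffer, 0, 1024)) > 0)` sees 1024 bytes on every pass except possibly the last. On Java 9 and later, `InputStream.readNBytes(byte[], int, int)` offers the same behaviour out of the box.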

No, the returned value does not need to be equal to 1024: consider what should be returned in the case of a file of size 4 bytes. Always use the returned value for processing. Also, depending on the encoding type, it may be less than what you would expect due to circumstances out of your control (e.g. a network that only provides 512 bytes/sec).
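
As a sketch of what "use the returned value" means in practice, the copy loop below treats only the first `c` bytes of the buffer as valid after each read. The class name `CountAwareCopy` and the method `copy` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class CountAwareCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int c;
        while ((c = in.read(buffer, 0, buffer.length)) != -1) {
            // Only buffer[0..c) holds fresh data; the rest may be left over from an earlier read.
            out.write(buffer, 0, c);
            total += c;
        }
        return total;
    }
}
```

Writing `buffer.length` bytes instead of `c` would append stale data whenever a read comes back short, which is exactly the situation described in the question.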

Tassos Bassoukos