I use a Java BufferedReader object to read, line by line, a GZIPInputStream that points to a valid GZIP archive containing 1,000 lines of ASCII text in typical CSV format. The code looks like this:
```java
BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new FileInputStream(file))));
```
where file is the actual File object pointing to the archive.
I read through the whole file with:
```java
int count = 0;
String line = null;
while ((line = reader.readLine()) != null)
{
    count++;
}
```
and the reader goes through the file as expected, but after line 1,000 it returns one extra line (i.e., count is 1001 when the loop ends).
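For comparison, here is a self-contained version of the loop above, written as a sketch: it uses try-with-resources so the whole stream chain is closed, and passes an explicit US-ASCII charset instead of the platform default (an assumption based on the file being ASCII). GzipLineCounter and countLines are hypothetical names. On a plain, single-member .gz file whose 1,000 lines each end in a newline, it should return 1000.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class GzipLineCounter {

    // Counts the lines of a plain .gz text file (no tar layer inside).
    static int countLines(File file) throws IOException {
        // try-with-resources closes reader, decoder and streams, even on error.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(file)),
                StandardCharsets.US_ASCII))) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(new File(args[0])));
    }
}
```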
Calling line.length() on that extra line reports a large number of characters (4,000+), all of them non-printable (Character.getNumericValue() returns -1 for each one).
In fact, line.getBytes() returns a byte[] array of the same length consisting entirely of NUL bytes ('\0').
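Character.getNumericValue() returns -1 for any character without a numeric value, including ordinary ones such as ',' or ' ', so it does not single out NULs. A more direct check is to compare each char with '\0'. The helpers below, isAllNul and countNul, are hypothetical names used only for illustration.

```java
class NulCheck {

    // True if the string is non-empty and every char is U+0000 (NUL).
    static boolean isAllNul(String line) {
        if (line.isEmpty()) {
            return false;
        }
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != '\0') {
                return false;
            }
        }
        return true;
    }

    // Counts NUL chars anywhere in the string (e.g., padding inside a header).
    static int countNul(String line) {
        return (int) line.chars().filter(c -> c == 0).count();
    }

    public static void main(String[] args) {
        System.out.println(isAllNul("\0\0\0\0"));    // true
        System.out.println(isAllNul("1,foo,bar"));   // false
        System.out.println(countNul("name\0\0x"));   // 2
    }
}
```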
Does this seem like a bug in BufferedReader?
In any case, can anyone please suggest a workaround to bypass this behavior?
EDIT: More weird behavior: the first line read is prefixed by the file name, several NUL characters ('\0'), and things like the user name and group name; only then does the actual text follow!
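That prefix (file name, NUL padding, user and group names) matches a tar header. In the POSIX ustar format each entry starts with a 512-byte header that holds the name in its first 100 bytes and the magic string "ustar" at offset 257 (GNU tar writes "ustar" followed by two spaces there). A minimal sketch that sniffs for this magic after decompression, assuming Java 9+ for InputStream.readNBytes; TarSniffer and looksLikeTarGz are hypothetical names, and old V7-style archives, which have no magic, would not be detected.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class TarSniffer {

    private static final int BLOCK = 512;         // size of a tar header block
    private static final int MAGIC_OFFSET = 257;  // where "ustar" sits in the header

    // True if the decompressed data starts with a ustar/GNU tar header.
    static boolean looksLikeTarGz(File file) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(file))) {
            byte[] header = new byte[BLOCK];
            // readNBytes keeps reading until 512 bytes arrive or the stream ends.
            int n = in.readNBytes(header, 0, BLOCK);
            if (n < BLOCK) {
                return false;  // too short to hold a tar header
            }
            String magic = new String(header, MAGIC_OFFSET, 5, StandardCharsets.US_ASCII);
            return magic.equals("ustar");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(looksLikeTarGz(new File(args[0])));
    }
}
```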
EDIT: I have created a very simple test class that reproduces the effect I described above, at least on my platform.
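EDIT: Apparently a false alarm. The file I was receiving was not plain GZIP but a gzipped tar archive (.tar.gz), which explains everything: the first line starts with the tar header (file name, user and group names, NUL padding), and the extra last line is the NUL padding that tar writes at the end of the archive. No need for further testing. Thanks, everyone!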
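Since the JDK has no tar reader, one workaround is to unwrap the tar layer by hand: read a 512-byte header, take the entry size from the octal field at offset 124, read that many bytes, skip the NUL padding up to the next 512-byte boundary, and stop at the first all-zero block. The sketch below assumes plain ustar/GNU headers (no pax extended headers, GNU long names, base-256 sizes or sparse files) and entries small enough to fit in memory; MiniTarGzReader and readLines are hypothetical names. A library such as Apache Commons Compress (its TarArchiveInputStream class) covers those cases properly.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical minimal .tar.gz reader; see the assumptions listed above.
class MiniTarGzReader {

    private static final int BLOCK = 512;

    // Returns the lines of every regular file in the archive, keyed by entry name.
    static Map<String, List<String>> readLines(File file) throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new FileInputStream(file)))) {
            byte[] header = new byte[BLOCK];
            while (true) {
                int n = in.readNBytes(header, 0, BLOCK);
                if (n == 0) {
                    break;                            // stream ended without end marker
                }
                if (n < BLOCK) {
                    throw new EOFException("truncated tar header");
                }
                if (isZeroBlock(header)) {
                    break;                            // end-of-archive marker
                }
                String name = field(header, 0, 100);
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal
                byte type = header[156];
                byte[] data = new byte[(int) size];
                in.readFully(data);
                // Entry data is padded with NULs up to the next 512-byte boundary.
                in.readFully(new byte[(int) ((BLOCK - size % BLOCK) % BLOCK)]);
                if (type == '0' || type == 0) {       // regular file ('\0' in old archives)
                    result.put(name, toLines(data));
                }
            }
        }
        return result;
    }

    private static List<String> toLines(byte[] data) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Reads a NUL-terminated ASCII field of at most 'length' bytes.
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    private static boolean isZeroBlock(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (Map.Entry<String, List<String>> e : readLines(new File(args[0])).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " lines");
        }
    }
}
```

Reading this way, a .tar.gz holding the 1,000-line CSV should yield exactly 1,000 lines for that entry, with neither the header prefix nor the trailing NUL line.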