I'm trying to read a bz2 file using Apache Commons Compress.

The following code works for a small file. However, for a large file (over 500 MB), it stops after reading a few thousand lines without reporting any error.

try {
    InputStream fin = new FileInputStream("/data/file.bz2");
    BufferedInputStream bis = new BufferedInputStream(fin);
    CompressorInputStream input = new CompressorStreamFactory()
                .createCompressorInputStream(bis);
    BufferedReader br = new BufferedReader(new InputStreamReader(input,
                "UTF-8"));

    String line = "";
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
} catch (Exception e) {
    e.printStackTrace();
}

Is there another good way to read a large compressed file?

Benben
  • This should work - unless there is some bug with the library. Can you generate a not so large test/example? Another test: uncompress the file manually and run the same code using `bis` instead of `input` in the `BufferedReader` construction line. – leonbloy Jun 08 '16 at 12:30
  • Are you running this from a console with a `main` method? (i.e., are you sure an exception is not printed? did you try rethrowing the exception in the catch block?) – leonbloy Jun 08 '16 at 12:37

1 Answer

I was having the same problem with a large file, until I noticed that CompressorStreamFactory has a couple of overloaded constructors that take a boolean decompressUntilEOF parameter.

Simply changing to the following may be all that's missing...

CompressorInputStream input = new CompressorStreamFactory(true)
                .createCompressorInputStream(bis);

Whoever wrote this factory evidently considers it a better default to stop partway and have the caller create a new compressor input stream over the same underlying buffered input stream, so that the new one picks up where the last one left off, rather than letting a single stream decompress all the way to the end of the file. I've no doubt they are cleverer than me, and I haven't worked out what trap I'm setting for future me by setting this parameter to true. Maybe someone will tell me in the comments! :-)
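For reference, here is roughly what the question's reader might look like with that one change applied. This is a minimal sketch, not a verified fix for the original file: it assumes the file consists of several concatenated bzip2 streams, which is the case the Commons Compress Javadoc describes decompressUntilEOF as covering (if false, decompression stops after the first stream). The class name `ReadLargeBz2` and the helper `countLines` are hypothetical names introduced for illustration, and the sketch counts lines instead of printing them.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.commons.compress.compressors.CompressorException;
import org.apache.commons.compress.compressors.CompressorStreamFactory;

public class ReadLargeBz2 {

    // Hypothetical helper: counts the lines in a compressed file,
    // continuing across concatenated streams until the end of the input.
    static long countLines(Path path) throws IOException, CompressorException {
        // try-with-resources closes all four streams, even if reading fails.
        try (InputStream fin = Files.newInputStream(path);
             // The factory auto-detects the format, which needs mark/reset support.
             BufferedInputStream bis = new BufferedInputStream(fin);
             // true = decompressUntilEOF: do not stop after the first stream.
             InputStream input = new CompressorStreamFactory(true)
                     .createCompressorInputStream(bis);
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(input, StandardCharsets.UTF_8))) {
            long count = 0;
            while (br.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        // The path comes from the question; pass another one as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "/data/file.bz2");
        System.out.println(countLines(path) + " lines read");
    }
}
```

Comparing the reported line count with the output of `bzcat /data/file.bz2 | wc -l` is one way to check that nothing is being dropped. Letting the exception propagate out of `main`, rather than catching it, also rules out a swallowed error, as the comments on the question suggest.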

Tim Hirst