
I am writing a large Java object I created to a file, and later reading it back. I am using compression since the object is pretty large and I have around 600 different instances of it (each one in a separate file). I am currently using bzip2 with Apache's org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream:

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.lang3.SerializationUtils;

// Decompress and deserialize; try-with-resources closes the file handle
try (InputStream in = new BZip2CompressorInputStream(new FileInputStream("myfile.bz2"))) {
    Document doc = (Document) SerializationUtils.deserialize(in);
}

The problem is that decompression currently takes a long time (over 10 seconds per file), so reading all 600 objects takes around two hours. I would like to either use a faster compression class or tune the current class's parameters so that decompression is faster. I am mostly worried about decompression time, since it happens many times; slow compression is bearable. I am also willing to pay the price of a larger compressed file in exchange for faster decompression.

When compressing using different software you can usually choose "compression level", with values like "Fastest", "Fast", "Normal", "Best". Sometimes you even get more parameters like "Compression Method", "Dictionary Size", "Word Size", etc.

Does anybody know how to control these parameters via code, and what some recommended values are? Or does anyone know of classes with fast decompression?
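For reference, this is roughly how I write the files today (a sketch; Document is my own serializable class and doc is an instance of it). I noticed that BZip2CompressorOutputStream takes a block size from 1 to 9 in its constructor, but I could not tell whether a smaller block size also speeds up decompression:

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;
import org.apache.commons.lang3.SerializationUtils;

// blockSize 1 = 100 KB blocks (fastest, worst ratio); 9 = 900 KB blocks (slowest, best ratio)
try (OutputStream out = new BZip2CompressorOutputStream(new FileOutputStream("myfile.bz2"), 1)) {
    SerializationUtils.serialize(doc, out);
}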

OferBr

1 Answer


BZip2 gets very good compression ratios, but at the expense of being quite slow. At the other end of the spectrum is something like Snappy, which is incredibly fast but does not achieve compression ratios nearly as good. GZip sits in the middle.
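If you want to try the fast end, here is a minimal sketch assuming the xerial snappy-java library (org.xerial.snappy) is on the classpath; it drops into your existing SerializationUtils code as a plain stream wrapper, and the file name is just a placeholder:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.commons.lang3.SerializationUtils;
import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

// Write: same serialization call, just a different compressing stream
try (SnappyOutputStream out = new SnappyOutputStream(new FileOutputStream("myfile.snappy"))) {
    SerializationUtils.serialize(doc, out);
}

// Read: Snappy decompression is typically much faster than bzip2
try (SnappyInputStream in = new SnappyInputStream(new FileInputStream("myfile.snappy"))) {
    Document doc = (Document) SerializationUtils.deserialize(in);
}

If Snappy's ratio turns out to be too poor for your files, GZip from the JDK (java.util.zip.GZIPInputStream/GZIPOutputStream) wraps your streams the same way and sits between the two.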

Here is a list of some compression benchmarks in Java.

Brett Okken