Our application reads and writes many encrypted files, and we have created classes implementing InputStream and OutputStream to handle all these details. These implementations work fine (and are very convenient to use, by the way).
To be a bit more specific, when we write (in our OutputStream implementation), we use the Bouncy Castle (version 1.53) CMS classes to compress, sign, encrypt, and sign again.
The problem is that we occasionally write large files (>1 GB after compression, 10+ GB before compression), and these can take an hour or more to write. During this process, one CPU is pegged; profiling shows the compression step is the bottleneck. Skipping compression is not an option: we get good compression (better than 10x), these files are transported across a network (so much smaller is much faster), and we pay to store them essentially forever (so much smaller is cheaper).
The servers we typically run this code on are AWS EC2 c5.4xlarge instances, which have plenty of memory and 16 vCPUs.
Here is the code where we set up the compression:
import java.io.OutputStream;

import org.bouncycastle.cms.CMSCompressedDataStreamGenerator;
import org.bouncycastle.cms.jcajce.ZlibCompressor;

// signedStream is the OutputStream of the inner signing layer.
CMSCompressedDataStreamGenerator compressedGenerator = new CMSCompressedDataStreamGenerator();
compressedStream = compressedGenerator.open(signedStream, new ZlibCompressor());
We have run into a similar situation where we GZIP (rather than encrypt) large files, and we successfully worked around that bottleneck by using a parallel GZIP writer. With the parallel GZIP writer, we can effectively use all available CPUs and finish the compression in a fraction of the time the JDK default implementation takes.
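For illustration, that workaround looks roughly like the sketch below, assuming the shevek/parallelgzip library (org.anarres.parallelgzip.ParallelGZIPOutputStream) as the parallel writer; any pigz-style OutputStream implementation would slot in the same way:

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.anarres.parallelgzip.ParallelGZIPOutputStream;

public class ParallelGzipSketch {
    public static void main(String[] args) throws Exception {
        try (OutputStream file = new FileOutputStream("large-file.gz");
             // Deflates fixed-size blocks on a thread pool and writes
             // standard gzip-compatible output, so all CPUs stay busy.
             ParallelGZIPOutputStream gzip = new ParallelGZIPOutputStream(file)) {
            gzip.write(new byte[64 * 1024]); // application data goes here
        }
    }
}

Because each block is compressed independently, the work spreads across cores, typically at a small cost in compression ratio.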
The questions:
- Is there a way to configure/invoke Bouncy Castle to parallelize the ZlibCompressor? (See the sketch after these questions for the kind of hook we have in mind.)
- Is there a different Bouncy Castle compressor we can/should use that provides equivalent compression with significantly better throughput?
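For reference, CMSCompressedDataStreamGenerator.open() accepts any org.bouncycastle.operator.OutputCompressor, so we imagine a hook along these lines. This is only a sketch: ParallelDeflateOutputStream is a hypothetical stand-in for a parallel zlib writer (no such class ships with Bouncy Castle), and the placeholder body just uses the JDK's single-threaded DeflaterOutputStream:

import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;

import org.bouncycastle.asn1.ASN1ObjectIdentifier;
import org.bouncycastle.asn1.x509.AlgorithmIdentifier;
import org.bouncycastle.operator.OutputCompressor;

public class ParallelZlibCompressor implements OutputCompressor {
    // id-alg-zlibCompress (RFC 3274), the same OID that ZlibCompressor
    // advertises, so the CMS structure on the wire is unchanged.
    private static final String ZLIB = "1.2.840.113549.1.9.16.3.8";

    public AlgorithmIdentifier getAlgorithmIdentifier() {
        return new AlgorithmIdentifier(new ASN1ObjectIdentifier(ZLIB));
    }

    public OutputStream getOutputStream(OutputStream out) {
        // Placeholder: single-threaded zlib, equivalent to ZlibCompressor.
        // The fix we are after would return a parallel zlib stream here,
        // e.g. return new ParallelDeflateOutputStream(out); // hypothetical
        return new DeflaterOutputStream(out);
    }
}

We would then pass it as compressedGenerator.open(signedStream, new ParallelZlibCompressor()).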