Our application reads and writes many encrypted files, and we have created classes implementing InputStream and OutputStream to handle all these details. These implementations work fine (and are very convenient to use, by the way).
To be a bit more specific, when we write (in our OutputStream implementation), we use the Bouncy Castle (version 1.53) CMS classes to compress, sign, encrypt, and sign again.
The problem is that we occasionally write large files (>1 GB after compression, 10+ GB before compression), and these can take an hour or more to write. During this process, one CPU is pegged; profiling shows the compression step is the bottleneck. Skipping compression is not an option: we get good compression (better than 10x), these files are transported across a network (so much smaller is much faster), and we pay to store them essentially forever (so much smaller is cheaper).
The servers we typically run this code on are AWS EC2 c5.4xlarge instances, which have plenty of memory and 16 vCPUs.
Here is the code where we set up the compression:
import java.io.OutputStream;

import org.bouncycastle.cms.CMSCompressedDataStreamGenerator;
import org.bouncycastle.cms.jcajce.ZlibCompressor;

// signedStream is the OutputStream of the inner signing layer.
CMSCompressedDataStreamGenerator compressedGenerator = new CMSCompressedDataStreamGenerator();
compressedStream = compressedGenerator.open(signedStream, new ZlibCompressor());
We have run into a similar situation where we GZIP (rather than encrypt) large files, and we successfully worked around that bottleneck by using a parallel GZIP writer. With the parallel GZIP writer, we can effectively use all available CPUs and finish the compression in a fraction of the time the JDK default implementation takes.
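For illustration, that workaround looks roughly like the sketch below, assuming the shevek/parallelgzip library (org.anarres.parallelgzip.ParallelGZIPOutputStream) as the parallel writer; any pigz-style OutputStream implementation would slot in the same way:

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.anarres.parallelgzip.ParallelGZIPOutputStream;

public class ParallelGzipSketch {
    public static void main(String[] args) throws Exception {
        try (OutputStream file = new FileOutputStream("large-file.gz");
             // Deflates fixed-size blocks on a thread pool and writes
             // standard gzip-compatible output, so all CPUs stay busy.
             ParallelGZIPOutputStream gzip = new ParallelGZIPOutputStream(file)) {
            gzip.write(new byte[64 * 1024]); // application data goes here
        }
    }
}

Because each block is compressed independently, the work spreads across cores, typically at a small cost in compression ratio.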
The questions:
- Is there a way to configure/invoke Bouncy Castle to parallelize the ZlibCompressor? (See the sketch after these questions for the kind of hook we have in mind.)
- Is there a different Bouncy Castle compressor we can/should use that provides equivalent compression with significantly better throughput?
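For reference, CMSCompressedDataStreamGenerator.open() accepts any org.bouncycastle.operator.OutputCompressor, so we imagine a hook along these lines. This is only a sketch: ParallelDeflateOutputStream is a hypothetical stand-in for a parallel zlib writer (no such class ships with Bouncy Castle), and the placeholder body just uses the JDK's single-threaded DeflaterOutputStream:

import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;

import org.bouncycastle.asn1.ASN1ObjectIdentifier;
import org.bouncycastle.asn1.x509.AlgorithmIdentifier;
import org.bouncycastle.operator.OutputCompressor;

public class ParallelZlibCompressor implements OutputCompressor {
    // id-alg-zlibCompress (RFC 3274), the same OID that ZlibCompressor
    // advertises, so the CMS structure on the wire is unchanged.
    private static final String ZLIB = "1.2.840.113549.1.9.16.3.8";

    public AlgorithmIdentifier getAlgorithmIdentifier() {
        return new AlgorithmIdentifier(new ASN1ObjectIdentifier(ZLIB));
    }

    public OutputStream getOutputStream(OutputStream out) {
        // Placeholder: single-threaded zlib, equivalent to ZlibCompressor.
        // The fix we are after would return a parallel zlib stream here,
        // e.g. return new ParallelDeflateOutputStream(out); // hypothetical
        return new DeflaterOutputStream(out);
    }
}

We would then pass it as compressedGenerator.open(signedStream, new ParallelZlibCompressor()).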