I don't know of a `pigz` interface for Python offhand, but it might not be that hard to write if you really need it. Python's `zlib` module can compress arbitrary chunks of bytes, and the `pigz` man page already describes the scheme for parallelizing the compression and the output format.
If you really need parallel compression, it should be possible to implement a `pigz` equivalent by using `zlib` to compress chunks fed through `multiprocessing.dummy.Pool.imap` (`multiprocessing.dummy` is the thread-backed version of the `multiprocessing` API, so you wouldn't incur massive IPC costs shipping chunks to and from the workers). Since `zlib` is one of the few built-in modules that releases the GIL during CPU-bound work, you might actually benefit from thread-based parallelism.
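Here's a minimal sketch of that approach. Rather than reproduce `pigz`'s actual single-stream format (which shares compression state between blocks), it emits each chunk as a complete gzip member; the gzip format allows members to be concatenated, so `gunzip` and Python's own `gzip` module decompress the result transparently. The chunk size, compression level, and worker count here are arbitrary choices for illustration, not anything `pigz` mandates:

```python
import sys
import zlib
from multiprocessing.dummy import Pool  # thread-backed Pool: same API, no IPC overhead

CHUNK_SIZE = 128 * 1024  # matches pigz's default block size

def compress_chunk(chunk):
    # wbits=31 (16 + 15) makes zlib emit a complete gzip member, header
    # and trailer included, so each chunk is independently decompressible.
    compressor = zlib.compressobj(level=6, wbits=31)
    return compressor.compress(chunk) + compressor.flush()

def read_chunks(stream, size=CHUNK_SIZE):
    # Yield fixed-size chunks until the stream is exhausted.
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk

def parallel_gzip(instream, outstream, workers=4):
    # imap yields results in input order, so members land in the output
    # in the right sequence even if workers finish out of order.
    with Pool(workers) as pool:
        for member in pool.imap(compress_chunk, read_chunks(instream)):
            outstream.write(member)

if __name__ == "__main__":
    # e.g. python parallel_gzip.py < bigfile > bigfile.gz
    parallel_gzip(sys.stdin.buffer, sys.stdout.buffer)
```

One caveat of the concatenated-member shortcut: every member carries its own header and checksum, and chunks can't share a dictionary, so the output runs slightly larger than what `pigz` itself produces, though the decompressed bytes are identical.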
Note that in practice, when the compression level isn't turned up that high, I/O often costs about as much (within an order of magnitude or so) as the actual `zlib` compression; if your data source can't feed the threads faster than they compress, you won't gain much from parallelizing.