[Updated 2014-10-15]
Using Ghostscript
Ghostscript has a small utility program written in PostScript in its source code repository. It's called pdfinflt.ps. If you are lucky, it may already be sitting in a 'toolbin' subdirectory of your Ghostscript installation. Otherwise, get it from the Ghostscript source code repository.
Now run it together with your targeted input PDF through the Ghostscript interpreter:
gswin32c.exe -- c:/path/to/pdfinflt.ps your-input.pdf deflated-output.pdf
pdfinflt.ps will (try to) expand all 'streams' contained in the PDF which use the following compression filters/methods: /FlateDecode, /LZWDecode, /ASCII85Decode, /ASCIIHexDecode.
It will not attempt to remove /RunLengthDecode, /CCITTFaxDecode, /DCTDecode, /JBIG2Decode and /JPXDecode. (Compressed/binary fonts will also pass unchanged into the output PDF.)
If you are in an adventurous mood, you may dare to uncomment those lines in the utility which disable /RunLengthDecode, /DCTDecode and /CCITTFaxDecode and see if it still works...
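To make "expanding a stream" a bit more concrete, here is a minimal Python sketch of what undoing two of these filters amounts to. It assumes only the standard library; the sample content stream is made up for illustration, and this is not how pdfinflt.ps itself is implemented.

```python
import binascii
import zlib

# A made-up page content stream, used only for illustration.
raw = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET"

# /FlateDecode data is zlib-compressed, so zlib can inflate it again.
flate_encoded = zlib.compress(raw)
flate_decoded = zlib.decompress(flate_encoded)

# /ASCIIHexDecode data is a run of hex digits terminated by '>'.
hex_encoded = binascii.hexlify(raw) + b">"
hex_decoded = binascii.unhexlify(hex_encoded.rstrip(b">"))

print(flate_decoded.decode("ascii"))
print(hex_decoded.decode("ascii"))
```

Both print statements show the same readable content stream, which is what you would expect to find in a text editor after the tool has expanded the PDF.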
Using qpdf
Another useful tool to transform a PDF into an internal format that enables text editor access is qpdf. It is a "command-line program that does structural, content-preserving transformations on PDF files".
Example usage:
qpdf \
--qdf \
--object-streams=disable \
input-with-compressed-objects.pdf \
output-with-expanded-objects.pdf
The QDF mode enforced by the --qdf switch organizes and re-orders the objects neatly in the output. It adds comments to track the original object IDs and page content streams. All object dictionaries are written into a "normalized" standard format for easier parsing.
The --object-streams=disable switch causes the extraction of individual objects that are compressed into another object's stream data (and would otherwise not be recognizable).
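If you need to run this conversion from a script, a small wrapper along the following lines may help. It is only a sketch: the function name qpdf_qdf_command is hypothetical, it assumes qpdf is on your PATH, and it uses nothing beyond the two switches shown above.

```python
import subprocess
import sys


def qpdf_qdf_command(src, dst, exe="qpdf"):
    """Return the argument list for the qpdf call shown above.

    `exe` is an assumption: adjust it if qpdf is not on your PATH.
    """
    return [exe, "--qdf", "--object-streams=disable", src, dst]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python this_script.py input.pdf output.pdf
    result = subprocess.run(qpdf_qdf_command(sys.argv[1], sys.argv[2]))
    sys.exit(result.returncode)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.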
Using mutool
Artifex, the creators of Ghostscript, offer another tool that is available under a Free and Open Source Software license: MuPDF.
MuPDF comes with a command line tool, mutool, which can also expand compressed PDF object streams:
mutool \
clean \
-d \
-a \
input.pdf \
output.pdf \
4,7,8,9
clean: re-writes the PDF;
-d: de-compresses all streams;
-a: ASCIIhex encodes all binary streams;
4,7,8,9: selects pages 4, 7, 8 and 9 for inclusion in output.pdf.
Using pdftk
Last, here is how to use the pdftk tool to uncompress a PDF's object streams:
pdftk your-input.pdf cat output uncompressed-output.pdf uncompress
Note the final uncompress word in the command line.
Pick your favorite
All of the above tools are available for Linux, Mac OS X, Unix and Windows.
My own favorite is QPDF for most practical cases.
However, you should make your own experiments and compare the (different) output of each of the suggested tools. Then make your own pick.
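One rough way to compare the outputs is to count how often each filter name still appears in the resulting files. The sketch below is a heuristic only: count_filters and report are hypothetical helper names, the search is a plain byte match, and filter names hidden inside still-compressed object streams will not be seen.

```python
import sys

# Filter names taken from the Ghostscript section above.
FILTERS = [
    b"/FlateDecode", b"/LZWDecode", b"/ASCII85Decode", b"/ASCIIHexDecode",
    b"/RunLengthDecode", b"/CCITTFaxDecode", b"/DCTDecode",
    b"/JBIG2Decode", b"/JPXDecode",
]


def count_filters(pdf_bytes):
    """Count raw occurrences of each filter name in the file's bytes."""
    return {name.decode("ascii"): pdf_bytes.count(name) for name in FILTERS}


def report(path):
    """Print one line per filter name for the PDF at `path`."""
    with open(path, "rb") as handle:
        counts = count_filters(handle.read())
    for name, hits in counts.items():
        print(f"{name:18} {hits}")


if __name__ == "__main__":
    # Usage: python this_script.py original.pdf expanded-by-tool.pdf ...
    for path in sys.argv[1:]:
        print(path)
        report(path)
```

A file that was expanded successfully should show fewer /FlateDecode hits than the original, while filters such as /DCTDecode may legitimately remain.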