I am trying to compress PDF versions of my school newspaper programmatically and wrote the following script, which works perfectly:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -sOutputFile="$file" "$old"

When I ran it on the server, I discovered that the server's version of Ghostscript is too old for the script to work, and I don't have permission to update gs (I'm on a shared hosting service). I do have ImageMagick on the server, and was wondering if anyone could help me compress text-heavy PDFs with it. I tried something similar to

convert -compress JPEG -quality 100 input.pdf output.pdf

but it made the PDF text very blurry, which is not good for reading newspapers.

If anyone could help me, it would be greatly appreciated. Thank you!

Billy Jacobson

2 Answers

ImageMagick also uses Ghostscript to read your PDF file, so it will use the same old version of Ghostscript.
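To see how old the installed Ghostscript actually is, something like the following sketch may help. It assumes `gs` is on the PATH and that `sort -V` (a GNU coreutils extension) is available; the minimum version `9.00` is a hypothetical threshold chosen only for illustration, not a value from the question.

```shell
#!/bin/sh
# Succeed when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available on this system.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed Ghostscript version, or nothing if gs is not on PATH.
gs_version() {
    if command -v gs >/dev/null 2>&1; then
        gs --version
    fi
}

# Hypothetical minimum version, used only as an example threshold.
min_version="9.00"
installed="$(gs_version)"

if [ -z "$installed" ]; then
    echo "gs not found on PATH"
elif version_ge "$installed" "$min_version"; then
    echo "gs $installed is at least $min_version"
else
    echo "gs $installed is older than $min_version"
fi
```

Whatever version this reports is the one ImageMagick will end up using for PDF input as well.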

If you want more readable text, set the density (the resolution at which the PDF is rasterized):

convert -density 150 input.pdf -compress JPEG output.pdf

If you want higher-quality images, do not specify JPEG compression. If your PDF is monochrome, you can use Group4 compression:

convert -density 150 input.pdf -compress group4 output.pdf

If your PDF is not monochrome, you can use LZW or Zip compression:

convert -density 150 input.pdf -compress LZW output.pdf
convert -density 150 input.pdf -compress Zip output.pdf

You could start with a density of 150 and increase it to improve the quality, but that will also increase the size of your file. ImageMagick converts each page of your PDF to an image and then writes a PDF that contains only images and no text. I am not sure whether this will actually decrease the size of your file; you will have to test that yourself.
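One way to run that test is to render the file at a few densities and print the resulting sizes next to the original. This is only a sketch: the function names `file_size` and `try_densities` and the output naming scheme are made up for illustration, and it uses the Zip variant of the command above.

```shell
#!/bin/sh
# Print the size of a file in bytes (wc -c is more portable than stat).
file_size() {
    wc -c < "$1" | tr -d '[:space:]'
}

# Usage: try_densities input.pdf 150 200 300
# Renders the input at each density and reports the size of each result.
try_densities() {
    input="$1"
    shift
    echo "original: $(file_size "$input") bytes"
    for density in "$@"; do
        # Hypothetical naming scheme: input-150dpi.pdf, input-200dpi.pdf, ...
        output="${input%.pdf}-${density}dpi.pdf"
        convert -density "$density" "$input" -compress Zip "$output" || return 1
        echo "$density dpi: $(file_size "$output") bytes ($output)"
    done
}
```

Calling `try_densities input.pdf 150 200 300` leaves one output file per density, so you can compare both size and legibility before deciding which one to keep.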

dlemstra
  • I tried this, and it made my PDF even larger, and it was still blurry. – Billy Jacobson Apr 19 '14 at 02:55
  • You are rendering to a JPEG; what did you expect? Use a lossless compression such as TIFF or PNG. JPEG is intended for images (photographs); it's useless for text. Or, if you only have text in the document, consider rendering to a monochrome compressed format such as CCITT Group 4 Fax (in a TIFF file). – KenS Apr 19 '14 at 08:24
  • Unfortunately `convert` seems to force JPEG compression for PDFs? It's supposed to support Group 4 (and also JBIG2, which is supposed to be much better), but I can't seem to get it working. – Brian Z May 02 '15 at 05:05

Recent Ghostscript versions (as of 2023) do not support the -sDEVICE=psmono option, which was an easy way to convert any PDF first to a monochrome (black and white, not grayscale) PostScript file and then back to PDF using the -sDEVICE=pdfwrite option. Newer Ghostscript versions have monochrome output devices such as bmpmono and pngmono, but it seems that Ghostscript can no longer create PDFs from those files.

Artifex, the distributor of Ghostscript, has another open source tool, MuPDF, that can do this fairly well. Here is my example of a Windows 10 batch file that converts any PDF to a smaller monochrome PDF when you drag and drop the large file onto it. The batch file requires mupdf-1.21.0-windows.zip to be unzipped (no installation required) into the same folder as the batch file and the PDF. It automatically makes a backup copy of the original file, so nothing is lost during the process, and it creates very compact 300 dpi monochrome multi-page PDFs. Save the code to a file and name it, e.g., DropHereToConvert.bat.

@echo off
rem === Separate the file and folder names of the dropped file (%1) to two different string variables, and replace the empty spaces in filename with underlines ===
set filename=%1
set filename=%filename: =_%
for %%A in ("%filename%") do (
set Folder=%%~dpA
set Name=%%~nA )
echo.Folder is: %Folder%
echo.Name is: %Name%
rem 
rem === Make a backup copy of the dropped PDF file to -- Name_original.pdf -- and remove the space from the end of -- Name -- variable ===
copy %1 "%Name: =_%original.pdf"
rem
rem === Copy the dropped file to a temporary file for -- mutool.exe -- to process ===
copy %1 oldTempFile.pdf
rem
rem === Use -- mutool.exe -- that must be located in the same folder with the processed PDF-file and this batch file to create individual monochrome PBM-type images from each page with following parameters:  
rem    page%%3d.pbm -- names the files with leading zeroes: page001.pbm, page002.pbm etc.,
rem    -G7 -- uses gamma value 7 to darken thin lines so that they don't fade away 
rem    -cm -- makes the output monochrome 
rem    -A9 -- no anti-aliasing
rem    -r300 -- defines the dpi-quality of the image files to be 300 dpi ===
mutool.exe draw -o page%%3d.pbm -G7 -A9 -cm -r300 oldTempFile.pdf  
echo Pbm files created 
rem 
rem === Create a list of the pbm filenames into a file -- listOfFiles2.lst -- for creating a single PDF-file containing all individual pages created earlier === 
rem 
rem === First list all PBM-files to a filename list where the names are separated by line breaks, using the -- dir -- command, which will not break the batch file even if there are empty spaces in the filename (this happens if -- for -- is used at this stage) ===
dir /b *.pbm > listOfFiles.lst 
rem
rem === Then remove the line breaks from this list to create a single string that contains all filenames separated only by spaces, because mutool.exe requires that kind of input. Add an empty space at the end of each filename with -- "%%i " --, because listing with -- dir -- removes them ===
for /f "usebackqdelims=" %%i in (listOfFiles.lst) do @<nul set /p="%%i ">>listOfFiles2.lst
rem 
rem === Store this single line list into the -- %Build% -- variable to be used with mutool.exe as a parameter ===
set /p Build=<listOfFiles2.lst
echo Single line list: %Build%
rem 
rem === Use this single line list to create a single monochrome PDF-file with the same file name as the file that was dropped on this batch file. This file contains all monochrome PBM-images in compressed format, and its filename has all empty spaces replaced by underscores. The original dropped file was saved earlier with -- "_original" -- added to the end of the filename. NOTE that the -- mutool.exe -- output parameters don't currently specify the paper size or scaling, so in order to print the files one must use -- Fit to page -- or scale to exactly 32% ===
mutool.exe convert -o "%Name: =%.pdf" -Ocompress-images %Build%
echo Monochrome PDF created 
rem 
rem === Delete all temporary files that were created during the process === 
echo Deleting the temporary files...
del listOfFiles.lst
del listOfFiles2.lst
del *.pbm
del oldTempFile.pdf
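
Since the question is about a shared hosting server rather than Windows, the same two mutool steps could be sketched as a shell function. This assumes a `mutool` binary is available on the PATH there, which nothing above confirms. The flags are copied unchanged from the batch file, and the function name `pdf_to_mono` is made up for illustration.

```shell
#!/bin/sh
# Usage: pdf_to_mono input.pdf output.pdf
# Assumption: `mutool` is on PATH; flags are the same as in the batch file.
pdf_to_mono() {
    input="$1"
    output="$2"
    # Work in a temporary directory so no stray files are left behind.
    workdir="$(mktemp -d)" || return 1

    # Step 1: render each page to a 300 dpi monochrome PBM image.
    if ! mutool draw -o "$workdir/page%3d.pbm" -G7 -A9 -cm -r300 "$input"; then
        rm -rf "$workdir"
        return 1
    fi

    # Step 2: combine the page images into one PDF.
    # The glob expands in sorted order, so the pages keep their sequence.
    mutool convert -o "$output" -Ocompress-images "$workdir"/page*.pbm
    status=$?

    rm -rf "$workdir"
    return "$status"
}
```

Unlike the batch file, this writes to a separate output path and never touches the input, so no backup copy is needed.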
Supernuija