
I have created a script that combines two PDFs into one side by side, by looking at some of Kurt Pfeifle's answers.

But my problem is that the script isn't flexible. By that I mean that if one PDF is larger than the other, or has a different resolution, the side-by-side output PDF comes out wrong.

Illustrated it looks like this:

Input file: a.pdf
+--------+ 
|        |
|  a     |
|        |
+--------+

Input file: b.pdf
+--------+ 
|        |
|  b     |
|        |
+--------+

Desired output file: compare.pdf
+--------+--------+ 
|        |        |
|   a    |  b     |
|        |        |
+--------+--------+

So I need to make sure that the PDFs both have the same regular A4 size PDF and resolution before I combine them? I have tried many commands and scripts, but can't figure this one out. How can I do that? The script needs to be bulletproof, so that any two PDFs can be combined and compared even if they don't have the same size.

My script looks like this now, and it works on some PDFs that have the same size and resolution:

gswin64c.exe                        ^
          -o c.pdf                  ^
          -sDEVICE=pdfwrite         ^
          -g11690x8270              ^
          -dFIXEDMEDIA              ^
          -dPDFSETTINGS=/prepress   ^
          -r300                     ^
          -c "<</PageOffset [0 0]>>setpagedevice" ^
          -f a.pdf

This creates c.pdf, looking like this:

c.pdf
+--------+--------+ 
|        |        |
|   a    | (empty)|
|        |        |
+--------+--------+

Next command:

gswin64c.exe                       ^
          -o left-side-outputs.pdf ^
          -sDEVICE=pdfwrite        ^
          -g11690x8270             ^
          -dPDFSETTINGS=/prepress  ^
          -c "<</PageOffset [0 0]>>setpagedevice" ^
          -f b.pdf

This creates left-side-outputs.pdf, looking like this:

left-side-outputs.pdf
+--------+--------+ 
|        |        |
|   b    | (empty)|
|        |        |
+--------+--------+

Next command:

gswin64c.exe                        ^
          -o right-side-outputs.pdf ^
          -sDEVICE=pdfwrite         ^
          -g11690x8270              ^
          -dPDFSETTINGS=/prepress   ^
          -c "<</PageOffset [596 0]>>setpagedevice" ^
          -f c.pdf

This creates right-side-outputs.pdf, looking like this:

right-side-outputs.pdf
+--------+--------+ 
|        |        |
|(empty) |  b     |
|        |        |
+--------+--------+

Last command:

pdftk left-side-outputs.pdf multistamp right-side-outputs.pdf output compare.pdf

This creates the final result, compare.pdf:

Desired output file: compare.pdf
+--------+--------+ 
|        |        |
|   a    |  b     |
|        |        |
+--------+--------+

I hope some gurus out there can help me figure out how to handle PDF input files with different page sizes.

Kurt Pfeifle
  • Somehow your drawing and your commands do not look correct. I *think* I know what you are trying to achieve. I'll edit your question. If I have misunderstood, please revert my edits... – Kurt Pfeifle Nov 15 '13 at 19:47
  • Have you had a look at pdfnup? It's part of pdfjam, a front end to LaTeX's pdfpages package. It autoscales the PDFs and works quite robustly. – Jakob Nov 15 '13 at 21:08
  • @Jakob: `pdfnup` normally may be better for doing 2-up... But in this case your hint doesn't help much. *First*, the question was explicitly about Ghostscript and pdftk. *Second*, the task is to compare 2 different files, where one resulting "2-up" page is composed of pages from each of the 2 original files. There is no straightforward way to do this with `pdfnup` that I'm aware of... – Kurt Pfeifle Nov 16 '13 at 20:58
  • @Kurt Pfeifle You are totally right about the question and the limitations, that's why I posted a comment and not an answer! Nevertheless, pdfnup is a nice tool to n-up multiple PDFs without fiddling around with gs. – Jakob Nov 16 '13 at 23:35

2 Answers


To your question...

So I need to make sure that the PDFs both have the same regular A4 size PDF and resolution before I combine them?

...the answer is 'Yes, regarding the page size -- No regarding the resolution (doesn't matter).'

Scaling PDF pages with Ghostscript (1)

A command to scale all pages of a mixed-sized PDF to an all-A4 is this:

 gswin64c.exe           ^
     -o all-a4.pdf      ^
     -sDEVICE=pdfwrite  ^
     -g5950x8420        ^
     -dPDFFitPage       ^
     -f input.pdf

This scales media sizes and contents likewise (tested with GS v9.10).

The parameter -dPDFFitPage will always keep the aspect ratio. It will automatically rotate the content to make the best fit. It does not allow 'stretching' the page in one direction only. This can however be achieved with the next method.


[Update

I think I did not get one point about this method across clearly enough.

The thing is this: if the aspect ratio of the media in your input file is not already the same as your target media's, then -dPDFFitPage will not entirely cover your target media.

Assume your input uses a square page size of 500x500 points. If you process this with a target size of A4 (-g5950x8420), then -dPDFFitPage will keep the square aspect ratio and fill only a 5950x5950 pixel area of the target.

But you cannot leave out -dPDFFitPage either -- otherwise your original 500x500 content is not scaled at all, but merely placed in the lower left corner of the bigger 595x842 page.

End of update.]
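The arithmetic behind that update can be sketched in a few lines of Python. This is only an illustration of an aspect-preserving fit, not Ghostscript's actual implementation: the function name `fit_page` is made up for this example, and the sketch ignores the automatic rotation that -dPDFFitPage may apply.

```python
def fit_page(src_w, src_h, dst_w=595.0, dst_h=842.0):
    """Aspect-preserving fit of a src page onto a dst page (sizes in points).

    Returns (scale, covered_w, covered_h). Rotation is deliberately ignored.
    """
    # The smaller of the two ratios is the largest scale that still fits.
    scale = min(dst_w / src_w, dst_h / src_h)
    return scale, src_w * scale, src_h * scale


# The square 500x500 pt page from the update, fitted onto A4 (595x842 pt).
scale, w, h = fit_page(500, 500)
print(f"scale={scale:.3f}, content covers {w:.0f}x{h:.0f} pt of the 595x842 pt page")
```

For the square example the width is the limiting dimension, so the content ends up about 595 pt wide and 595 pt high, leaving the rest of the A4 page empty.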


Scaling PDF pages with Ghostscript (2)

A command to scale all PDF page contents to 50% of both their respective dimensions is this:

 gswin64c.exe                                      ^
     -o 50pc.pdf                                   ^
     -sDEVICE=pdfwrite                             ^
     -c "<</Install {.5 .5 scale}>> setpagedevice" ^
     -f input.pdf

However, this will NOT scale the media boxes at the same time!

If you know that all pages in your PDF file are of the same size, you could use this to scale an A3 PDF to A4:

 gswin64c.exe                                      ^
     -o A4-50pc.pdf                                ^
     -g5950x8420                                   ^
     -sDEVICE=pdfwrite                             ^
     -c "<</Install {.5 .5 scale} /AutoRotatePages /None>> setpagedevice" ^
     -f A3.pdf

However, the first command in my answer will of course also work, and it is simpler to use!

For A5 -> A4 or A4 -> A3 use:

                    {1.415 1.415 scale}

For A3 -> A4 or A4 -> A5:

                    { .707  .707 scale}

But here it gets more interesting now, because you can 'stretch' the contents as well! To scale horizontally to 75% and vertically to 66%, use

     -c "<</Install {.75 .666 scale}>> setpagedevice"

For a kind of 'liquid' scaling between Letter and A4, you may use these:

  • A4 -> Letter: {1.028571 .940617 scale}
  • Letter -> A4: { .972222 1.063131 scale}
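The 'liquid' factors are simply the ratios of the target and source page dimensions. A minimal Python sketch, assuming the usual point sizes of 595x842 for A4 and 612x792 for Letter (the helper names `liquid_scale` and `install_arg` are invented for this example):

```python
# Page sizes in PostScript points (1/72 inch); assumed values for this sketch.
PAGE_SIZES = {"A4": (595, 842), "Letter": (612, 792)}


def liquid_scale(src, dst):
    """Return independent (x, y) scale factors that stretch src onto dst."""
    src_w, src_h = PAGE_SIZES[src]
    dst_w, dst_h = PAGE_SIZES[dst]
    return dst_w / src_w, dst_h / src_h


def install_arg(sx, sy):
    """Format the PostScript snippet that is passed to Ghostscript after -c."""
    return f"<</Install {{{sx:.6f} {sy:.6f} scale}}>> setpagedevice"


sx, sy = liquid_scale("Letter", "A4")
print(install_arg(sx, sy))
```

The sixth decimal can differ from the values listed above, depending on whether a factor is rounded or truncated; the difference is far too small to matter on the page.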

For all of the above you can give a -gNNNNxMMMM value. It determines a fixed page size for the output PDF, with the dimensions given in pixels at the default internal resolution of the pdfwrite device, which is 720 ppi, so that 1 PostScript point equals 10 pixels.

If you do not give a -gNNNNxMMMM value, the original page sizes are used (even if they are of mixed values), but their content will be drawn upon these pages with the scaling factor you specified.
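To avoid mistakes when converting a page size in points into a -gNNNNxMMMM value, the conversion can be sketched as follows. It assumes the 720 ppi pdfwrite default mentioned above; the function name `geometry_arg` is hypothetical.

```python
PDFWRITE_DPI = 720      # default pdfwrite resolution, as stated in the answer
POINTS_PER_INCH = 72    # one PostScript point is 1/72 inch


def geometry_arg(width_pt, height_pt, dpi=PDFWRITE_DPI):
    """Build a -gWxH argument (in device pixels) from a page size in points."""
    px_per_pt = dpi / POINTS_PER_INCH
    return f"-g{round(width_pt * px_per_pt)}x{round(height_pt * px_per_pt)}"


print(geometry_arg(595, 842))    # A4 portrait
print(geometry_arg(1190, 842))   # two A4 pages side by side
```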

What I do not know right now: A method to 'liquid-scale' each individual page of a mixed sized PDF including the media sizes in one go...

Comparing all-Letter with all-A5 PDF files, based on A4:

Assuming you now want to compare an all-Letter sized PDF to one which is all-A5, and you want to scale both to A4 first, here is what you'd do:

'Liquid'-Scale Letter to A4:

 gswin64c.exe                                      ^
     -o a4-1.pdf                                   ^
     -sDEVICE=pdfwrite                             ^
     -g5950x8420                                   ^
     -c "<</Install{.972222 1.063131 scale}>>setpagedevice" ^
     -f letter.pdf

'Fixed'-Scale A5 to A4:

 gswin64c.exe                                      ^
     -o a4-2.pdf                                   ^
     -sDEVICE=pdfwrite                             ^
     -g5950x8420                                   ^
     -c "<</Install{1.415 1.415 scale}>>setpagedevice" ^
     -f a5.pdf

or, alternatively:

 gswin64c.exe          ^
     -o a4-2.pdf       ^
     -sDEVICE=pdfwrite ^
     -g5950x8420       ^
     -dPDFFitPage      ^
     -f a5.pdf

And now compare both your A4 PDF files....

Optimising your comparison workflow

You can also save one step of the workflow as outlined in your question. Here is a better approach.

First step: prepare left sides (as before)

Assuming you have A4 input, and the final output should be A3:

 gswin64c.exe                   ^
      -o left-sides.pdf         ^
      -sDEVICE=pdfwrite         ^
      -g11900x8420              ^
      -c "<</PageOffset [0 0]>>setpagedevice" ^
      -f a.pdf

This creates:

left-sides.pdf
+--------+--------+   ^
|        |        |   |
|        |        |   |
|  a     |(empty) |  842 pt == 8420 pixels
|        |        |   |
|        |        |   |
+--------+--------+   v

<-----1190 pt----->
   == 11900 pixels

Second step: prepare right sides (all in one go)

 gswin64c.exe                   ^
      -o right-sides.pdf        ^
      -sDEVICE=pdfwrite         ^
      -g11900x8420              ^
      -c "<</PageOffset [595 0]>>setpagedevice" ^
      -f b.pdf

This creates:

right-sides.pdf
+--------+--------+   ^
|        |        |   |
|        |        |   |
|(empty) |  b     |  842 pt == 8420 pixels
|        |        |   |
|        |        |   |
+--------+--------+   v

<-----1190 pt----->
   == 11900 pixels

Third step: overlay the two files with pdftk

pdftk right-sides.pdf multistamp left-sides.pdf output compare.pdf

or

pdftk left-sides.pdf multistamp right-sides.pdf output compare2.pdf

This creates:

compare.pdf
+--------+--------+   ^
|        |        |   |
|        |        |   |
|  a     |  b     |  842 pt == 8420 pixels
|        |        |   |
|        |        |   |
+--------+--------+   v

<-----1190 pt----->
   == 11900 pixels
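For scripting, the three steps can be assembled as argument lists in Python. This sketch only builds and prints the commands; it mirrors the command lines above and assumes both inputs are already A4. The function name `side_by_side_commands` and its parameters are made up for this example, and the intermediate file names are the ones used in this answer.

```python
def side_by_side_commands(left_pdf, right_pdf, out_pdf,
                          page_w_pt=595, page_h_pt=842, gs="gswin64c.exe"):
    """Return the two Ghostscript calls and the pdftk call as argv lists."""
    # Output media: twice the page width, at 10 pixels per point (720 ppi).
    geometry = f"-g{2 * page_w_pt * 10}x{page_h_pt * 10}"

    def gs_cmd(src, dst, x_offset):
        return [gs, "-o", dst, "-sDEVICE=pdfwrite", geometry,
                "-c", f"<</PageOffset [{x_offset} 0]>>setpagedevice",
                "-f", src]

    return [
        gs_cmd(left_pdf, "left-sides.pdf", 0),            # content stays left
        gs_cmd(right_pdf, "right-sides.pdf", page_w_pt),  # content shifted right
        ["pdftk", "left-sides.pdf", "multistamp", "right-sides.pdf",
         "output", out_pdf],
    ]


for cmd in side_by_side_commands("a.pdf", "b.pdf", "compare.pdf"):
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)`; on a non-Windows system the `gs` argument would typically be `"gs"`.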

Update regarding Crop-/Trim-/Art-/Bleed-Boxes

One more thing.

Sometimes the above commands may "seem" not to work. The reason is that PDFs internally use not only the naively assumed "page size", but a more complex setup: the MediaBox (what we usually regard as the "page size"), as well as the TrimBox, BleedBox, ArtBox and CropBox. See here for an exact description of these boxes...

To test your PDF files (inputs as well as results or intermediate results) for all these boxes' values, use the pdfinfo command:

pdfinfo -f 1 -l 5 -box a.pdf
pdfinfo -f 1 -l 5 -box b.pdf
pdfinfo -f 1 -l 5 -box right-sides.pdf
pdfinfo -f 1 -l 5 -box left-sides.pdf
pdfinfo -f 1 -l 5 -box compare.pdf

A CropBox makes PDF viewers (and printers) display (or print) only the part of the MediaBox content that lies inside it. If it is defined differently from the MediaBox, it can get in the way of the rescaling task. Ghostscript will not touch a CropBox if it sees one.

It can happen that the file was processed successfully, but the viewer still shows you the same viewport onto the page.

In order to "disarm" the effect of these boxes, you can use a very crude trick: rename these strings within the PDF to all-lowercase names. Here is how to do it with sed on the command line (it may not be available on Windows):

cat input.pdf                    \
   | sed 's#CropBox#cropbox#g'   \
   | sed 's#TrimBox#trimbox#g'   \
   | sed 's#BleedBox#bleedbox#g' \
   | sed 's#ArtBox#artbox#g'     \
> disarmed.pdf

or, somewhat shorter, but not as easy to read:

sed 's#CropB#cropb#g;s#TrimB#trimb#g;s#BleedB#bleedb#g;s#ArtB#artb#g' \
  in.pdf > out.pdf

Since PDF is a binary file format, with some versions of sed you may encounter an error message saying:

sed: RE error: illegal byte sequence

In this case try a different flavor, like GNU sed, gsed...
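Where sed is not available (for example on Windows), the same crude rename can be sketched in Python, working on raw bytes so that no "illegal byte sequence" problem arises. The function names `disarm_boxes` and `disarm_file` are invented for this example. Like the sed version, it only affects box names that appear uncompressed in the file; names stored inside compressed object streams would not be found.

```python
BOX_NAMES = (b"CropBox", b"TrimBox", b"BleedBox", b"ArtBox")


def disarm_boxes(pdf_bytes):
    """Lowercase the box names so that PDF consumers no longer recognise them.

    Every replacement has the same length as the original, so the byte
    offsets recorded elsewhere in the file stay valid.
    """
    for name in BOX_NAMES:
        pdf_bytes = pdf_bytes.replace(name, name.lower())
    return pdf_bytes


def disarm_file(src_path, dst_path):
    """Read a PDF in binary mode and write a disarmed copy."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(disarm_boxes(data))
```

A call such as `disarm_file("input.pdf", "disarmed.pdf")` would correspond to the sed pipeline above. The MediaBox is left untouched on purpose.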

Kurt Pfeifle
  • wow thank you so much Kurt - you have put a lot of effort in your answer - really appreciate it. Have some testing to do now :-) I have one more issue I hope you can help me with. If you try to make this PDF 11900 pixels in width, it simply can't be done, can you figure out why? http://gassalg.dk/~/media/gassalg.dk/dokumenter/regnskab/hmn_annual_report12_uk_issuu_01a.ashx – Mark Chabert Bergh Nov 15 '13 at 22:18
  • The code is quite "standard" but it can't change the size of the PDF.: gswin64c.exe -o left-side-outputs.pdf -sDEVICE=pdfwrite -g11900x8420 -dFIXEDMEDIA b.pdf – Mark Chabert Bergh Nov 15 '13 at 22:21
  • @MarkChabertBergh: The link you gave returns a blank page for me... -- `-dFIXEDMEDIA` is not from *my* command. As I pointed out in a comment to KenS, `-dFIXEDMEDIA` is automatically implied by using `-gNNNxMMM` anyway. What does `gswin64c.exe -version` return for you? -- What do you mean by *'it simply can't be done'*? Any error messages you get? – Kurt Pfeifle Nov 16 '13 at 01:37
  • gswin64c.exe -version returns Ghostscript 9.10 (2013-08-30). I have removed -dFIXEDMEDIA now. I don't get any error messages, but Ghostscript simply can't expand the width of the PDF to 11900 px as the script requests. The output file keeps the same width. I can't figure out why, or how it should be done instead. This is just one file that doesn't work, but I found quite a few where the command simply can't change the width. http://wikisend.com/download/524852/hmn_annual_report12_uk_issuu_01a.pdf – Mark Chabert Bergh Nov 16 '13 at 11:25
  • @MarkChabertBergh: Did you really read my last update (done nine hours before your comment)?! The one about the {Trim,Bleed,Crop,Art}Boxes? This is what causes the problem... A 'fix' is also in there... – Kurt Pfeifle Nov 16 '13 at 20:55
  • Sorry, I didn't see your update, but you were right. I'm using a Windows machine, but I removed the boxes with advanced-pdf-tools and it works like a charm now! Thank you so much, Kurt. Without you I would never have found the solution! – Mark Chabert Bergh Nov 16 '13 at 22:59

PDF files don't contain a resolution, so that can't be a problem. I wouldn't normally use -r with Ghostscript either; all it does is specify the resolution at which any content that cannot be emitted 'as is' into the PDF file is rendered in order to turn it into an image. It doesn't affect the size or placement of that content.

You shouldn't need /PageOffset, I don't think that will have any effect at all (if the input is PDF).

I would NOT use /PDFSETTINGS. By using that you are importing all kinds of canned settings. Unless you are confident that these are all exactly what you want, you are much better off using the defaults and flipping any switches you want changed individually.

You may very well want to set -dAutoRotatePages=/None, because otherwise pdfwrite will try to make the majority of the text run left to right horizontally.

You are converting one of the files twice. You should try to avoid that: the more conversions, the greater the likelihood of problems.

You have specified media sizes on all three Ghostscript invocations, but you haven't specified FIXEDMEDIA on two of them. For one that's probably fine, because it's a reprocessing of the first one (where you do specify FIXEDMEDIA), but what about the second instance?

You don't actually say what problem you are experiencing. Nor do you say whether the problem shows up in the individual files, or only when you use pdftk to merge them together. Without that information, and some sample files that demonstrate the problem, it's really not possible to give you any more guidance.

Oh, and in passing: you could actually do n-up imposition like this with Ghostscript directly, though you'd have to do more work than you do using pdftk. With a little effort I could probably do the whole thing in one Ghostscript invocation.

KenS
  • Thank you so much for taking the time! How can this be done with n-up imposition in Ghostscript? I need the pages to be next to each other on one page (that's twice as wide as a normal A4, then). – Mark Chabert Bergh Nov 15 '13 at 10:55
  • @KenS: specifying media sizes with `-gNNNxMMM` implies `-dFIXEDMEDIA`, no? So no need even to specify it on the two commands which missed it... – Kurt Pfeifle Nov 15 '13 at 19:36
  • @KenS: Oh, I also would be interested to see the result of your *'little effort to do the whole thing in one Ghostscript invocation'*. That would be very cool to know and would be **very much** appreciated! ;-) – Kurt Pfeifle Nov 15 '13 at 19:37
  • I'm afraid I wasn't offering to do it, just noting that it could be done. Note that if you want to modify/scale pages in a mixed page size PDF you can redefine setpagedevice. Every time the PDF page size changes Ghostscript will execute setpagedevice with a dictionary argument containing /PageSize. So you can have your redefined function examine the dictionary for the presence of that key. If it is present you replace the array with an appropriate size, and also insert/replace the /Install matrix to scale the CTM. I haven't tried this but it ought to work, I've used similar trickery before. – KenS Nov 16 '13 at 09:12