0

Our site written in php receives PDF files from customers. After processing, the next step is code from a vendor. That code requires that each page is 8.5x11" in portrait. The code also requires that the page have data like this (output from pdfinfo):

Page    1 size: 612 x 792 pts (letter)
Page    1 rot:  0
Page    1 MediaBox:     0.00     0.00   612.00   792.00
Page    1 CropBox:      0.00     0.00   612.00   792.00
Page    1 BleedBox:     0.00     0.00   612.00   792.00
Page    1 TrimBox:      0.00     0.00   612.00   792.00
Page    1 ArtBox:       0.00     0.00   612.00   792.00

Sometimes customers give us PDFs with one or more pages in portrait. We have code that uses FPDI to rewrite the file, rotating the pages that need it by 90 degrees:

$pdf->AddPage('L', array(215.9, 279.4), 90);

The problem is that is just sets rot=90 and the resulting file has this (output from pdfinfo):

Page    3 size: 792 x 612 pts (letter)
Page    3 rot:  90
Page    3 MediaBox:     0.00     0.00   792.00   612.00
Page    3 CropBox:      0.00     0.00   792.00   612.00
Page    3 BleedBox:     0.00     0.00   792.00   612.00
Page    3 TrimBox:      0.00     0.00   792.00   612.00
Page    3 ArtBox:       0.00     0.00   792.00   612.00

Our next step code does not recognize the rot=90 and sees this as a landscape page.

What can I use to change the page characteristics to be like the first set of data (the portrait image)? In our environment ghostscript is available. pdftk can be used (with more difficulty).

ADDED I started with Patrick's pdfrw but ultimately shifted to staying with just FPDF. One odd thing that I noticed: I lost the rotation fix for most files rotated to be only letter, portrait using FPDF and pdfrw when I concatenated to something else using ghostscript. Solution: I ran the "fix rotation" code again after concatenating any files with gs.

Mark Kasson
  • 1,600
  • 1
  • 15
  • 28

2 Answers2

1

Here is an example using my Python package pdfrw. pdfrw may or may not already be installed on your system; if not, you can install it with the usual pip or easy_install (or even possibly apt-get install python-pdfrw), or pull it straight from github.

This example uses a PDF feature called Form XObjects. If your output processing software doesn't support those, then we could create a version that recreates the math done by pdfrw, but updates the actual page streams. (I used to do that before I started using Form XObjects, but the latter are usually much cleaner.)

Also note that pdfrw does not support encrypted PDFs, so those would have to be decrypted first. If this doesn't work on your PDF, send me an example, and I'll look at it.

Those caveats aside, this actually has a bit more functionality than you requested -- it will scale down pages that are too big, and center pages that are too small:

#!/usr/bin/env python


import sys
import os

from pdfrw import PdfReader, PdfWriter, PageMerge


# Change these constants to change output size or orientation

outbox = 0, 0, 8.5 * 72, 11 * 72

# Generic calculations from box

out_x = outbox[0]
out_y = outbox[1]
out_w = outbox[2] - out_x
out_h = outbox[3] - out_y
out_landscape = out_w > out_h

inpfn, = sys.argv[1:]
outfn = 'normalized.' + os.path.basename(inpfn)
inp = PdfReader(inpfn)
out = PdfWriter()

def fixpage():
    # Figure out the rotation requirement before instantiating 
    landscape = bool(rotate % 180) ^ (mbox[2] - mbox[0] > mbox[3] - mbox[1])
    rotation = 0 if landscape == out_landscape else 90

    # Create a canvas, add the page, and get a reference to the page rectangle
    canvas = PageMerge()
    canvas.add(page, rotate=rotation)
    rect = canvas[0]

    # Scale the page rectangle if it doesn't fit our output size
    scaling = max(rect.w / out_w, rect.h / out_h)
    if scaling > 1.0:
        rect.scale(1.0/scaling)

    # Center the page rectangle on the output
    rect.x = out_x + (out_w - rect.w) / 2
    rect.y = out_y + (out_h - rect.h) / 2
    canvas.mbox = outbox
    return canvas.render()

for page in inp.pages:
    rotate = int(page.inheritable.Rotate or 0)
    mbox = tuple(float(x) for x in page.inheritable.MediaBox)
    out.addpage(fixpage() if rotate or mbox != outbox else page)

out.write(outfn)
Patrick Maupin
  • 8,024
  • 2
  • 23
  • 42
  • Wow Patrick, this sounds great! I'll try it out on Monday morning. (Small thing, should the line be out_portrait = out_h > out_w?) – Mark Kasson Apr 02 '17 at 16:31
  • Umm, yeah... Good catch! Would work for your PDF anyway. (Maybe it doesn't look like it, but I did do a bit of testing :-) Will edit. Was considering making it `landscape` instead for more consistency anyway. – Patrick Maupin Apr 02 '17 at 16:59
  • I started to use pdfrw but I changed over to just keeping it in FPDF as per Jan below. I had an issue with both libraries that I added in above. – Mark Kasson Apr 10 '17 at 03:20
1

You can rotate the imported page with FPDF by simply using the Rotate() function from this script before using useTemplate(). There's no need to switch to another language.

Jan Slabon
  • 4,736
  • 2
  • 14
  • 29
  • Thanks, Jan. I started with Patrick's pdfrw but shifted to using FPDF only and ultimately was able to get the result I wanted. There was one issue that I had with both libraries that have now noted above. – Mark Kasson Apr 10 '17 at 03:21