Correcting PDF pages with wrong orientation information with PyPDF2

Question

I'm trying to merge a number of PDF documents into one. The documents come from different sources: some were created on a computer, others were scanned with different scanners and software. I scale them all to A4 size before joining them.

My problem is with some documents that display correctly but, when I check the orientation, appear to be rotated.

For example, for this document here, it displays OK in the browser and Acrobat Reader, but if I get the information using PyPDF2:

from PyPDF2 import PdfReader

reader = PdfReader(path)
for page in reader.pages:
    orientation = page.get('/Rotate')
    print(f"Document: {path}")
    print(f"    Orientation: {orientation}")
    print(f"    mediabox:    {page.mediabox}")
    print(f"    artbox:      {page.artbox}")
    print(f"    bleedbox:    {page.bleedbox}")
    print(f"    cropbox:     {page.cropbox}")
    print(f"    trimbox:     {page.trimbox}")

I get:

        Orientation: 90
        mediabox:    RectangleObject([0, 0, 792, 542])
        artbox:      RectangleObject([0, 0, 792, 542])
        bleedbox:    RectangleObject([0, 0, 792, 542])
        cropbox:     RectangleObject([0, 0, 792, 542])
        trimbox:     RectangleObject([0, 0, 792, 542])

This is annoying because, in a subsequent step, I'm adding page numbers to the document, and they all get placed wrong because of the orientation.

Notice that the pages display correctly; they just have the wrong orientation data somehow. If I try to fix the orientation by rotating the page, e.g.

page.rotate(-orientation)

then they display sideways instead.

How can I correct the orientation?

score 0 · Answer 1 · answered Jun 03 '22 at 19:57

There are two ways to change the orientation of a page. I don't quite understand why you want the /Rotate attribute to be zero: it doesn't tell you what the correct orientation is. Instead, it tells the viewer to rotate the page clockwise by that many degrees (a multiple of 90) before displaying it.
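To make the effect of /Rotate concrete, here is a minimal sketch of how the displayed page size follows from the media box and the /Rotate value. The helper name `displayed_size` is made up for this illustration and is not part of PyPDF2; it only assumes that /Rotate is a multiple of 90 (or missing).

```python
def displayed_size(mediabox, rotate):
    """Return (width, height) of the page as a viewer shows it.

    mediabox: four numbers (x0, y0, x1, y1), e.g. the values of page.mediabox
    rotate:   the /Rotate value (a multiple of 90), or None if the key is absent
    """
    x0, y0, x1, y1 = (float(v) for v in mediabox)
    width, height = abs(x1 - x0), abs(y1 - y0)
    angle = (rotate or 0) % 360  # normalise negative and missing values
    if angle in (90, 270):
        # A quarter turn swaps width and height on screen
        return height, width
    return width, height


# Values from the question: a 792 x 542 media box with /Rotate 90
print(displayed_size((0, 0, 792, 542), 90))  # (542.0, 792.0)
```

With the numbers from the question, the page is stored as landscape (792 x 542) but shown as portrait (542 x 792). That would explain why it looks fine in a viewer while anything positioned from the raw media box, such as page numbers, ends up in the wrong place.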

The `/Rotate` attribute

from PyPDF2 import PdfReader, PdfWriter
from PyPDF2.generic import NameObject, NumberObject

# Add stuff to the PdfWriter
reader = PdfReader("example.pdf")
writer = PdfWriter()
writer.add_page(reader.pages[0])

# Change it in the writer
writer.pages[0][NameObject("/Rotate")] = NumberObject(90)
# Or simpler: writer.pages[0].rotate(90), which adds 90 to the current /Rotate value

# Write content back
with open("output.pdf", "wb") as fp:
    writer.write(fp)

Use a transformation matrix

Using the PyPDF2 docs on transformations:

from PyPDF2 import PdfReader, PdfWriter, Transformation

# Add stuff to the PdfWriter
reader = PdfReader("example.pdf")
writer = PdfWriter()
writer.add_page(reader.pages[0])

# Change it in the writer
transformation = Transformation().rotate(90)
# You also need to chain .translate(tx=123, ty=456) with suitable values:
# the rotation is about the origin, which is typically the bottom-left corner,
# so on its own it moves the content off the page
writer.pages[0].add_transformation(transformation)

# Write content back
with open("output.pdf", "wb") as fp:
    writer.write(fp)
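Why the translation is needed can be checked with plain arithmetic, no PDF library required. The sketch below builds the standard PDF matrix (a, b, c, d, e, f) for a counter-clockwise rotation followed by a translation and applies it to the corners of the 792 x 542 page from the question. The helper names `rotation_ctm`, `apply_ctm` and `corners` are made up for this illustration.

```python
import math


def rotation_ctm(degrees, tx=0.0, ty=0.0):
    """PDF matrix (a, b, c, d, e, f): rotate counter-clockwise, then translate."""
    rad = math.radians(degrees)
    cos, sin = math.cos(rad), math.sin(rad)
    return (cos, sin, -sin, cos, tx, ty)


def apply_ctm(ctm, x, y):
    """Map a point with x' = a*x + c*y + e and y' = b*x + d*y + f."""
    a, b, c, d, e, f = ctm
    # Round away floating-point noise such as cos(90 degrees) = 6e-17
    return (round(a * x + c * y + e, 6), round(b * x + d * y + f, 6))


def corners(width, height):
    """Corners of a page whose bottom-left corner is the origin."""
    return [(0, 0), (width, 0), (width, height), (0, height)]


width, height = 792, 542

# Rotation alone: x runs from -542 to 0, so the content is left of the page
rotated = [apply_ctm(rotation_ctm(90), x, y) for x, y in corners(width, height)]

# Shifting right by the original height brings everything back on the page
shifted = [
    apply_ctm(rotation_ctm(90, tx=height), x, y) for x, y in corners(width, height)
]

print(min(x for x, _ in rotated))  # -542.0
print(max(x for x, _ in shifted), max(y for _, y in shifted))  # 542.0 792.0
```

Under these assumptions, the PyPDF2 equivalent for this page would presumably be `Transformation().rotate(90).translate(tx=542, ty=0)`. The rotated content then occupies a 542 x 792 area, so the page boxes would likely need to be swapped to match. I have not verified this against the file from the question.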
