I'm trying to merge a number of PDF documents into one. The documents come from different sources: some were created on a computer, and others were scanned with different scanners and software. I'm scaling them all to A4 size before joining them.
My problem is with some documents that display correctly but, when I check the orientation, are reported as rotated.
For example, this document here displays correctly in the browser and in Acrobat Reader, but if I read its page information using PyPDF2:
from PyPDF2 import PdfReader

reader = PdfReader(path)
for page in reader.pages:
    orientation = page.get('/Rotate')
    print(f"Document: {path}")
    print(f" Orientation: {orientation}")
    print(f" mediabox: {page.mediabox}")
    print(f" artbox: {page.artbox}")
    print(f" bleedbox: {page.bleedbox}")
    print(f" cropbox: {page.cropbox}")
    print(f" trimbox: {page.trimbox}")
I get:
Orientation: 90
mediaBox: RectangleObject([0, 0, 792, 542])
artBox: RectangleObject([0, 0, 792, 542])
bleedBox: RectangleObject([0, 0, 792, 542])
cropBox: RectangleObject([0, 0, 792, 542])
trimBox: RectangleObject([0, 0, 792, 542])
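As far as I understand, /Rotate is the number of degrees (clockwise, a multiple of 90) by which a viewer turns the page when displaying it, so the boxes above describe the page before that rotation is applied. A minimal sketch of that interpretation, in plain Python with no PDF library; the helper name `displayed_size` is my own and not part of PyPDF2:

```python
def displayed_size(width, height, rotate):
    """Return (width, height) of the page as a viewer would show it.

    `rotate` is the page's /Rotate value: clockwise degrees, a multiple
    of 90. None is treated as 0 (no /Rotate entry on the page).
    """
    rotate = (rotate or 0) % 360
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    if rotate in (90, 270):
        # A quarter turn swaps the displayed width and height.
        return (height, width)
    return (width, height)


# Values reported above: mediabox [0, 0, 792, 542] with /Rotate 90.
print(displayed_size(792, 542, 90))
```

If this reading is right, the page is stored as 792 x 542 (landscape) but shown as 542 x 792 (portrait), which would explain why it looks fine in the viewer.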
This is a problem because, in a subsequent step, I add page numbers to the document, and they all end up in the wrong place because of the orientation.
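To illustrate the placement problem, here is a rough sketch of the coordinate mapping I think is involved: it converts a point chosen on the page as displayed (origin at the bottom-left of what the viewer shows) into the unrotated coordinates the page content actually uses. The function name `display_to_user` is hypothetical, and the sketch assumes the mediabox starts at (0, 0) and that /Rotate is a clockwise multiple of 90.

```python
def display_to_user(x, y, width, height, rotate):
    """Map a point on the displayed page to unrotated page coordinates.

    (x, y)         point on the page as shown by a viewer, origin bottom-left
    width, height  mediabox size before rotation (assumed to start at 0, 0)
    rotate         the page's /Rotate value, clockwise degrees
    """
    rotate = (rotate or 0) % 360
    if rotate == 0:
        return (x, y)
    if rotate == 90:
        # The unrotated right edge is shown at the bottom of the page.
        return (width - y, x)
    if rotate == 180:
        return (width - x, height - y)
    if rotate == 270:
        # The unrotated left edge is shown at the bottom of the page.
        return (y, height - x)
    raise ValueError("/Rotate must be a multiple of 90")


# Bottom-centre of the displayed 542 x 792 page, 20 points above the edge.
print(display_to_user(271, 20, 792, 542, 90))
```

With the values from my document, a page number meant for the bottom centre of the displayed page would have to be drawn near the right edge of the unrotated page. The text itself would presumably also need to be turned by the same angle to read upright.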
Note that the pages display correctly; only the orientation data seems to be wrong. If I try to fix the orientation by rotating the page, e.g.
page.rotate(-orientation)
then they display sideways instead.
How can I correct the orientation?