PyPDF2 returns negative dimension

Question

I use PyPDF2 to get the dimensions of a PDF file's pages, but it returns a negative number for some PDFs. Why? Here is an example: starting from the second page, the real height is negative.

from PyPDF2 import PdfFileReader

input_file = PdfFileReader(open('file.pdf', "rb"))
for i in range(input_file.getNumPages()):
    page = input_file.getPage(i)
    real_width, real_height = page.mediaBox.getWidth(), page.mediaBox.getHeight()
    print(real_width, real_height)

In some cases the real height is negative. How can this happen?

We can't replicate this without an example file, or at least a description of the 'types' of PDF where this occurs. — JeffUK, Jan 15 '19 at 14:49

JeffUK · Answer 1 · 2019-01-15T16:41:34.000

Because that's the page height as recorded in the file's metadata:

MediaBox [0 0 792 -612]

You'd have to ask whoever generated the file how they've managed that! You could probably just invert it.
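For context, the PDF specification allows a rectangle such as MediaBox to be given by any two diagonally opposite corners, so a reader is expected to normalize it rather than assume lower-left then upper-right. Below is a minimal sketch of that normalization in plain Python. `normalize_box` is a hypothetical helper name, not part of PyPDF2, and it assumes you pass it the four raw numbers from the MediaBox array.

```python
def normalize_box(box):
    """Return (left, bottom, right, top) with left <= right and bottom <= top.

    `box` is any sequence of four numbers: two diagonally opposite corners
    in the order x0, y0, x1, y1, as stored in a PDF rectangle array.
    """
    x0, y0, x1, y1 = (float(v) for v in box)
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


# The MediaBox from this answer: the second corner lies below the first.
left, bottom, right, top = normalize_box([0, 0, 792, -612])
print(right - left, top - bottom)  # 792.0 612.0
```

Once the corners are ordered this way, width and height come out non-negative whichever corner the producer wrote first.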

edited Jan 15 '19 at 16:41

answered Jan 15 '19 at 16:35

JeffUK


A more scalable approach is to realize height is a *difference*, rather than an absolute value. `abs(top - bottom)` should work for all possible numbers. In this case, the PDF producer probably was irked by its negative y-axis and so inverted the whole lot. – Jongware Jan 15 '19 at 22:12
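The "height is a difference" idea from this comment can be sketched as follows. `page_size` is a hypothetical helper, not a PyPDF2 function. It takes the four MediaBox numbers and returns the absolute differences, so the order of the corners does not matter.

```python
def page_size(box):
    """Return (width, height) of a PDF rectangle as non-negative numbers.

    `box` holds x0, y0, x1, y1; the size is the absolute difference of the
    coordinates, independent of which corner was written first.
    """
    x0, y0, x1, y1 = (float(v) for v in box)
    return abs(x1 - x0), abs(y1 - y0)


print(page_size([0, 0, 792, -612]))  # (792.0, 612.0)
```

In PyPDF2 the `page.mediaBox` object is, as far as I know, an array of four numbers, so `page_size(page.mediaBox)` should work inside the loop from the question. Treat that as an assumption to check against your installed version.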

@usr2564301 Sounds like a bug report for the library is in order, although you can access the mediabox values directly. I'd edit my answer, but I've flagged this question for closure as 'Solved in a way not helpful to other people'. – JeffUK Jan 15 '19 at 22:51
Your first question was a perfect candidate for SO; it just happened to have an easy answer. Now that you've just pasted some code in, it's no longer really a question. – JeffUK Jan 16 '19 at 12:33
