I use PyPDF2 to get the page dimensions of a PDF file, but it returns a negative number for some PDFs. Why? In the example below, starting from the second page, the reported height is negative.

from PyPDF2 import PdfFileReader

input_file = PdfFileReader(open('file.pdf', "rb"))
for i in range(input_file.getNumPages()):
    page = input_file.getPage(i)
    real_width, real_height = page.mediaBox.getWidth(), page.mediaBox.getHeight()
    print(real_width, real_height)

The reported height is negative in some cases. How can this happen?

Bhargav Rao
Vahagn

1 Answer

Because that is the page height recorded in the file's own metadata:

MediaBox [0 0 792 -612]

You'd have to ask whoever generated the file how they managed that! You could probably just flip the sign of the height.
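
As a minimal sketch of that idea, you can treat width and height as differences between opposite corners and take the absolute value, so the result is positive no matter which corners the producer wrote. The helper name `box_size` is hypothetical (it is not part of PyPDF2), and it assumes the box is a sequence of four numbers in the order `[x0, y0, x1, y1]`, like the MediaBox above.

```python
def box_size(box):
    """Return (width, height) for a PDF box given as [x0, y0, x1, y1].

    Hypothetical helper: the size is the absolute difference between
    opposite corners, so it is never negative.
    """
    x0, y0, x1, y1 = (float(v) for v in box)
    return abs(x1 - x0), abs(y1 - y0)


# The MediaBox from the problem file
print(box_size([0, 0, 792, -612]))  # (792.0, 612.0)
```

In the question's loop, `page.mediaBox` should be usable as the argument, assuming it behaves as a four-element sequence of numbers in your PyPDF2 version.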

JeffUK
  • A more scalable approach is to realize height is a *difference*, rather than an absolute value. `abs(top - bottom)` should work for all possible numbers. In this case, the PDF producer probably was irked by its negative y-axis and so inverted the whole lot. – Jongware Jan 15 '19 at 22:12
  • @usr2564301 sounds like a bug report for the library is in order, although you can directly access the mediabox values. I'd edit my answer, but I've flagged this question for closure as 'Solved in a way not helpful to other people'. – JeffUK Jan 15 '19 at 22:51
  • Your first question was a perfect candidate for SO; it just happened to have an easy answer. Now that you've just pasted some code in, it's no longer really a question. – JeffUK Jan 16 '19 at 12:33