I'm using pdf2image to convert a PDF to an image (.png). However, the dimensions of the image are larger than those of the PDF page after the conversion. Here's the code I am using:

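import pdf2image

path = "2x.pdf"
# poppler_path is defined elsewhere (location of the Poppler binaries)
pages = pdf2image.convert_from_path(
    path,
    dpi=300,
    poppler_path=poppler_path,
)
for page in pages:
    # Every page is saved under the same name, so a multi-page PDF
    # would keep only its last page.
    page.save("output_2x.png", "PNG")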

Code to find the size of the PDF page:

from PyPDF2 import PdfFileReader
input1 = PdfFileReader(open('2x.pdf', 'rb'))
input1.getPage(0).mediaBox

Output: RectangleObject([0, 0, 3301, 5100])

Code to find the size of the image:

from PIL import Image
img = Image.open("output_2x.png")
img.size

Output: (13755, 21250)

The width increases about 4 times whereas the height increases about 8 times.

Vikas Kumar
  • 2
    [`mediaBox`](https://pythonhosted.org/PyPDF2/PageObject.html#PyPDF2.pdf.PageObject.mediaBox) is in "default user space units" (whatever that may be), not pixels. PDF does not have the concept of pixels at all. 13755 pixels at 300 dpi equals 45.85 inches or 1165 mm; does this match the page width of your PDF? – Thomas Dec 29 '21 at 12:22
  • 2
    By the way, "The width increases about 4 times whereas the height increases about 8 times" is not true: width and height are increased by the same factor; the aspect ratio is about 0.647 in both cases. – Thomas Dec 29 '21 at 12:25
  • 1
    a PDF is vector data. PNG files are raster data. they are not comparable. you should expect sizes to differ. – Christoph Rackwitz Dec 29 '21 at 13:40

1 Answer

The PDF format, and thus pypdf, gives you the page size in "default user space units". See https://pypdf.readthedocs.io/en/latest/modules/PageObject.html#pypdf._page.PageObject.user_unit

The `user_unit` value is expressed in multiples of 1/72 inch. Hence a value of 1 (the default) means a user space unit is 1/72 inch, and a value of 3 means that a user space unit is 3/72 inch.

The unit you want is "pixel". Converting user space units to pixels requires the resolution (DPI) that was used for rendering; as long as you don't know it, you cannot convert properly.
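
For the numbers in the question the resolution is known, since `dpi=300` was passed to `convert_from_path`, so the conversion can be sketched as below. This is only a minimal sketch: `points_to_pixels` is a hypothetical helper (not part of pdf2image or pypdf), it assumes a `user_unit` of 1, and rounding up to whole pixels is an assumption picked because it reproduces the reported image size.

```python
import math

POINTS_PER_INCH = 72  # one default user space unit is 1/72 inch


def points_to_pixels(points, dpi, user_unit=1.0):
    """Convert a length in user space units to pixels at the given DPI.

    Rounding up is an assumption chosen because it matches the sizes
    reported in the question; a renderer may round differently.
    """
    # Multiply before dividing so exact results stay exact in floating point.
    return math.ceil(points * user_unit * dpi / POINTS_PER_INCH)


# mediaBox from the question: [0, 0, 3301, 5100], rendered with dpi=300
width_px = points_to_pixels(3301, dpi=300)
height_px = points_to_pixels(5100, dpi=300)
print(width_px, height_px)  # 13755 21250
```

Both dimensions scale by the same factor, 300/72 ≈ 4.17, which agrees with the `img.size` of (13755, 21250) shown in the question.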

Martin Thoma