Why is the MuPDF MediaBox of a page smaller than a contained image?

Question

import fitz

doc = fitz.open("PDF-export-example-image-ocr.pdf")

print(f"(1) {doc[0].bound()=}")
print(f"(2) {doc[0].MediaBox=}")
print(f"(3) {doc[0].getImageList()}")


doc.close()

which gives:

(1) doc[0].bound()=Rect(0.0, 0.0, 612.0399780273438, 792.530029296875)

(2) doc[0].MediaBox=Rect(0.0, 0.0, 612.0399780273438, 792.530029296875)

(3) [(15, 0, 1275, 1651, 8, 'DeviceRGB', '', 'R12', 'DCTDecode')]

I expected (1) and (2) to be the same, although I don't understand why there are two ways to get the same value.

What I don't understand is why the image dimensions in (3), 1275 x 1651, are so much bigger than the page that contains the image. Can somebody explain that?

score 0 · Accepted Answer · answered Sep 04 '20 at 08:56


The image size you see is how many pixels are in the embedded JPEG image resource. That has literally zero effect on how big the image is going to be when drawn on the page. The physical size of the image on the page is entirely decided by the page content stream commands that draw the image.
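As a rough illustration of the difference, the sketch below relates the pixel size to the size on the page. The helper names `effective_dpi` and `image_placements` are made up for this example. It assumes a newer PyMuPDF release where `Page.get_images()` and `Page.get_image_rects()` are available (the question uses the older `getImageList()` spelling), and that `get_image_rects()` accepts the image xref; if your version differs, check its documentation.

```python
def effective_dpi(pixel_w, pixel_h, width_pt, height_pt):
    """Resolution (horizontal, vertical) of an image drawn at a given size.

    pixel_w, pixel_h    : pixel dimensions of the embedded image
    width_pt, height_pt : size of the rectangle it is drawn into, in points
    """
    # One PDF point is 1/72 inch, so points / 72 gives inches.
    return pixel_w / (width_pt / 72.0), pixel_h / (height_pt / 72.0)


def image_placements(pdf_path, page_number=0):
    """Return a list of (xref, (pixel_w, pixel_h), rect) for one page.

    Hypothetical helper. Assumes a PyMuPDF version that provides
    Page.get_images() and Page.get_image_rects().
    """
    import fitz  # PyMuPDF; imported here so effective_dpi works without it

    doc = fitz.open(pdf_path)
    try:
        page = doc[page_number]
        result = []
        for item in page.get_images(full=True):
            xref, pixel_w, pixel_h = item[0], item[2], item[3]
            # An image can be drawn more than once on the same page,
            # so there may be several rectangles per xref.
            for rect in page.get_image_rects(xref):
                result.append((xref, (pixel_w, pixel_h), rect))
        return result
    finally:
        doc.close()


# Numbers from the question, ASSUMING the image is drawn over the whole page
# (the output in the question does not confirm this).
dpi_x, dpi_y = effective_dpi(1275, 1651, 612.04, 792.53)
print(round(dpi_x), round(dpi_y))
```

If the image really does cover the full page, this works out to about 150 pixels per inch in both directions, which would explain why the pixel counts are roughly twice the page size in points.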


ccxvii


Is it possible to get the size of the image within the document with PyMuPDF? – Martin Thoma Sep 04 '20 at 09:06
