For this example PDF, I did this:
import fitz
doc = fitz.open("PDF-export-example-image-ocr.pdf")
print(f"(1) {doc[0].bound()=}")
print(f"(2) {doc[0].MediaBox=}")
print(f"(3) {doc[0].getImageList()}")
doc.close()
which gives:
(1) doc[0].bound()=Rect(0.0, 0.0, 612.0399780273438, 792.530029296875)
(2) doc[0].MediaBox=Rect(0.0, 0.0, 612.0399780273438, 792.530029296875)
(3) [(15, 0, 1275, 1651, 8, 'DeviceRGB', '', 'R12', 'DCTDecode')]
I expected (1) and (2) to be the same, although I don't understand why there are two ways to get the same value.
What I don't understand is why the image dimensions in (3), 1275 x 1651, are so much bigger than the page the image is on, roughly 612 x 792.5. Can somebody explain that?
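For comparison, here is a rough sketch of the arithmetic relating the two sets of numbers. It assumes the page rectangle is in PDF points (1/72 inch) and that the third and fourth entries of the image tuple are the width and height in pixels. The variable names are my own, and the values are copied from the output above.

```python
# Page size from (1)/(2), assumed to be in PDF points (1 point = 1/72 inch).
page_width_pt = 612.0399780273438
page_height_pt = 792.530029296875

# Image size from (3), assumed to be the pixel width and height.
image_width_px = 1275
image_height_px = 1651

POINTS_PER_INCH = 72

# Page size in inches.
page_width_in = page_width_pt / POINTS_PER_INCH
page_height_in = page_height_pt / POINTS_PER_INCH

# Pixels per inch the image would have if it filled the whole page.
dpi_x = image_width_px / page_width_in
dpi_y = image_height_px / page_height_in

print(f"page: {page_width_in:.2f} x {page_height_in:.2f} in")
print(f"implied resolution: {dpi_x:.1f} x {dpi_y:.1f} dpi")
```

If those assumptions hold and the image covers the full page, both directions give the same resolution, which would suggest the numbers differ only in their units rather than in physical size.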