I am trying to extract text from some PDFs using the PyMuPDF library (1.19.2) in Python, but I am having trouble understanding the orientation of pages and images in them. When I open the PDF in Adobe Reader, the page appears in the correct orientation. However, when I check the page rotation in Python using the following code, I get a rotation of 270.
import fitz  # PyMuPDF
doc = fitz.open(document_name)
doc[0].rotation  # 270 for this document
Now when I extract an embedded image from the page using the following code
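import PIL.Image  # a bare "import PIL" does not reliably load the Image submodule
from io import BytesIO
img = doc[0].get_images()  # list of image entries; the first item of each entry is the xref
image = PIL.Image.open(BytesIO(doc.extract_image(img[0][0])['image']))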
I get an image that is rotated consistently with the page rotation obtained above. The image is shown below.
However, if I extract the pixmap of the page using the following code
PIL.Image.open(BytesIO(doc[0].get_pixmap().tobytes()))
the page appears in the same orientation as in Adobe Reader, which matches neither the orientation of the embedded image nor the rotation value returned above. This image is shown below.
My question is: what do the rotation values mean, and how can I make sure I am extracting correctly oriented images and pages from the PDF?
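To make the question concrete, here is a minimal sketch of how I currently read the rotation value. It assumes that the value is the page's /Rotate entry, meaning the number of degrees (a multiple of 90) by which a viewer turns the page clockwise for display, and that the embedded image is placed on the page without any extra rotation of its own. The helper names `displayed_size` and `pil_angle_for` are hypothetical and not part of PyMuPDF or Pillow; the 612 x 792 page size is only an example value.

```python
def displayed_size(width, height, rotation):
    """Return (width, height) of a page as a viewer would show it.

    Assumption: `rotation` is the page's /Rotate value, i.e. clockwise
    degrees in multiples of 90 applied when the page is displayed.
    """
    rotation %= 360
    if rotation % 90 != 0:
        raise ValueError("page rotation must be a multiple of 90")
    # A quarter turn or three-quarter turn swaps the two sides.
    if rotation in (90, 270):
        return height, width
    return width, height


def pil_angle_for(rotation):
    """Return the angle for PIL's Image.rotate() matching a page rotation.

    Image.rotate() turns counter-clockwise, while the page rotation is
    assumed to be clockwise, so the angle has to be mirrored.
    """
    return (360 - rotation) % 360


# Example values only: a portrait page stored with a rotation of 270.
print(displayed_size(612, 792, 270))
print(pil_angle_for(270))
```

If this reading is right, something like `image.rotate(pil_angle_for(doc[0].rotation), expand=True)` should bring the extracted image into the orientation of the rendered page, but I have not confirmed that it holds for images that carry their own transformation on the page.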