I have multiple PDF invoices that I am trying to parse. I convert them to images and use OCR to extract the text from the images. In one of the PDFs, 2 of the 3 pages are rotated by 90 degrees. How do I detect these rotated pages and rotate them correctly so that OCR returns the correct information?
4 Answers
To keep the whole image intact after rotation, set the 'expand' parameter to True:
image = image.rotate(270, expand=True)

Here is a solution that works for one image, but you can apply it to a list of images and check each one before saving it back to PDF:
# import the library
from PIL import Image
# open the image file
f = Image.open('test.jpg')
# convert to RGB so the image can be saved as PDF
pdf = f.convert('RGB')
# if the width is greater than the height, rotate to get portrait
if pdf.width > pdf.height:
    pdf = pdf.rotate(270, expand=True)
# save as PDF
pdf.save('test.pdf')
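To extend this to a list of images, something like the following might work. It is only a sketch: the helper names `to_portrait` and `save_pages_as_pdf` are made up for illustration, and it assumes each item behaves like a Pillow image (`convert`, `rotate`, `width`, `height`, `save`). Note that the width/height check cannot tell a 90 degree rotation from a 270 degree one, so some pages may still end up upside down.

```python
def to_portrait(image):
    """Return an RGB copy of the image, rotated if it is wider than it is tall."""
    page = image.convert('RGB')
    if page.width > page.height:
        # expand=True enlarges the canvas so nothing is cut off
        page = page.rotate(270, expand=True)
    return page


def save_pages_as_pdf(images, path):
    """Normalise every page to portrait and write them all into one PDF."""
    pages = [to_portrait(img) for img in images]
    if not pages:
        raise ValueError("no pages to save")
    # Pillow writes a multi-page PDF when save_all=True and the
    # remaining pages are passed through append_images
    pages[0].save(path, save_all=True, append_images=pages[1:])
    return pages
```

With Pillow this could be called as save_pages_as_pdf([Image.open(p) for p in paths], 'out.pdf'), where paths is your own list of image files.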

When you say they are rotated, is it as simple as all pages being meant for portrait orientation while some are in landscape? You should be able to read the orientation of the pages from the PDF metadata. If that's not available for some reason, you could fall back on simple logic to determine it, like rotated = image.width > image.height
With Pillow/PIL it would be easy to rotate the image before OCR:
if rotated:
    image = image.rotate(270)
Presumably some pages could also be upside down. Unless you have reliable metadata from the PDF, you might have to OCR first with the most likely direction (say rotate(270) as above, which in Pillow means 270 degrees counter-clockwise, i.e. 90 degrees clockwise), and if that doesn't return any text, try again after rotating by a further 180 degrees.
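The try-and-retry idea could be sketched roughly like this. Everything here is an assumption for illustration: `best_rotation` is a made-up helper, `ocr` is whatever callable you use to get text from an image (for example pytesseract.image_to_string), and counting alphanumeric characters is only a crude way to decide which attempt "worked", since OCR on a sideways page can still return some garbage characters.

```python
def best_rotation(image, ocr, angles=(0, 270, 90, 180)):
    """Run OCR at each candidate angle and keep the attempt with the most text.

    Returns (angle, text). `image` needs a Pillow-style rotate() method and
    `ocr` is any callable that takes an image and returns a string.
    """
    best_angle, best_text, best_score = 0, "", -1
    for angle in angles:
        # expand=True keeps the full page visible after rotation
        candidate = image if angle == 0 else image.rotate(angle, expand=True)
        text = ocr(candidate)
        # crude score: number of letters and digits recognised
        score = sum(ch.isalnum() for ch in text)
        if score > best_score:
            best_angle, best_text, best_score = angle, text, score
    return best_angle, best_text
```

Running OCR up to four times per page is slow, so it probably makes sense to do this only for pages where the first attempt returns little or no text.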

It worked! Thanks a lot. Only issue is after rotation, part of the image is getting cut. – Developer Jun 19 '19 at 13:21
You can use imutils to rotate without cutting off the image boundaries:
import cv2 as cv
import imutils
img = cv.imread('your_image.png')
# rotate_bound takes a clockwise angle: 270 turns the image 90 degrees anti-clockwise, 90 turns it 90 degrees clockwise
rotated = imutils.rotate_bound(img, 270)
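For anyone wondering why the image gets cut in the first place: a plain rotation keeps the original canvas size, so a 1000x700 page turned by 90 degrees no longer fits. The canvas a rotated image needs follows from the usual bounding-box formula, sketched below with a made-up helper name (`rotated_canvas_size`); this is my own illustration of the arithmetic, not code taken from imutils or Pillow.

```python
import math


def rotated_canvas_size(width, height, angle_degrees):
    """Return (new_width, new_height) of the box that fits the rotated image."""
    theta = math.radians(angle_degrees)
    cos_t = abs(math.cos(theta))
    sin_t = abs(math.sin(theta))
    # each side of the new box is the projection of both original sides
    new_width = round(width * cos_t + height * sin_t)
    new_height = round(width * sin_t + height * cos_t)
    return new_width, new_height


# a landscape page turned by 270 degrees needs a portrait canvas
print(rotated_canvas_size(1000, 700, 270))  # (700, 1000)
```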
