6

I'm trying to create a real-time OCR in Python using mss and pytesseract.

So far, I've been able to capture my entire screen at a steady 30 FPS. If I capture a smaller area of around 500x500, I get 100+ FPS.

However, as soon as I include the line `text = pytesseract.image_to_string(img)`, boom: 0.8 FPS. Is there any way I could optimise my code to get a better FPS? The code does detect text, it's just extremely slow.

from mss import mss
import cv2
import numpy as np
from time import time
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\\Users\\Vamsi\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'

with mss() as sct:
    # Part of the screen to capture
    monitor = {"top": 200, "left": 200, "width": 500, "height": 500}

    while "Screen capturing":
        begin_time = time()

        # Get raw pixels from the screen, save it to a Numpy array
        img = np.array(sct.grab(monitor))

        # Finds text from the images
        text = pytesseract.image_to_string(img)

        # Display the picture
        cv2.imshow("Screen Capture", img)

        # Display FPS
        print('FPS {}'.format(1 / (time() - begin_time)))

        # Press "q" to quit
        if cv2.waitKey(25) & 0xFF == ord("q"):
            cv2.destroyAllWindows()
            break
Vamsi
  • Recognising text from images is very CPU intensive. As a first step I would look at [binarizing](https://link.springer.com/article/10.1007/s10032-015-0240-4) the input that is passed into image_to_string; this can speed up text recognition significantly. – R3uben Feb 23 '21 at 14:23
  • @R3uben So I added ```ret, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)``` before pytesseract takes in the image, but performance is still slow, under 1 FPS. Is there anything that I'm doing wrong? – Vamsi Feb 23 '21 at 15:23
  • I also changed the image to grayscale using ```img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)```, and changed the threshold call above to ```(thresh, bw_img) = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)```. It's still very slow, ~1 FPS. – Vamsi Feb 23 '21 at 15:32

3 Answers

1

pytesseract is not efficient "by default": it wraps the tesseract executable, saves temporary files to disk, and so on. If you are serious about performance, you need to use the tesseract API directly (e.g. via tesserocr or by creating a custom API wrapper).
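
A minimal sketch of what using the API directly could look like, assuming tesserocr is installed. `PyTessBaseAPI`, `SetImage`, `GetUTF8Text` and `End` are tesserocr calls; the `ReusableOcr` class and its `engine_factory` parameter are hypothetical names introduced here for illustration. The idea is that the engine is initialised once and every frame is handed over in memory.

```python
class ReusableOcr:
    """Keeps one Tesseract engine alive across frames (hypothetical helper)."""

    def __init__(self, lang="eng", engine_factory=None):
        self._lang = lang
        # engine_factory is an illustrative hook so another engine can be swapped in
        self._engine_factory = engine_factory or self._tesserocr_engine
        self._engine = None

    def _tesserocr_engine(self):
        # Imported lazily: tesserocr is only needed when the default engine is used
        import tesserocr
        return tesserocr.PyTessBaseAPI(lang=self._lang)

    def __enter__(self):
        self._engine = self._engine_factory()  # initialise Tesseract once
        return self

    def __exit__(self, exc_type, exc, tb):
        self._engine.End()  # release the engine when the loop is done
        self._engine = None
        return False

    def read(self, pil_image):
        """OCR one PIL image without writing it to disk."""
        self._engine.SetImage(pil_image)
        return self._engine.GetUTF8Text()
```

In the capture loop this could be used roughly as `with ReusableOcr() as ocr:` around the `while`, calling `ocr.read(Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX"))` with `shot = sct.grab(monitor)` and `Image` from Pillow. Whether this beats pytesseract on a given machine is worth measuring rather than assuming.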

user898678
  • I've been trying to install tesserocr for the past few hours and it's so painful on Windows 10. I'm using PyCharm and the tesserocr package just does not want to install. – Vamsi Feb 23 '21 at 21:55
  • I know: on Linux or Mac it should be easy. IMO there would need to be a bigger Windows user group for recent Python versions to be supported. – user898678 Feb 25 '21 at 06:51
  • try this: http://www.sk-spell.sk.cx/building-tesserocr-python-package-on-windows-64bit-and – user898678 Mar 19 '21 at 10:53
  • I did some comparative tests between pytesseract and tesserocr, but the performance is not as different as said. – 2badatcoding Jan 28 '22 at 22:02
  • Compared pytesseract to tesserocr on Linux and witnessed almost identical runtimes. – yeamusic21 Jul 07 '22 at 21:25
  • Yes, on modern hardware (SSD, virtualized env) the difference is not so big. But for real-time OCR, storing the input image to disk, initializing tesseract, storing the output to disk, and reading it back on every call is a waste of time, even if we are talking about milliseconds. That does not mean pytesseract is bad. It wraps the tesseract executable (instead of the library), which has pros and cons... – user898678 Jul 09 '22 at 09:15
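
Since the reports in this thread disagree on how much the wrapper overhead matters, it is probably worth measuring on your own machine. The sketch below is one way to do that; `seconds_per_call` is a hypothetical helper name, and `recognize` stands for whatever OCR function you want to time.

```python
from time import perf_counter


def seconds_per_call(recognize, images, repeats=3):
    """Return the mean wall-clock seconds one recognize(image) call takes."""
    images = list(images)
    if not images or repeats < 1:
        raise ValueError("need at least one image and one repeat")

    start = perf_counter()
    for _ in range(repeats):
        for image in images:
            recognize(image)  # the result is discarded, only the time matters
    elapsed = perf_counter() - start

    return elapsed / (repeats * len(images))
```

For example, `seconds_per_call(pytesseract.image_to_string, frames)` could be compared against a function that reuses a single tesserocr `PyTessBaseAPI` object on the same `frames`. One divided by the result is roughly the best FPS the OCR step alone can sustain.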
1

After looking at the pytesseract code, I see that it converts the image format and saves it locally before feeding it to tesseract. By changing from PNG to JPG I got a 3x speedup (from 9.5 to 3 seconds/image). I guess there is more optimization that could be done in the Python part of the code.
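
One possible way to try a JPEG input, sketched under assumptions: `image_to_string` also accepts a file path, so the frame can be saved as a JPEG up front (with Pillow) and the path passed in. `ocr_via_jpeg` and its parameters are hypothetical names; the `quality` value is an arbitrary choice, and JPEG is lossy, so both accuracy and any speedup should be checked on real frames.

```python
import os
import tempfile


def ocr_via_jpeg(pil_image, image_to_string=None, quality=90):
    """Save a PIL image as a temporary JPEG and OCR it by file path."""
    if image_to_string is None:
        # Imported lazily: pytesseract is only needed for the default OCR call
        import pytesseract
        image_to_string = pytesseract.image_to_string

    fd, path = tempfile.mkstemp(suffix=".jpg")
    os.close(fd)  # Pillow reopens the file by name
    try:
        # JPEG has no alpha channel, so convert to RGB before saving
        pil_image.convert("RGB").save(path, format="JPEG", quality=quality)
        return image_to_string(path)
    finally:
        os.remove(path)  # clean up the temporary file in every case
```

This still writes one file per frame, so it addresses the encoding cost only, not the cost of starting the tesseract executable on each call.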

Punnerud
0

You can use "easyocr", a lightweight Python package for OCR applications. It is very fast and reliable, and supports 70+ languages, including English, Chinese, Japanese, Korean, and Hindi, with more being added.

pip install easyocr

Check this out: https://huggingface.co/spaces/tomofi/EasyOCR
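
A minimal usage sketch, assuming easyocr is installed. `easyocr.Reader` and `readtext(..., detail=0)` are the library's calls; `build_reader` and `frame_to_text` are hypothetical helper names. Creating the reader loads the models, so in a capture loop it should be built once and reused.

```python
def build_reader(languages=("en",), gpu=False):
    """Create the EasyOCR reader once; this is the slow step that loads models."""
    import easyocr  # imported lazily so the helpers load without easyocr present
    return easyocr.Reader(list(languages), gpu=gpu)


def frame_to_text(reader, frame):
    """Return all text found in one frame as a single newline-joined string."""
    # detail=0 asks readtext for plain strings instead of (box, text, confidence)
    lines = reader.readtext(frame, detail=0)
    return "\n".join(lines)
```

With the capture loop from the question, this would be `reader = build_reader()` before the `while`, then `text = frame_to_text(reader, img[:, :, :3])` per frame. The slice drops the alpha channel of the BGRA array that mss returns, on the assumption that the reader expects an OpenCV-style 3-channel image.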