I wrote code that recognizes words and letters in images using Tesseract-OCR and OpenCV, but it only works for flat letters and words. How can I improve this code so that it can also recognize rotated and intersecting characters and words? My code:

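import warnings

import cv2
import pytesseract
from PIL import Image

# Silence Pillow's warning about very large images
warnings.simplefilter('ignore', Image.DecompressionBombWarning)

image = r"C:\Users\name\Desktop\image.png"
preprocess = "thresh"

# Load the image and convert it to grayscale
img = cv2.imread(image)
if img is None:
    raise FileNotFoundError(image)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Save the grayscale image to disk so it can be reopened with PIL
filename = "ImageT.png"
cv2.imwrite(filename, gray)

# Point pytesseract at the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\name\Desktop\Tesseract-OCR\tesseract.exe"

# Run OCR on the grayscale image and print the recognized text
text = pytesseract.image_to_string(Image.open(filename))
print(text)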

and some pictures:

[Images: Words (two samples), Symbols (two samples)]

nathancy
Klopo22
  • pytesseract was not trained for curved images, according to [this discussion](https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951) and [the docs](https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html). You need your own classifier to detect rotated images. – Edeki Okoh Feb 19 '20 at 21:00
  • The image could be rotated in the background, with Tesseract run at each of several rotation steps. Just an idea. It can work, but it needs a good algorithm behind it. – Yunus Temurlenk Feb 20 '20 at 06:12
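
A minimal sketch of the rotate-and-retry idea from the comment above, assuming the text is rotated as a whole; it does not address curved or intersecting characters. The helper names `rotate_image`, `ocr_with_confidence` and `best_rotation`, the 15-degree step, and the choice of mean word confidence as the score are illustrative assumptions, not something from the question.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def rotate_image(image: Any, angle: float) -> Any:
    """Rotate an OpenCV image about its centre, growing the canvas so nothing is cropped."""
    import cv2  # imported lazily so the search loop can be exercised without OpenCV

    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    m = cv2.getRotationMatrix2D(center, angle, 1.0)
    cos, sin = abs(m[0, 0]), abs(m[0, 1])
    new_w = int(h * sin + w * cos)
    new_h = int(h * cos + w * sin)
    # Shift the transform so the rotated content stays centred in the new canvas
    m[0, 2] += new_w / 2.0 - center[0]
    m[1, 2] += new_h / 2.0 - center[1]
    return cv2.warpAffine(image, m, (new_w, new_h), borderValue=(255, 255, 255))


def ocr_with_confidence(image: Any) -> Tuple[str, float]:
    """Run Tesseract and return (text, mean confidence of the recognized words)."""
    import pytesseract  # imported lazily, see rotate_image

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    words, confs = [], []
    for word, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may be a string or a number depending on the version
        if word.strip() and conf >= 0:  # -1 marks boxes that are not words
            words.append(word)
            confs.append(conf)
    mean_conf = sum(confs) / len(confs) if confs else 0.0
    return " ".join(words), mean_conf


def best_rotation(
    image: Any,
    angles: Iterable[float] = range(0, 360, 15),
    rotate: Callable[[Any, float], Any] = rotate_image,
    ocr: Callable[[Any], Tuple[str, float]] = ocr_with_confidence,
) -> Tuple[Optional[float], str, float]:
    """Try each angle and keep the result with the highest confidence score."""
    best_angle, best_text, best_conf = None, "", -1.0
    for angle in angles:
        text, conf = ocr(rotate(image, angle))
        if conf > best_conf:
            best_angle, best_text, best_conf = angle, text, conf
    return best_angle, best_text, best_conf
```

With the question's setup, this could be called as `angle, text, conf = best_rotation(gray)` after setting `pytesseract.pytesseract.tesseract_cmd`. Mean confidence is a crude score: a single spurious high-confidence word can win, so a finer angle step or a minimum word count may be needed in practice.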
