How to improve the result of pytesseract?

Asked Mar 08 '20 at 00:12

Active Mar 08 '20 at 00:22

I am using pytesseract in my project, but I was not getting the desired results, so I started optimizing a bit:

I trained Tesseract on the font used by the website.
I made the image binary (black and white).
I restricted the whitelist to the only characters the images will contain (uppercase A to Z).
Since each image holds a single character, I set "--psm 10" in the config.
As a desperate measure, I used Photoshop to raise the DPI from 72 to 600.

But even with all this, and with a clear, isolated, and visible letter, I get a "T" instead of an "A". Is there something I am doing wrong? I would really appreciate your help :)

import cv2
import pyautogui
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

celda1 = cv2.imread('imagen.jpg')

sret = pytesseract.image_to_string(
    celda1,
    config="-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           " --psm 10"
           " -l osd",
)

print(sret)

edited Mar 08 '20 at 00:22

furas

asked Mar 08 '20 at 00:12

renny sanchez

As far as I know, text that is too small can cause problems, but so can text that is too big. Tesseract documentation: [Improving the quality of the output](https://tesseract-ocr.github.io/tessdoc/ImproveQuality) – furas Mar 08 '20 at 00:19
for this single image I get `"A"` when I remove `" -l osd"` – furas Mar 08 '20 at 00:29
Right, friend, that was it. It works perfectly, thanks a lot :D – renny sanchez Mar 08 '20 at 00:40
Another way is to train `tesseract` :) – jizhihaoSAMA Mar 08 '20 at 06:22
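
The fix reported in the comments above (dropping `" -l osd"`) could be sketched roughly as follows. The likely reason it helps is that `osd` is Tesseract's orientation-and-script-detection data rather than a recognition language, so leaving `-l` out lets Tesseract fall back to its default language (`eng`). The helper names `build_config` and `read_single_char` are hypothetical, introduced only for this illustration, and the explicit `lang="eng"` is an assumption about which language data is installed.

```python
def build_config(whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ", psm=10):
    """Build the Tesseract config string from the question, minus "-l osd"."""
    # --psm 10 treats the image as a single character.
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_single_char(image_path, lang="eng"):
    """OCR one isolated character from an image file (hypothetical helper)."""
    # Imported lazily so build_config works without the OCR dependencies.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        # cv2.imread returns None instead of raising when the file is unreadable.
        raise FileNotFoundError(image_path)

    # Assumption: the "eng" traineddata is installed; pass your own trained
    # language name here if you want to use a custom font model instead.
    text = pytesseract.image_to_string(image, lang=lang, config=build_config())
    return text.strip()
```

On Windows, `pytesseract.pytesseract.tesseract_cmd` still has to be set as in the question before calling `read_single_char('imagen.jpg')`.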
