
I'm using pytesseract to extract Persian text from an image, but I get no output at all. I downloaded fas.traineddata and put it in the tessdata folder, but it still doesn't work.

Here is my code:

import cv2
import pytesseract
from unidecode import unidecode

pytesseract.pytesseract.tesseract_cmd = 'D:\\New folder\\tesseract.exe'
img = cv2.imread('B.png')

text = pytesseract.image_to_string(img , lang='fas')

print(text)

And here is the input image:


1 Answer


I had the same problem and solved it with the following code. Compared with your code, it opens the image with PIL, passes lang='eng+fas' instead of lang='fas', and sets --psm 1:

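import pytesseract
from PIL import Image


def tesseract():
    # tesseract_cmd must point at the tesseract executable itself.
    pytesseract.pytesseract.tesseract_cmd = 'Path_to_your_tesseract_directory'
    image = Image.open('name_of_your_pic')

    # Recognize English and Persian together; --psm 1 selects automatic
    # page segmentation with orientation and script detection (OSD).
    text_in_image = pytesseract.image_to_string(image, lang='eng+fas',
                                                config='--psm 1')

    # Save the result as UTF-8 so the Persian characters are preserved.
    with open('sample.txt', 'w', encoding='utf-8') as file:
        file.write(text_in_image)

    # Read the file back line by line.
    with open('sample.txt', 'r', encoding='utf-8') as file:
        lines = file.readlines()

    # Printing the UTF-8 bytes avoids console encoding errors; use
    # print(line) instead if your terminal can display Persian.
    for line in lines:
        print(line.encode('utf-8'))


tesseract()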

Finally, I saved the result to a UTF-8 file, which makes the encoding much easier to deal with than printing straight to the console.
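
Since the question mentions copying fas.traineddata into tessdata by hand, it may also be worth checking that the file really is in the folder you expect before blaming the OCR call. Below is a minimal sketch using only the standard library; the helper name missing_languages is made up for this example, and the tessdata location you pass in is an assumption about your own install, not something pytesseract reports.

```python
from pathlib import Path


def missing_languages(tessdata_dir, lang):
    """Return the codes in a lang string such as 'eng+fas' that have
    no matching <code>.traineddata file directly inside tessdata_dir.

    Hypothetical helper for illustration; it only looks at file names
    and does not call Tesseract.
    """
    folder = Path(tessdata_dir)
    return [
        code
        for code in lang.split('+')
        if not (folder / f"{code}.traineddata").is_file()
    ]
```

For example, missing_languages(r'D:\New folder\tessdata', 'eng+fas') returns the language codes whose data files are absent from that folder (the path here is only a guess based on the tesseract.exe location in the question). An empty list means both files are present, so the problem is more likely the image or the page segmentation mode.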