Cannot extract Persian/Farsi text from an image in Python using pytesseract

Question

I'm using pytesseract to extract Persian text from an image, but I get nothing! I downloaded fas.traineddata and put it in the tessdata folder, but it still doesn't work!

Here is my code:

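import cv2
import pytesseract

# Path to the Tesseract executable on this machine
pytesseract.pytesseract.tesseract_cmd = r'D:\New folder\tesseract.exe'

# OpenCV loads images in BGR order; pytesseract treats NumPy arrays as RGB
img = cv2.cvtColor(cv2.imread('B.png'), cv2.COLOR_BGR2RGB)

# 'fas' selects the Persian model (fas.traineddata in the tessdata folder)
text = pytesseract.image_to_string(img, lang='fas')

print(text)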
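When Tesseract returns an empty string for `lang='fas'`, one thing worth checking is whether the Tesseract binary actually sees the Persian model. The sketch below assumes a recent pytesseract release that provides `pytesseract.get_languages()`, which wraps `tesseract --list-langs`; the helper names `missing_languages` and `check_languages` are hypothetical. The parsing logic is kept in a separate pure function so it does not depend on Tesseract being installed.

```python
def missing_languages(lang_spec, available):
    """Return the codes in a Tesseract language spec such as 'eng+fas'
    that are not in the list of installed languages."""
    requested = [code for code in lang_spec.split('+') if code]
    installed = set(available)
    return [code for code in requested if code not in installed]


def check_languages(lang_spec, tesseract_cmd=None):
    """Hypothetical helper: ask Tesseract which traineddata files it can
    load and report any requested language that is missing."""
    # Imported here so missing_languages() can be used without pytesseract
    import pytesseract

    if tesseract_cmd:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    # Lists the languages found in the tessdata folder Tesseract is using
    available = pytesseract.get_languages(config='')
    missing = missing_languages(lang_spec, available)
    if missing:
        print('Tesseract cannot find:', ', '.join(missing))
    return missing
```

For example, `check_languages('fas', r'D:\New folder\tesseract.exe')` should return an empty list once `fas.traineddata` is in the tessdata folder this binary reads. If `fas` is reported missing, the file is probably in a different tessdata directory; pytesseract can pass Tesseract's `--tessdata-dir "<path>"` option through the `config` argument of `image_to_string`.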

Check this: https://stackoverflow.com/questions/54763731/tesseract-returns-nothing-for-arabic-words-letters (comment by Khaled Developer, Jun 06 '22 at 14:29)

Aryas Karimi · Answer 1 · 2023-06-12T14:36:34.133

I had the same problem and solved it with the following code:

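import pytesseract
from PIL import Image

def tesseract():
    # tesseract_cmd must point to the Tesseract executable itself
    pytesseract.pytesseract.tesseract_cmd = 'Path_to_your_tesseract_executable'
    image = Image.open('name_of_your_pic')

    # 'eng+fas' loads the English and Persian models together;
    # --psm 1 is automatic page segmentation with orientation and script detection
    text_in_image = pytesseract.image_to_string(image, lang='eng+fas',
                                                config='--psm 1')

    # Save the result as UTF-8 so the Persian characters are preserved
    with open('sample.txt', 'w', encoding='utf-8') as file:
        file.write(text_in_image)

    with open('sample.txt', 'r', encoding='utf-8') as file:
        lines = file.readlines()

    # Print each line's UTF-8 bytes so the console encoding cannot fail
    for line in lines:
        print(line.encode('utf-8'))

tesseract()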

Finally, I saved the result to a UTF-8 file, which made the encoding much easier to handle.
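The UTF-8 handling from this answer can be sketched without Tesseract. The helpers below are hypothetical: `save_text` writes OCR output with `encoding='utf-8'` and reads it back unchanged, and `print_unicode` shows an alternative on Python 3.7+, where `sys.stdout.reconfigure(encoding='utf-8')` lets the console print the Persian text itself rather than its byte representation, assuming the terminal font can display it.

```python
import sys
from pathlib import Path


def save_text(text, path='sample.txt'):
    """Write OCR output as UTF-8 and return the text read back from disk."""
    file_path = Path(path)
    file_path.write_text(text, encoding='utf-8')
    return file_path.read_text(encoding='utf-8')


def print_unicode(text):
    """Print text directly, switching stdout to UTF-8 when the stream allows it."""
    # reconfigure() exists on the standard text streams since Python 3.7;
    # streams replaced by some IDEs or test runners may not provide it
    if hasattr(sys.stdout, 'reconfigure'):
        sys.stdout.reconfigure(encoding='utf-8')
    print(text)
```

With this, the OCR result could be passed straight to `print_unicode(text_in_image)` instead of printing encoded bytes line by line.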
