
I have been trying for a few months to write code that finds a word in an image and returns the coordinates of that word within the image. I tried using OpenCV and Tesseract OCR, but without success. Could someone in the community help me?

I'll leave an image here as an example:

[example image]

Christoph Rackwitz

2 Answers


Here is something you can start with:

import pytesseract
from PIL import Image


pytesseract.pytesseract.tesseract_cmd = r'C:\<path-to-your-tesseract>\Tesseract-OCR\tesseract.exe'

img = Image.open("img.png")
data = pytesseract.image_to_data(img, output_type='dict')
boxes = len(data['level'])

for i in range(boxes):
    if data['text'][i] != '':
        print(data['left'][i], data['top'][i], data['width'][i], data['height'][i], data['text'][i])

If you have difficulties with installing pytesseract see: https://stackoverflow.com/a/53672281/18667225

Output:

153 107 277 50 Palavras
151 197 133 37 com
309 186 154 48 R/RR
154 303 126 47 Rato
726 302 158 47 Resto
154 377 144 50 Rodo
720 379 159 47 Arroz
152 457 160 48 Carro
726 457 151 46 Ferro
154 532 142 50 Rede
726 534 159 47 Barro
154 609 202 50 Parede
726 611 186 47 Barata
154 690 124 47 Faro
726 685 288 50 Beterraba
154 767 192 47 Escuro
726 766 151 47 Ferro
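
To go from this listing to the coordinates of one specific word, the same dictionary can be filtered by its 'text' entries. The following is only a minimal sketch: the helper name find_word_boxes is hypothetical, and the sample dictionary is hand-copied from a few rows of the output above so that the snippet runs without Tesseract installed. With a real image you would pass the result of pytesseract.image_to_data instead.

```python
def find_word_boxes(data, target):
    """Return (left, top, width, height) for every OCR entry whose text equals target.

    `data` is assumed to have the same shape as the dictionary returned by
    pytesseract.image_to_data(..., output_type='dict'): parallel lists under
    the keys 'text', 'left', 'top', 'width' and 'height'.
    """
    boxes = []
    for i, text in enumerate(data["text"]):
        if text.strip() == target:
            boxes.append(
                (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            )
    return boxes


# A few rows copied by hand from the output above (not a full OCR result).
sample = {
    "text": ["Palavras", "com", "Ferro", "Ferro"],
    "left": [153, 151, 726, 726],
    "top": [107, 197, 457, 766],
    "width": [277, 133, 151, 151],
    "height": [50, 37, 46, 47],
}

print(find_word_boxes(sample, "Ferro"))
# -> [(726, 457, 151, 46), (726, 766, 151, 47)]
```

The function returns a list because a word can occur more than once, as "Ferro" does in the example image.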
Markus

I managed to find the solution and I'll post it here for you:

import pytesseract
import cv2
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = r'C:\<path-to-your-tesseract>\Tesseract-OCR\tesseract.exe'

filepath = 'image.jpg'
image = cv2.imread(filepath, 1)

# converting image to grayscale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# converting to a binary image with Otsu thresholding
# this step often helps with color or low-contrast images; if you skip it,
# tesseract may fail to detect the text correctly and give an incorrect result
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# displays the image
cv2.imshow('threshold image', threshold_img)

# Holds the output window until the user presses a key
cv2.waitKey(0)

# Destroying windows present on the screen
cv2.destroyAllWindows()

# setting parameters for tesseract
custom_config = r'--oem 3 --psm 6'

# now feeding image to tesseract 
details = pytesseract.image_to_data(threshold_img, output_type=Output.DICT, config=custom_config, lang='eng')

# Color (red, in OpenCV's BGR order)
vermelho = (0, 0, 255)

# Print all keys returned by tesseract
print(details.keys())
print(details['text'])
# Loop over every detected text entry
for i in range(len(details['text'])):
    # If the text is "UNIVERSIDADE", print its coordinates and draw a rectangle around the word
    if details['text'][i] == 'UNIVERSIDADE':
        print(details['text'][i])
        print(f"left: {details['left'][i]}")
        print(f"top: {details['top'][i]}")
        print(f"width: {details['width'][i]}")
        print(f"height: {details['height'][i]}")
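        cv2.rectangle(image, (details['left'][i], details['top'][i]), (details['left'][i]+details['width'][i], details['top'][i]+details['height'][i]), vermelho)

# Show the original image with the rectangle(s) drawn on it
cv2.imshow('result', image)
cv2.waitKey(0)
cv2.destroyAllWindows()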
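
The exact comparison with 'UNIVERSIDADE' above will miss a word if the OCR output differs in case or carries attached punctuation. The sketch below shows one possible way to loosen the match and to skip low-confidence entries using the 'conf' key of the same dictionary. The names normalize, find_word_tolerant and min_conf are hypothetical, and the sample data is made up for illustration rather than taken from a real OCR run. Confidence values are passed through float() because, as far as I know, their type can vary between pytesseract versions.

```python
import string


def normalize(word):
    """Strip surrounding punctuation/whitespace and ignore case."""
    return word.strip(string.punctuation + string.whitespace).casefold()


def find_word_tolerant(details, target, min_conf=0.0):
    """Return a list of box dicts for entries matching target after normalization.

    `details` is assumed to look like the result of
    pytesseract.image_to_data(..., output_type=Output.DICT).
    Entries whose confidence is below min_conf are ignored; Tesseract
    reports -1 for rows that are not words, so they are skipped by default.
    """
    wanted = normalize(target)
    hits = []
    for i, text in enumerate(details["text"]):
        if float(details["conf"][i]) < min_conf:
            continue
        if normalize(text) == wanted:
            hits.append({
                "left": details["left"][i],
                "top": details["top"][i],
                "width": details["width"][i],
                "height": details["height"][i],
            })
    return hits


# Made-up sample data for illustration only (not real OCR output).
details_sample = {
    "text": ["", "Universidade,", "UNIVERSIDADE", "de"],
    "conf": ["-1", 91, 40.5, 95],
    "left": [0, 10, 200, 400],
    "top": [0, 20, 20, 20],
    "width": [500, 150, 160, 30],
    "height": [100, 30, 30, 30],
}

hits = find_word_tolerant(details_sample, "universidade", min_conf=60)
print(hits)
```

Each returned box can be passed to cv2.rectangle in the same way as in the loop above.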