create bounding box on meaningful word

Asked Jun 24 '22 at 07:03

Active Jun 25 '22 at 16:40

Viewed 298 times

I'm using pytesseract.image_to_data() on this image:

Code to create the bounding boxes:

import pytesseract
from pytesseract import Output
import cv2

# Load the page image
img = cv2.imread('Page_2.jpg')

# Run OCR; the dict holds one entry per detected element
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    # Draw a green rectangle around each element
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Show the annotated image until a key is pressed
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

I'm getting a bounding box on each word, like this:

Is there any way to get a meaningful phrase, such as 'invoice number', in a single bounding box? Like this:

edited Jun 25 '22 at 16:40

Jeru Luke

asked Jun 24 '22 at 07:03

aditya

You can try merging the closer boxes by analysing the distance between them. – Prashant Maurya Jun 24 '22 at 07:30
OK, can you please help me with the code? – aditya Jun 24 '22 at 07:59
This should be helpful: https://stackoverflow.com/questions/66490374/how-to-merge-nearby-bounding-boxes-opencv – Prashant Maurya Jun 24 '22 at 09:47
You may merge two rectangles if Tesseract detects them close to each other laterally, but working out which parts are meaningful is not easy. Detect the words; if two rectangles are horizontally aligned and close to each other, drop both and replace them with a single rectangle. – Yunus Temurlenk Jun 24 '22 at 11:22
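
The merging idea from the comments could be sketched roughly as below. This is only one possible approach, not a tested solution for the image in the question. It assumes the dict returned by pytesseract.image_to_data(..., output_type=Output.DICT), groups words by its 'page_num', 'block_num', 'par_num' and 'line_num' columns, and joins neighbours on the same line when the horizontal gap is small. The function name merge_word_boxes and the gap_factor threshold are made up for illustration, and the threshold would likely need tuning per document.

```python
from collections import defaultdict


def merge_word_boxes(d, gap_factor=1.0):
    """Merge neighbouring word boxes that sit on the same text line.

    d          -- dict in the shape returned by pytesseract.image_to_data
                  with output_type=Output.DICT
    gap_factor -- hypothetical tuning knob: two words are merged when the
                  horizontal gap is at most gap_factor * the taller height
    Returns a list of (x, y, w, h, text) tuples.
    """
    # Group the non-empty words by the line Tesseract assigned them to
    lines = defaultdict(list)
    for i, text in enumerate(d['text']):
        if not text.strip():
            continue  # page/block/paragraph/line rows carry no text
        key = (d['page_num'][i], d['block_num'][i],
               d['par_num'][i], d['line_num'][i])
        lines[key].append((d['left'][i], d['top'][i],
                           d['width'][i], d['height'][i], text))

    merged = []
    for key in sorted(lines):
        words = sorted(lines[key])  # left to right
        x, y, w, h, text = words[0]
        for nx, ny, nw, nh, ntext in words[1:]:
            gap = nx - (x + w)
            if gap <= gap_factor * max(h, nh):
                # Close enough: grow the current box to cover both words
                right = max(x + w, nx + nw)
                bottom = max(y + h, ny + nh)
                y = min(y, ny)
                w = right - x
                h = bottom - y
                text = text + ' ' + ntext
            else:
                # Too far apart: finish the current box, start a new one
                merged.append((x, y, w, h, text))
                x, y, w, h, text = nx, ny, nw, nh, ntext
        merged.append((x, y, w, h, text))
    return merged
```

With the d from the question, each returned (x, y, w, h) can be drawn with the same cv2.rectangle call in place of the per-word loop. Note that this only merges by proximity; it cannot tell whether the resulting phrase is actually meaningful, so a label and its value on the same line may end up in one box if the gap between them is small.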
