How to get coordinates of the overall bounding box of a text image?

Question

original image

import cv2
import pytesseract
from pytesseract import Output
import matplotlib.pyplot as plt

img = cv2.imread('eng2.png')

d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

plt.figure(figsize=(10,10))
plt.imshow(img)

The above code produces this image. In the image there are two kinds of boxes: one for each word and one for the whole text. I would like to get the coordinates for the whole text (the sentence on each line, or the whole paragraph).

This is what I have tried

import numpy as np
import pandas as pd

box = pd.DataFrame(d)                            # dict to dataframe
box['text'] = box['text'].replace('', np.nan)    # replace empty values by NaN
box = box.dropna(subset=['text'])                # delete rows with NaN

print(box)


def lineup(boxes):
    linebox = None
    for _, box in boxes.iterrows():
        if linebox is None: linebox = box           # first line begins
        elif box.top <= linebox.top+linebox.height: # box in same line
            linebox.top = min(linebox.top, box.top)
            linebox.width = box.left+box.width-linebox.left
            linebox.height = max(linebox.top+linebox.height, box.top+box.height)-linebox.top
            linebox.text += ' '+box.text
        else:                                       # box in new line
            yield linebox
            linebox = box                           # new line begins
    yield linebox                                   # return last line

lineboxes = pd.DataFrame.from_records(lineup(box))

Output dataframe

n_boxes = len(lineboxes['level'])
for i in range(n_boxes):
    (x, y, w, h) = (lineboxes['left'][i], lineboxes['top'][i], lineboxes['width'][i], lineboxes['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

plt.figure(figsize=(10,10))
plt.imshow(img)

There seems to be no difference between the original coordinates and the ones obtained after joining all the coordinates.

How can I get the coordinates of the whole text (the sentence on each line, or the whole paragraph) using the pytesseract library?

Maybe change your approach using exclusively OpenCV. Pre-process the image with some morphology. Apply a dilation with a large structuring element to join the text in blocks/paragraphs. Find external contours on this image and get the bounding rectangles of the blobs. Also, please, include your original unprocessed image. — stateMachine, Jun 07 '22 at 09:40

Jeru Luke · Answer 1 · 2022-06-09T11:51:45.783

You faced a similar issue in one of your previous questions linked here. I failed to elaborate on what I meant in the comments, so here is a more visual explanation.

By horizontal kernel I meant an array with a single row, [1, 1, 1, 1, 1]. The number of columns can be chosen based on the font size and the space between characters/words. Using this kernel with a morphological dilation, you can connect individual entities that lie next to each other horizontally into a single entity.
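
To see why a one-row kernel joins words but not lines, here is a minimal sketch in plain Python (no OpenCV needed) that dilates a single row of pixels. The helper names `dilate_row` and `count_runs` and the sample row are made up for illustration. The window is assumed to be centered, which is the default anchor in cv2.dilate() for an odd kernel length.

```python
def dilate_row(row, kernel_length):
    """Binary dilation of one pixel row with a centered 1 x kernel_length window."""
    assert kernel_length % 2 == 1, "an odd length keeps the window symmetric"
    r = kernel_length // 2
    n = len(row)
    # A pixel becomes 1 if any pixel inside the window around it is 1.
    return [1 if any(row[max(0, i - r):min(n, i + r + 1)]) else 0 for i in range(n)]


def count_runs(row):
    """Count maximal runs of consecutive 1s (the separate entities in the row)."""
    runs, prev = 0, 0
    for v in row:
        if v and not prev:
            runs += 1
        prev = v
    return runs


# Hypothetical row: two strokes 2 px apart, then a third stroke 5 px further on.
row = [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
dilated = dilate_row(row, kernel_length=3)

print(count_runs(row), count_runs(dilated))
```

With a centered window, a gap is bridged when it is at most kernel_length - 1 pixels wide, so the 2 px gap closes and the 5 px gap does not. Because the kernel has only one row, nothing grows vertically, which is why separate text lines stay separate.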

In your case, we would like to extract each line as an individual entity. Let's go through the code:

Code:

import cv2
import numpy as np

img = cv2.imread('letter.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# inverse binary image, to ensure text region is in white
# because contours are found for objects in white
th = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Now there is a black border surrounding the original image. In th it becomes a white border. Since it is unwanted, we will remove it using cv2.floodFill().

black = np.zeros([img.shape[0] + 2, img.shape[1] + 2], np.uint8)
mask = cv2.floodFill(th.copy(), black, (0,0), 0, 0, 0, flags=8)[1]

# dilation using horizontal kernel
kernel_length = 30
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_length, 1))
dilate = cv2.dilate(mask, horizontal_kernel, iterations=1)

img2 = img.copy()
# findContours returns 2 values in OpenCV 4.x and 3 values in OpenCV 3.x
contours = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    # draw on the copy so the original image stays untouched
    cv2.rectangle(img2, (x, y), (x + w, y + h), (0, 255, 0), 2)

You can get the coordinates of each line from cv2.boundingRect(), as can be seen in the image above. Using those coordinates, you can crop each line of the document and feed it to the pytesseract library.
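
If only the coordinates are needed, a minimal sketch of two possible follow-ups is shown below. It assumes boxes are (x, y, w, h) tuples like those returned by cv2.boundingRect(), and that the dictionary has the same keys as the output of pytesseract.image_to_data(), where level is 1 for the page, 2 for a block, 3 for a paragraph, 4 for a line and 5 for a word. The helper names `boxes_at_level` and `union_box` and the sample dictionary are hypothetical, and the numbers are made up.

```python
def boxes_at_level(data, level):
    """Return (x, y, w, h) for every row of an image_to_data-style dict at one level."""
    return [
        (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
        for i, lvl in enumerate(data['level'])
        if lvl == level
    ]


def union_box(boxes):
    """Smallest (x, y, w, h) rectangle that contains all the given boxes."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)


# Made-up data shaped like image_to_data output:
# page, block, paragraph, line, word, word, line, word
d = {
    'level':  [1, 2, 3, 4, 5, 5, 4, 5],
    'left':   [0, 10, 10, 10, 10, 60, 10, 10],
    'top':    [0, 10, 10, 10, 10, 12, 40, 40],
    'width':  [200, 150, 150, 90, 40, 40, 150, 150],
    'height': [100, 50, 50, 20, 20, 18, 20, 20],
}

line_boxes = boxes_at_level(d, level=4)   # one box per text line
overall = union_box(line_boxes)           # one box around all lines

print(line_boxes)
print(overall)
```

The same `union_box` helper could be applied to the rectangles collected from cv2.boundingRect() in the loop above to obtain a single box around the whole text.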

one whole rectangle instead of multiple rectangles. cv2.imshow('letter', img2) — toyota Supra, Jun 09 '22 at 11:46
@toyotaSupra Thanks for spotting it. The change is made while applying threshold `cv2.THRESH_BINARY_INV` not `cv2.THRESH_BINARY` — Jeru Luke, Jun 09 '22 at 11:52
