I have an extracted image (this extracted image), and I want to crop out and extract the individual letters from it.

I have tried the code below, but it only works for names written with a gap between the letters (like this name). For such an image I get the expected result: a single letter at a time.

import cv2
import numpy as np

img = cv2.imread('data1/NAME.png')

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh1 = cv2.threshold(gray,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

kernel = np.ones((3, 3), np.uint8)
imgMorph = cv2.erode(thresh1, kernel, iterations = 1)

contours, hierarchy = cv2.findContours(imgMorph,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)

i=1
for cnt in contours:
    x,y,w,h = cv2.boundingRect(cnt)

    if w>10 and w<100 and h>10 and h<100:
        #save individual images
        cv2.imwrite("data1/NAME_{}.png".format((i)),thresh1[y:y+h,x:x+w])
        i=i+1

cv2.imshow('BindingBox',imgMorph)
cv2.waitKey(0)
cv2.destroyAllWindows()

This code gives results such as result1, result2, and so on.

The expected result looks like expected2, expected2.

Neo_21995
  • One idea is to dilate with a horizontal kernel to connect the letters together, then find contours and filter using a minimum threshold area. Anything that passes the filter must be text, so you can save the ROI. This will work if the letters are not touching the box. A second approach is to simply crop using an offset, since you already have the bounding box for each extracted image. Just add an offset of, say, 15 to remove the boxes. Again, this may have limitations, since it will chop off some text if it's connected to the box – nathancy Nov 27 '19 at 21:50
  • This image is not the only case; the letters may fall outside the box or touch the text box. My concern is to recognise the text written in that box, so I am trying to segment the letters and then predict them – Neo_21995 Nov 28 '19 at 04:18

2 Answers

You cannot separate touching or overlapping letters with morphological operations when the common line is as thick as the rest of the letter.

You cannot segment the letters, but you can recognize them using more advanced OCR techniques such as machine learning.

For background, see http://www.how-ocr-works.com/OCR/word-character-segmentation.html

Piglet

It's not as simple as thresholding and detecting blobs. You'll need to train an OCR engine like Tesseract to detect handwritten characters.

Ziri