I am trying to implement character extraction from images in Python using MSER in OpenCV. This is my code so far:
import cv2
import numpy as np
# create MSER object
mser = cv2.MSER_create()
# load the image (the file name is a placeholder) and convert it to grayscale
img = cv2.imread('image.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# detect the regions
regions,_ = mser.detectRegions(gray)
# find convex hulls of the regions
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
# initialize threshold area of the contours
ThresholdContourArea = 10000
# initialize empty list for the characters and their locations
char = []
loc = []
# get the character part of the image and its location if the contour area is below the threshold
for contour in hulls:
    if cv2.contourArea(contour) > ThresholdContourArea:
        continue
    # get the bounding rectangle around the contour
    bound_rect = cv2.boundingRect(contour)
    loc.append(bound_rect)
    det_char = gray[bound_rect[1]:bound_rect[1]+bound_rect[3], bound_rect[0]:bound_rect[0]+bound_rect[2]]
    char.append(det_char)
But this method gives multiple contours for the same letter, and in some places multiple words are put into one contour. Here is an example. Original image:
After adding the contours:
Here the first T has multiple contours around it, and the two r's are combined into one contour. How do I prevent that?
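
For the duplicate-contour part only, one possible direction is a post-filter on the bounding rectangles: drop any box that lies almost entirely inside a larger box that has already been kept. The sketch below is a minimal illustration under stated assumptions, not a tested fix. The function name `drop_nested_boxes` and the `overlap=0.8` threshold are hypothetical and not part of the code above. It works on `(x, y, w, h)` tuples like the ones stored in `loc`, and it does nothing about letters that were merged into one contour.

```python
def drop_nested_boxes(boxes, overlap=0.8):
    """Filter (x, y, w, h) boxes, keeping the largest of each nested group.

    A box is dropped when at least `overlap` of its own area lies inside
    a box that has already been kept. Both the name and the default
    threshold are assumptions made for this sketch.
    """
    kept = []
    # visit the largest boxes first so smaller nested ones are compared against them
    for x, y, w, h in sorted(boxes, key=lambda b: b[2] * b[3], reverse=True):
        area = w * h
        if area == 0:
            continue  # skip degenerate boxes
        nested = False
        for kx, ky, kw, kh in kept:
            # width and height of the intersection rectangle (0 if disjoint)
            ix = max(0, min(x + w, kx + kw) - max(x, kx))
            iy = max(0, min(y + h, ky + kh) - max(y, ky))
            if ix * iy / area >= overlap:
                nested = True
                break
        if not nested:
            kept.append((x, y, w, h))
    return kept
```

With the code above this could be called as `loc = drop_nested_boxes(loc)`, after which the crops in `char` would need to be rebuilt from the filtered boxes. Because the largest box wins, a box that spans two merged letters would also swallow any correct single-letter boxes inside it, so the threshold and the ordering would need tuning on real images.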