
I am trying to detect and grab text from a scanned document that has been converted to an image, on which I then plan to perform OCR to extract specific handwritten portions. The sample document below is a good representation of what I need to scan.

[image: the sample scanned document]

I am trying to create something similar to Google's Vision API approach to this, specifically just drawing boxes around places where text appears:

[image: Google Vision API example with boxes drawn around detected text]

Currently, I am using the frozen EAST text detector with cv2 to detect text and draw said boxes. The results have been suboptimal: boxes are drawn in some, but not all, of the important locations.

[image: my current EAST results, with boxes missing in several important locations]

My code is as follows. It is based on this post: How to make bounding box around text-areas in an image? (Even if text is skewed!!), which draws bounding boxes around advertisements. The decode function is from https://github.com/opencv/opencv/blob/0e40c8a03158e599193d26635fcc13f110c22896/samples/dnn/text_detection.py
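
Since the snippet below depends on it, here is the decode function from that linked OpenCV sample (lightly condensed); it converts the score and geometry maps into rotated boxes, and the 4.0 offsets reflect the feature maps being at 1/4 of the input resolution:

import math

def decode(scores, geometry, scoreThresh):
    detections = []
    confidences = []
    height, width = scores.shape[2], scores.shape[3]
    for y in range(height):
        scoresData = scores[0][0][y]
        x0_data = geometry[0][0][y]
        x1_data = geometry[0][1][y]
        x2_data = geometry[0][2][y]
        x3_data = geometry[0][3][y]
        anglesData = geometry[0][4][y]
        for x in range(width):
            score = scoresData[x]
            if score < scoreThresh:
                continue
            # Feature maps are 4x smaller than the network input
            offsetX, offsetY = x * 4.0, y * 4.0
            angle = anglesData[x]
            cosA, sinA = math.cos(angle), math.sin(angle)
            h = x0_data[x] + x2_data[x]
            w = x1_data[x] + x3_data[x]
            offset = (offsetX + cosA * x1_data[x] + sinA * x2_data[x],
                      offsetY - sinA * x1_data[x] + cosA * x2_data[x])
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0],  sinA * w + offset[1])
            center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
            detections.append((center, (w, h), -angle * 180.0 / math.pi))
            confidences.append(float(score))
    return [detections, confidences]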

import cv2
import numpy as np
from PIL import Image

net = cv2.dnn.readNet(content + model_loc)   # Load the frozen EAST model we get after extraction
inpWidth = 320
inpHeight = 320  # Default EAST input dimensions (must be multiples of 32)

frame = np.array(img)  # The scanned page as a NumPy array (defined before the blob is built from it)

# Preparing a blob to pass the image through the neural network.
# Note: the OpenCV EAST sample subtracts the training means (123.68, 116.78, 103.94);
# (200, 200, 200) is used here instead.
image_blob = cv2.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (200, 200, 200), True, False)

# The two output layers: per-pixel text confidence scores and box geometry
output_layers = ["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"]

print(image_blob.shape)
net.setInput(image_blob)
scores, geometry = net.forward(output_layers)

confThreshold = 0.01
nmsThreshold = 0.3
[boxes, confidences] = decode(scores, geometry, confThreshold)
indices = cv2.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)

# Ratios to scale box coordinates from network input size back to the original image
height_ = frame.shape[0]
width_ = frame.shape[1]
rW = width_ / float(inpWidth)
rH = height_ / float(inpHeight)

for i in np.array(indices).flatten():  # flatten() handles OpenCV versions that nest the indices
    # Get the 4 corners of the rotated rect
    vertices = cv2.boxPoints(boxes[i])
    # Scale the bounding box coordinates based on the respective ratios
    for j in range(4):
        vertices[j][0] *= rW
        vertices[j][1] *= rH
    # Draw the four edges of the box
    for j in range(4):
        p1 = (int(vertices[j][0]), int(vertices[j][1]))
        p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
        cv2.line(frame, p1, p2, (0, 255, 0), 3)

# Convert back to a PIL image (call .save(...) on the result to write it to disk)
Image.fromarray(frame)
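
For the OCR step I plan afterwards, each surviving box could be cropped from a clean copy of the page and handed to the OCR engine. A minimal sketch, assuming pytesseract as the engine (my assumption, since only "OCR" is specified) and a simple axis-aligned crop that ignores box rotation:

import pytesseract

clean = np.array(img)  # Crop from an unannotated copy, not the frame with green lines drawn
for i in np.array(indices).flatten():
    vertices = cv2.boxPoints(boxes[i])
    vertices[:, 0] *= rW
    vertices[:, 1] *= rH
    # Axis-aligned bounding rect of the rotated box (simplest crop; loses skew correction)
    x, y, w, h = cv2.boundingRect(vertices.astype(np.int32))
    crop = clean[max(y, 0):y + h, max(x, 0):x + w]
    print(pytesseract.image_to_string(crop))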

Another option I have considered is removing the background lines of the image so that I am left with just the text. This has proven a bit tricky given the thickness of the lines and the blurriness of the text. If this turns out to be feasible, I will likely go down this path; a sketch of the line-removal idea is below.
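
For that path, a minimal sketch of the standard morphological trick for removing form rulings: open the binarized page with long, thin kernels to isolate the lines, then subtract them (the kernel lengths here are assumptions to tune against the actual line thickness and spacing):

import cv2
import numpy as np

# Binarize with Otsu, inverted so ink is white on black
gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Long thin kernels pick out horizontal/vertical rulings; 40 px is a guess to tune
horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
vert_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
horiz_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, horiz_kernel)
vert_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, vert_kernel)

# Subtract the detected lines, leaving (mostly) just the text strokes
text_only = cv2.subtract(binary, cv2.bitwise_or(horiz_lines, vert_lines))

Thick or broken rulings may need a dilation pass on the line masks before subtracting, and handwriting that crosses a ruling will lose a few pixels where the line is removed.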
