6

I am trying to detect and grab text from a screenshot taken from any consumer product's ad.

My code works with a certain accuracy but fails to draw bounding boxes around skewed text areas.

Recently I tried the Google Vision API, and it draws bounding boxes around almost every possible text area and detects the text in those areas with great accuracy. I am curious how I can achieve the same or similar results!

My test image:


Google Vision API after bounding boxes:


Thank you in advance:)

  • You can go with this tutorial: https://www.learnopencv.com/deep-learning-based-text-detection-using-opencv-c-python/ – Bahramdun Adil Feb 22 '19 at 07:30
  • I know I can't achieve the same in just one clap! I want to know what is the logic behind, maybe the name of any profound algorithm. – Tathya Kapadia Feb 22 '19 at 07:45
  • @TathyaKapadia there is no such profound algorithm. All the ML techniques for text detection are well known. Any random Joe Shmoe can write a deep-learning text detection algorithm. The success of it is entirely dependent on the nuances of the implementation. People get PhDs to understand how to adjust the parameters of these models, and how to cascade an ensemble of different models to achieve good results. It comes with years of research and experience. – darksky Feb 22 '19 at 08:44
  • If you just want something to impress your friends, literally googling, "python text detection image" will land you tutorials such as [this one](https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/) which incidentally was the first hit. – darksky Feb 22 '19 at 08:44
  • ImageMagick does not have text recognition or detection or OCR. But if you can somehow make a mask that contains just the text you want, then you can get the rotated bounding box in ImageMagick 7.0.10.2 or higher. See https://imagemagick.org/script/convex-hull.php#box – fmw42 May 28 '20 at 04:24

2 Answers

15

There are a few open-source vision packages that can detect text in images with noisy backgrounds, with accuracy comparable to Google's Vision API.

You can use EAST (Efficient and Accurate Scene Text Detector), a simple fully convolutional architecture by Zhou et al.: https://arxiv.org/abs/1704.03155v2

Using Python:

Download the pre-trained model from https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1 and extract it into your current folder.

You will need OpenCV >= 3.4.2 to run the code below.
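
If it helps, here is an optional helper (my own addition, not part of the original steps) that downloads and unpacks the model and prints your OpenCV version so you can confirm it is at least 3.4.2:

import tarfile
import urllib.request
import cv2

# Fetch and unpack the pre-trained EAST model (same Dropbox link as above)
model_url = "https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1"
urllib.request.urlretrieve(model_url, "frozen_east_text_detection.tar.gz")
with tarfile.open("frozen_east_text_detection.tar.gz") as tar:
    tar.extractall(".")  # yields frozen_east_text_detection.pb in the current folder

print(cv2.__version__)  # should be 3.4.2 or newer for dnn support of EAST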

import cv2
import math

# Load the EAST model extracted above
net = cv2.dnn.readNet("frozen_east_text_detection.pb")
# Read the ad screenshot (replace the placeholder with your image path)
frame = cv2.imread("<image_filename>")

# EAST expects input dimensions that are multiples of 32; 320x320 is a common default
inpWidth = inpHeight = 320

# Prepare a blob to pass the image through the neural network,
# subtracting the mean values used while training the model
image_blob = cv2.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight),
                                   (123.68, 116.78, 103.94), True, False)

Now we have to define the output layers, which produce the positional values of the detected text and its confidence scores (through a sigmoid activation):

output_layer = []
output_layer.append("feature_fusion/Conv_7/Sigmoid")  # confidence scores
output_layer.append("feature_fusion/concat_3")        # geometry: distances to box edges + rotation angle

Finally, we do a forward pass through the network to get the desired output:

net.setInput(image_blob)
output = net.forward(output_layer)
scores = output[0]    # text confidence map
geometry = output[1]  # box geometry (edge distances and rotation angle)

Here I have used the decode function defined in OpenCV's sample at https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.py (lines 23 to 75) to convert the positional values into rotated box coordinates.
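
For reference, that decode step looks roughly like the sketch below (condensed from the OpenCV sample; it is not my own algorithm). It walks the score map, keeps cells above the threshold, and rebuilds each rotated rectangle from the four edge distances and the angle:

import math

def decode(scores, geometry, scoreThresh):
    # scores:   1 x 1 x H x W text/no-text probabilities
    # geometry: 1 x 5 x H x W distances to the 4 box edges plus rotation angle
    detections = []
    confidences = []
    height, width = scores.shape[2], scores.shape[3]
    for y in range(height):
        scoresData = scores[0][0][y]
        x0 = geometry[0][0][y]   # distance to top edge
        x1 = geometry[0][1][y]   # distance to right edge
        x2 = geometry[0][2][y]   # distance to bottom edge
        x3 = geometry[0][3][y]   # distance to left edge
        angles = geometry[0][4][y]
        for x in range(width):
            score = scoresData[x]
            if score < scoreThresh:
                continue
            # Each output cell corresponds to a 4x4 patch of the 320x320 input
            offsetX, offsetY = x * 4.0, y * 4.0
            angle = angles[x]
            cosA, sinA = math.cos(angle), math.sin(angle)
            h = x0[x] + x2[x]
            w = x1[x] + x3[x]
            offset = (offsetX + cosA * x1[x] + sinA * x2[x],
                      offsetY - sinA * x1[x] + cosA * x2[x])
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0], sinA * w + offset[1])
            center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
            # Rotated rect in the (center, size, angle) form NMSBoxesRotated expects
            detections.append((center, (w, h), -angle * 180.0 / math.pi))
            confidences.append(float(score))
    return [detections, confidences]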

For the box detection threshold I have used a value of 0.5, and for non-maximum suppression 0.3. You can try different values to achieve better bounding boxes.

confThreshold = 0.5
nmsThreshold = 0.3
[boxes, confidences] = decode(scores, geometry, confThreshold)
indices = cv2.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)

Lastly, to overlay the boxes over the detected text in the image:

height_ = frame.shape[0]
width_ = frame.shape[1]
# Ratios to map the 320x320 detections back onto the original image
rW = width_ / float(inpWidth)
rH = height_ / float(inpHeight)

for i in indices:
    # get the 4 corners of the rotated rect
    # (on newer OpenCV versions indices is a flat array, so use boxes[i] instead)
    vertices = cv2.boxPoints(boxes[i[0]])
    # scale the bounding box coordinates based on the respective ratios
    for j in range(4):
        vertices[j][0] *= rW
        vertices[j][1] *= rH
    # draw the 4 edges; cv2.line expects integer pixel coordinates
    for j in range(4):
        p1 = (int(vertices[j][0]), int(vertices[j][1]))
        p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
        cv2.line(frame, p1, p2, (0, 255, 0), 3)

# To save the image:
cv2.imwrite("maggi_boxed.jpg", frame)

Maggi's Ad with bounding boxes

I have not experimented with different threshold values. Tuning them may give better results and also remove the misclassification of the logo as text.

Note: The model was trained on an English corpus, so Hindi words will not be detected. You can also read the paper, which outlines the test datasets it was benchmarked on.

Fleron-X
1

You need to check whether any of these libraries provide coordinates for the detected text; you can then draw a box around it (see the sketch after the list below). OCR libraries:

1) Python: pyocr and Tesseract OCR from Python

2) R: Extracting Text from PDFs; Doing OCR; all within R

3) Tesseract library in Java/PySpark

4) Apache Tika

5) Python: OpenCV - OCR of Hand-written Data using kNN

6) You can do the same with OpenCV and Python.
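
As a concrete example of option 1, here is a minimal sketch (my own, not from the original list) using pytesseract. It assumes the Tesseract binary is installed and on your PATH, and the filename is hypothetical; image_to_data returns per-word boxes that can then be drawn with OpenCV:

import cv2
import pytesseract

# Hypothetical input file; replace with your ad screenshot
img = cv2.imread("ad_screenshot.jpg")

# image_to_data returns per-word text, confidence and box coordinates
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

for i in range(len(data["text"])):
    # skip empty strings and low-confidence detections
    if data["text"][i].strip() and float(data["conf"][i]) > 60:
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("ad_with_boxes.jpg", img)

Note that, unlike EAST in the answer above, this only gives axis-aligned boxes, so heavily skewed or stylized ad text may still be missed.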

Free OCR software

  • Google's & HP's Tesseract
  • Google Keep
  • Microsoft Document Imaging (MODI) (assuming the majority of us have a Windows OS)
  • Microsoft OneNote
  • Microsoft Oxford Project API (this API is free only for a limited time)
  • FreeOCR (this is based on the Tesseract engine again)

There are a lot more, but these are the best. Out of all of these, if you are looking for accuracy, Microsoft Document Imaging does the better job; and if you are looking for handwritten text OCR conversion, Google Keep does the better job.

Commercial Products

  • Adobe Acrobat Pro (the RTF file format gives you the best results)
  • Captiva
  • Abbyy
  • Informatica (not sure which module within Informatica)
  • IBM Datacapture (Datacap) (IBM Watson)

If accuracy is your main constraint, there is something like Captricity ("Unprecedented Data Access at your Service"), which boasts 99% accuracy since they crowdsource people and have them convert handwritten text without compromising security.