
I am working on a text recognition project. I have built a classifier using TensorFlow to predict digits, but I would like to implement a more complete text recognition pipeline using text localization and text segmentation (separating each character), and I haven't found an implementation for those parts of the algorithm.

So, do you know any algorithms, implementations, or tips, using TensorFlow, for localizing and segmenting text in natural-scene pictures (specifically, localizing and segmenting the text on scoreboards in sports pictures)?

Thank you very much for any help.

A. Attia
  • this is an __extremely__ broad question and a broad answer would be yes. – parsethis Mar 17 '17 at 23:30
  • I am personally toying with the idea of using mouse/touchscreen gesture recognizing algorithm for OCR. Did you do something similar? – Dalen Mar 18 '17 at 03:51

2 Answers


To group elements on a page, like paragraphs of text and images, you can use a clustering algorithm and/or blob detection with some thresholds.

You can use the Radon transform to detect text lines and estimate the skew of a scanned page.
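A minimal sketch of that idea: the Radon transform at angle θ is just the projection of the image onto a line, so a simple skew estimator can rotate the page over candidate angles and pick the one whose horizontal projection profile is the "peakiest" (maximum variance). The angle range and step below are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary, angles=np.arange(-10, 10.5, 0.5)):
    # Projection at angle a = row sums of the rotated image; sharp,
    # well-aligned text lines produce a peaky profile -> high variance
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        rotated = rotate(binary, a, reshape=False, order=0)
        score = np.var(rotated.sum(axis=1))
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

# Synthetic page with horizontal "text lines", then skewed by 3 degrees
page = np.zeros((200, 200))
for y in range(40, 160, 20):
    page[y:y + 6, 20:180] = 1.0
skewed = rotate(page, -3, reshape=False, order=0)
```

Calling `estimate_skew(skewed)` should return roughly 3 degrees, which you would then use to deskew the page before segmentation.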

I think that for character separation you will have to mess with fonts. Some polynomial matching/fitting or something. (This is a very wild guess for now, don't take it seriously.) But a similar approach would allow you to get the character out of the line and recognize it in the same step.

As for recognition, once you have a character, there is a nice trigonometric trick of comparing angles of the character to the angles stored in a database. Works great on handwriting too.

I am not an expert on how page segmentation exactly works, but it seems that I am on my way to becoming one: I'm currently working on a project that includes it. So give me a month and I'll be able to tell you more. :D

Anyway, you should go and read the Tesseract code to see how HP and Google did it there. It should give you some pretty good ideas.

Good luck!

Dalen
  • I didn't say you should do nothing and just wait for me to get around to splitting stupid pages in a month. I know there is a little bit more than nothing on the net in terms of exemplary code, but there are some good books on OCR out there and, as I said, Tesseract is GPL. I hope that you will end up helping me instead of the other way around. But no matter. I will have to deal with page segmentation sooner or later. Let's see who will be first to solve the problem. – Dalen Mar 18 '17 at 18:17
  • @Dalen did you obtain any results from your project? I'm working on a project involving text detection and segmentation too. – SarahData Sep 21 '17 at 12:09
  • I am sorry, I got distracted by some other equally complicated projects. But I'll do it, as I said, sooner or later. I achieved some improvements in Tesseract recognition by providing better-prepared images, but page segmentation I haven't really touched yet, except the built-in one. My trouble is that I am trying to perform recognition on comic strips and graphic novels. It is a mess and a completely crazy concept besides. All crazy, demented, cool fonts, backgrounds and stuff. It kills me. Did you try Radon and did you examine the Tesseract code? – Dalen Sep 21 '17 at 12:23
  • See, I have all these crazy speech balloons to extract and separate them from graphics, then filter out what is background and what text, then try recognition on a font that may be invented solely for that strip/book. I'm obviously a loony to think I can do it alone. But I'll try anyway. :D – Dalen Sep 21 '17 at 12:28
  • @SarahM : P.S. Whoops, I forgot to mention you in comments above, so this one just to be sure you get them. – Dalen Sep 21 '17 at 12:46

After you are done with object detection, you can perform text detection, whose output can be passed on to Tesseract. There can be multiple variations of image enhancement before passing the image to the detector function.

Reference papers:
  • https://arxiv.org/abs/1704.03155v2
  • https://arxiv.org/pdf/2002.07662.pdf

import cv2
import numpy as np
from imutils.object_detection import non_max_suppression

# Load the pre-trained EAST text detector once, outside the function
# (model file: frozen_east_text_detection.pb)
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

def text_detector(image):
    orig = image.copy()
    (H, W) = image.shape[:2]

    # EAST requires input dimensions that are multiples of 32
    (newW, newH) = (640, 320)
    rW = W / float(newW)
    rH = H / float(newH)

    image = cv2.resize(image, (newW, newH))
    (H, W) = image.shape[:2]

    # output layers: per-location text confidence and box geometry
    layerNames = [
        "feature_fusion/Conv_7/Sigmoid",
        "feature_fusion/concat_3"]

    blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
        (123.68, 116.78, 103.94), swapRB=True, crop=False)

    net.setInput(blob)
    (scores, geometry) = net.forward(layerNames)

    (numRows, numCols) = scores.shape[2:4]
    rects = []
    confidences = []

    for y in range(0, numRows):
        scoresData = scores[0, 0, y]
        xData0 = geometry[0, 0, y]
        xData1 = geometry[0, 1, y]
        xData2 = geometry[0, 2, y]
        xData3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]

        # loop over the number of columns
        for x in range(0, numCols):
            # if our score does not have sufficient probability, ignore it
            if scoresData[x] < 0.5:
                continue

            # compute the offset factor as our resulting feature maps will
            # be 4x smaller than the input image
            (offsetX, offsetY) = (x * 4.0, y * 4.0)

            # extract the rotation angle for the prediction and then
            # compute the sine and cosine
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)

            # use the geometry volume to derive the width and height of
            # the bounding box
            h = xData0[x] + xData2[x]
            w = xData1[x] + xData3[x]

            # compute both the starting and ending (x, y)-coordinates for
            # the text prediction bounding box
            endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
            endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
            startX = int(endX - w)
            startY = int(endY - h)

            # add the bounding box coordinates and probability score to
            # our respective lists
            rects.append((startX, startY, endX, endY))
            confidences.append(scoresData[x])

    # collapse overlapping detections into one box per text region
    boxes = non_max_suppression(np.array(rects), probs=confidences)

    for (startX, startY, endX, endY) in boxes:
        # scale the box back to the original image size
        startX = int(startX * rW)
        startY = int(startY * rH)
        endX = int(endX * rW)
        endY = int(endY * rH)

        # draw the bounding box on the image
        cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 3)
    return orig
ZKS