
This is my very first attempt at using Python. I normally use .NET, but to identify shapes in documents I have turned to Python and OpenCV for image processing.

I am using OpenCV TemplateMatching (cv2.matchTemplate) to discover Regions of Interest (ROI) in my documents.

This works well. The template matches the ROIs and rectangles are placed, identifying the matches.

The ROIs in my images contain text which I also need to OCR and extract. I am trying to do this with Tesseract, but judging by my results, I think I am approaching it wrongly.

My process is this:

  • Run cv2.matchTemplate
  • Loop through the matched ROIs
  • Add rectangle info to the image
  • Pass rectangle info to Tesseract
  • Add the text returned from Tesseract to the image
  • Write the final image

In the image below, you can see the matched regions (which are fine), but the text in each ROI doesn't match the text from Tesseract (bottom right of each ROI).

Please could someone take a look and advise where I am going wrong?

import cv2
import numpy as np
import pytesseract
import imutils

img_rgb = cv2.imread('images/pd2.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)

template = cv2.imread('images/matchMe.png', 0)
w, h = template.shape[::-1]

res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.45
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    roi = img_rgb[pt, (pt[0] + w, pt[1] + h)]
    config = "-l eng --oem 1 --psm 7"
    text = pytesseract.image_to_string(roi, config=config)
    print(text)
    cv2.putText(img_rgb, text, (pt[0] + w, pt[1] + h),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)

cv2.imwrite('images/results.png', img_rgb)
GoodJuJu
  • Can you add test images (document + template)? Your approach seems to be fine. While reading images you can always combine imread with grayscale with something like this img_gray = cv2.imread('images/pd2.png', cv2.IMREAD_GRAYSCALE) – Knight Forked May 21 '20 at 11:15
  • I will add them as soon as I get a moment. My code does convert to grayscale: img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY) – GoodJuJu May 21 '20 at 11:51
  • I just mentioned that as a side point (reading grayscale directly) to avoid using two lines of code instead of just one. (Y) – Knight Forked May 21 '20 at 11:53
  • Oh, I see, thanks. I will still use the colour image as the source image as I need to write out to the original with the matched ROI's. – GoodJuJu May 21 '20 at 12:05
  • I have added the source images to the question. – GoodJuJu May 21 '20 at 12:10

1 Answer


There were two issues in your code: (1) you were modifying the image (drawing the rectangle) before OCR, and (2) roi was not constructed properly.

import cv2
import numpy as np
import pytesseract

img_rgb = cv2.imread('tess.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)

template = cv2.imread('matchMe.png', 0)
w, h = template.shape[::-1]

res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.45
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    # Crop the ROI first: NumPy slicing is [y1:y2, x1:x2]
    roi = img_rgb[pt[1]:pt[1] + h, pt[0]:pt[0] + w]
    config = "-l eng --oem 1 --psm 7"
    text = pytesseract.image_to_string(roi, config=config)
    print(text)
    # Draw on the image only after OCR, so the annotations
    # don't contaminate the cropped region
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    cv2.putText(img_rgb, text, (pt[0] + w, pt[1] + h),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)

cv2.imwrite('results.png', img_rgb)

You might still have to feed Tesseract a properly filtered image to get any meaningful recognition. Hope this helps.

Knight Forked
  • Thanks for your answer, I am out at the moment and cannot check. You said the ROI was not properly constructed? However my code was placing the rectangle at the correct ROI? – GoodJuJu May 21 '20 at 14:52
  • I meant the image you were constructing for the ROI was not proper. You can just add imwrite in your code for the image ROI and see what you get. – Knight Forked May 21 '20 at 18:05
  • Hi, I just checked and it's close. All the top lines begin with 420 and then 2 or 3 alpha characters (e.g. 420FT), but I cannot find '420' in any of the results. It does seem to be returning the bottom (2nd) line though, which is 5 digits (e.g. 10610). – GoodJuJu May 21 '20 at 21:06
  • Well, as I said in my answer that, now you have to provide a clean image to tesseract (without the template - enclosing circle). That might improve recognition accuracy somewhat. You can do a test by just giving image having the characters that you would like to be identified and see how tesseract fares in that respect. That's the best you can get unless you do a little bit of preprocessing on the image. Please bear in mind that tesseract is not perfect, so there are quite a few limiting factors. But as it stands, I hope, your initial problem has been resolved by my answer. – Knight Forked May 22 '20 at 05:28
  • If you think my answer resolved your problem, please consider accepting the answer, it would motivate me to help others. – Knight Forked May 22 '20 at 05:30
  • Just wanted to mention that this is the first time I have used tesseract. For OCR I have been using Tensorflow and TFLite, having trained my own model. There's also an alternative in OpenCV I suppose, I have not tried it myself so cannot say anything about it. If you want control over accuracy Tensorflow is perhaps the way to go. But then it is proportionately more complicated. – Knight Forked May 22 '20 at 05:35
  • Were you able to resolve the other issues? Try setting psm to 1 or 3 and see if it works. – Knight Forked May 22 '20 at 15:34
  • Hi, yes, it worked in the sense that it is now extracting some text, not just one character, but I think I still have a way to go. Thank you for your help. – GoodJuJu May 22 '20 at 22:18
  • I had missed those settings. config = "-l eng --oem 3 --psm 6" seems to give reasonable results – GoodJuJu May 22 '20 at 22:29