How to extract numbers from image using OpenCV and pytesseract image_to_string()?

Question

I'm trying to extract the numbers from an image using OpenCV and the image_to_string() method from pytesseract, but the output is not good.

I tried some preprocessing methods such as resizing and noise filtering, but I still can't get accurate results. How can I handle this?

At least you aren’t trying this using JPEG input images. Can you greatly improve the source image quality? — DisappointedByUnaccountableMod, Aug 16 '19 at 22:09

score 3 · Accepted Answer · answered Aug 16 '19 at 20:04

Here's a simple preprocessing approach to clean up the image before passing it to pytesseract:

Convert image to grayscale
Sharpen the image
Perform morphological transformations to enhance text

Since your input image looks blurry, we can sharpen it using cv2.filter2D() and a generic sharpening kernel. Other sharpening kernels can be used as well.

import cv2
import numpy as np

image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Generic 3x3 sharpening kernel (weights sum to 1)
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
sharpen = cv2.filter2D(gray, -1, sharpen_kernel)

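The text has small holes, so we can use cv2.dilate() to close them and smooth the image. Dilation grows bright regions, so we first invert the image to make the dark text white, dilate, and then invert back.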

sharpen = 255 - sharpen
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
dilate = cv2.dilate(sharpen, kernel, iterations=1)
result = 255 - dilate

Here's the result. You can try either the sharpened image (sharpen.png) or the enhanced image (result.png) with pytesseract. Full code:

import cv2
import numpy as np

# Load the image and convert it to grayscale
image = cv2.imread('1.png')
if image is None:
    raise FileNotFoundError('Could not read 1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Sharpen with a generic 3x3 sharpening kernel
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
sharpen = cv2.filter2D(gray, -1, sharpen_kernel)
cv2.imwrite('sharpen.png', sharpen)

# Invert so the text is white, dilate to fill small holes, then invert back
sharpen = 255 - sharpen
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
dilate = cv2.dilate(sharpen, kernel, iterations=1)
result = 255 - dilate
cv2.imwrite('result.png', result)
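The answer stops before the OCR call itself, so here is a minimal sketch of how the saved image might be passed to pytesseract. The helper names (build_digit_config, find_numbers, extract_digits) and the choice of page segmentation mode 6 are assumptions for illustration, not part of the original answer. Some Tesseract versions may ignore the character whitelist, so the sketch also filters the returned text.

```python
import re


def build_digit_config(psm=6):
    """Tesseract options: page segmentation mode plus a digits-only whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def find_numbers(text):
    """Return the runs of digits found in the OCR output, in reading order."""
    return re.findall(r"\d+", text)


def extract_digits(image, psm=6):
    """Run Tesseract on a preprocessed image (NumPy array or PIL image)."""
    # Imported lazily: pytesseract also needs the Tesseract binary installed.
    import pytesseract
    raw = pytesseract.image_to_string(image, config=build_digit_config(psm))
    return find_numbers(raw)
```

For example, extract_digits(cv2.imread('result.png')) could be compared against extract_digits(cv2.imread('sharpen.png')). If the numbers sit on a single line, psm=7 may be worth trying instead.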

score 0 · Answer 2 · answered Aug 16 '19 at 21:08

I tried sharpening the image; however, I didn't notice any improvement in number extraction with tesseract. My advice is to first improve the image with a deep learning-based super-resolution method and then use tesseract for number extraction.
