I am trying to improve the accuracy of an OCR script I wrote. It performs well on clean images but struggles with noisy ones.

The noisy image:

I wrote a function to remove the noise. It removes much of it, but it also degrades the text slightly, and I can only capture around 60% of the text. I tried adjusting the contrast, sharpness, and threshold, but none of these improved OCR performance.

import cv2
import numpy as np
import pytesseract

def noise_remove(image):
    # Note: with a 1x1 kernel, dilate, erode and MORPH_CLOSE leave the
    # image unchanged, so only the median blur below has any effect.
    kernel = np.ones((1, 1), np.uint8)
    image = cv2.dilate(image, kernel, iterations=1)
    image = cv2.erode(image, kernel, iterations=1)
    image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)
    image = cv2.medianBlur(image, 3)
    return image

img = cv2.imread('2.jpg')
img = cv2.resize(img, None, fx=0.8, fy=0.8)  # shrinks the image to 80% of its size
blurImg = noise_remove(img)
text = pytesseract.image_to_string(blurImg)
print(text)

The output I get:

Little: afr aid its eat looked now. iy ye lady girl them good me make. It hardly cousin ime always. fin shortiy village is raising we sheiting replied. She the ~ tavourabdle partiality inhabiting travelling impression pub luo. His six are entreaties instrument acceptance unsatiable her. Athongs} as or on herself chapter ertered carried no Sold oid ten are quit lose deal his sent. You correct how sex several far distant believe journey parties. We shyniss enquire uncivil attied if carried to A

Christoph Rackwitz