4

I've written a Python script that uses pytesseract to extract a word from an image. The image contains only a single word, TOOLS, and that is what I'm after. Currently the script below gives the wrong output, WIS. What can I do to get the correct text?

Link to that image

This is my script:

import requests, io, pytesseract
from PIL import Image

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))
img = img.resize([100,100], Image.ANTIALIAS)
img = img.convert('L')
img = img.point(lambda x: 0 if x < 170 else 255)
imagetext = pytesseract.image_to_string(img)
print(imagetext)
# img.show()

This is how the modified image looks when I run the above script:

[image not available]

The output I'm getting:

WIS

Expected output:

TOOLS
SIM

2 Answers

12

The key is matching the image transformation to what tesseract can handle. Your main problem is that the font is an unusual one. All you need is

import requests, io, pytesseract
from PIL import Image, ImageEnhance, ImageFilter

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))

# remove texture
enhancer = ImageEnhance.Color(img)
img = enhancer.enhance(0)   # decolorize
img = img.point(lambda x: 0 if x < 250 else 255) # set threshold
img = img.resize([300, 100], Image.LANCZOS) # resize to remove noise
img = img.point(lambda x: 0 if x < 250 else 255) # get rid of remains of noise
# adjust font weight
img = img.filter(ImageFilter.MaxFilter(11)) # lighten the font ;)
imagetext = pytesseract.image_to_string(img)
print(imagetext)

And voilà,

TOOLS

is recognized.
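To see why the `MaxFilter` step thins the letters, here is a minimal sketch on a synthetic image rather than the original JPEG. The 40x40 canvas, the 9-pixel bar and the helper `dark_run_length` are made up for illustration. A max filter replaces each pixel with the brightest value in its window, so dark strokes on a white background lose pixels on every side.

```python
from PIL import Image, ImageFilter

def dark_run_length(img, row):
    """Count dark pixels in one row (hypothetical helper for this demo)."""
    return sum(1 for x in range(img.width) if img.getpixel((x, row)) < 128)

# Synthetic stand-in for a heavy letter stroke: white canvas, black bar.
canvas = Image.new('L', (40, 40), 255)
canvas.paste(0, (15, 5, 24, 35))  # box is (left, upper, right, lower), 9 px wide

# A 5x5 max filter keeps a pixel dark only if its whole window is dark,
# so the bar shrinks by 2 pixels on each side.
thinned = canvas.filter(ImageFilter.MaxFilter(5))

before = dark_run_length(canvas, 20)
after = dark_run_length(thinned, 20)
print(before, after)
```

With the larger window of 11 used above, each stroke would lose 5 pixels per side, which is why the window size has to be tuned to the stroke width of the font.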

igrinis
  • Looks like `img = img.filter(ImageFilter.MaxFilter(11))` is the key :) – Benjamin Toueg Jun 25 '18 at 09:40
  • Can you elaborate on the difference between `img.convert('L')` and `ImageEnhance.Color(img).enhance(0)` ? And if there is any best practice in terms of ordering of instructions? – Benjamin Toueg Jun 25 '18 at 09:43
  • 1) What `MaxFilter` does is basically morphological erosion. 2) The difference is mostly conceptual. `.convert('L')` transforms colors to gray levels, `Color(img).enhance(0)` removes the hue. 3) The order of instructions follows the logic of processing, that is: remove the pattern from the letters, convert to a B&W image, adjust the font weight and send it to `tesseract`. If the background weren't white, I'd play with color channels and would try other approaches, probably detecting long edges. Since it is a single image, I just threw in something that did the job and was somewhat robust. – igrinis Jun 25 '18 at 10:38
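The difference between the two grayscale approaches discussed in these comments can be checked on a tiny synthetic image. This is a sketch under the assumption that Pillow's `ImageEnhance.Color` builds its fully desaturated version by converting through mode `L` and back, which is how the releases I have looked at behave. The 2x1 test image is made up for the demo.

```python
from PIL import Image, ImageEnhance

# Hypothetical 2x1 test image: one saturated red pixel, one mid-gray pixel.
sample = Image.new('RGB', (2, 1))
sample.putpixel((0, 0), (200, 30, 30))
sample.putpixel((1, 0), (128, 128, 128))

gray = sample.convert('L')                           # single-channel image
decolorized = ImageEnhance.Color(sample).enhance(0)  # still three channels

print(gray.mode, decolorized.mode)

# After decolorizing, the three channels of a pixel carry the same value.
r, g, b = decolorized.getpixel((0, 0))
print(r == g == b == gray.getpixel((0, 0)))
```

In practice this means a later `point()` threshold runs once on the `L` image but once per channel on the decolorized one. The visual result is the same, but the image mode differs.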
0

The key issue with your implementation lies here:

img = img.resize([100,100], Image.ANTIALIAS)
img = img.point(lambda x: 0 if x < 170 else 255)

You could try different sizes and different thresholds:

import requests, io, pytesseract
from PIL import Image
from PIL import ImageFilter

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))
filters = [
    # ('nearest', Image.NEAREST),
    ('box', Image.BOX),
    # ('bilinear', Image.BILINEAR),
    # ('hamming', Image.HAMMING),
    # ('bicubic', Image.BICUBIC),
    ('lanczos', Image.LANCZOS),
]

subtle_filters = [
    # 'BLUR',
    # 'CONTOUR',
    'DETAIL',
    'EDGE_ENHANCE',
    'EDGE_ENHANCE_MORE',
    # 'EMBOSS',
    'FIND_EDGES',
    'SHARPEN',
    'SMOOTH',
    'SMOOTH_MORE',
]

for name, filt in filters:
    for subtle_filter_name in subtle_filters:
        for s in range(220, 250, 10):
            for threshold in range(250, 253, 1):
                img_temp = img.copy()
                img_temp.thumbnail([s,s], filt)
                img_temp = img_temp.convert('L')
                img_temp = img_temp.point(lambda x: 0 if x < threshold else 255)
                img_temp = img_temp.filter(getattr(ImageFilter, subtle_filter_name))
                imagetext = pytesseract.image_to_string(img_temp)
                print(s, threshold, name, subtle_filter_name, imagetext)
                with open('thumb%s_%s_%s_%s.jpg' % (s, threshold, name, subtle_filter_name), 'wb') as g:
                    img_temp.save(g)

and see what works for you.

I would suggest resizing your image while keeping the original aspect ratio, since forcing it into a square (as `resize([100,100], ...)` does) distorts the letters. You could also try some alternative to `img_temp.convert('L')`.
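A minimal sketch of a ratio-preserving resize, using a blank synthetic image in place of the downloaded one. The helper name `resize_keep_ratio` and the 600x200 sample size are assumptions for illustration, not the dimensions of the actual file.

```python
from PIL import Image

def resize_keep_ratio(img, target_width, resample=Image.LANCZOS):
    """Scale img to target_width, deriving the height from the aspect ratio."""
    ratio = target_width / img.width
    target_height = max(1, round(img.height * ratio))
    return img.resize((target_width, target_height), resample)

# Synthetic stand-in for a wide banner-like image.
sample = Image.new('RGB', (600, 200), 'white')
resized = resize_keep_ratio(sample, 300)
print(resized.size)
```

`Image.thumbnail`, used in the loop above, also preserves the ratio, but it modifies the image in place and only ever shrinks it.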

Best so far: TWls and T0018

You can also try editing the image manually to see whether some transformation gives a better output (for instance http://gimpchat.com/viewtopic.php?f=8&t=1193).

Knowing the font in advance would probably help you achieve a better result too.

Benjamin Toueg