Unable to extract a word out of an image

Question

I've written a Python script that uses pytesseract to extract a word from an image. The image contains only a single word, TOOLS, and that is what I'm after. Currently my script below gives the wrong output, WIS. What can I do to get the correct text?

Link to that image

This is my script:

import requests, io, pytesseract
from PIL import Image

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))
img = img.resize([100,100], Image.ANTIALIAS)
img = img.convert('L')
img = img.point(lambda x: 0 if x < 170 else 255)
imagetext = pytesseract.image_to_string(img)
print(imagetext)
# img.show()

This is the status of the modified image when I run the above script:

The output I'm having:

WIS

Expected output:

TOOLS

If OCR were that simple... – Benjamin Toueg Jun 20 '18 at 14:17

igrinis · Accepted Answer · 2018-06-25T01:57:37.313

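The key is matching the image transformations to Tesseract's abilities. Your main problem is that the font is not a usual one: it is textured and heavy, so the texture has to be removed and the strokes thinned before recognition. All you need is: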

import io

import pytesseract
import requests
from PIL import Image, ImageEnhance, ImageFilter

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))

# remove texture
enhancer = ImageEnhance.Color(img)
img = enhancer.enhance(0)   # decolorize
img = img.point(lambda x: 0 if x < 250 else 255) # set threshold
img = img.resize([300, 100], Image.LANCZOS) # resize to remove noise
img = img.point(lambda x: 0 if x < 250 else 255) # get rid of remains of noise
# adjust font weight
img = img.filter(ImageFilter.MaxFilter(11)) # lighten the font ;)
imagetext = pytesseract.image_to_string(img)
print(imagetext)

And voilà,

TOOLS

is recognized.
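To see why the `MaxFilter` step lightens the font: a max filter replaces each pixel with the brightest value in its neighbourhood, so on dark-on-white text the white background eats into the strokes. Below is a minimal pure-Python sketch of that idea on a tiny grid. `max_filter` is a hypothetical helper written only for illustration, not Pillow's implementation, and clipping the window at the border is an assumption.

```python
def max_filter(grid, size=3):
    """Replace each pixel with the maximum of its size x size neighbourhood.

    Illustrative helper only; windows are clipped at the image border,
    which is an assumption and may differ from Pillow's edge handling.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    radius = size // 2
    height, width = len(grid), len(grid[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect every pixel inside the (clipped) window around (x, y).
            window = [
                grid[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


# A 5-pixel-wide dark vertical stroke (0) on a white background (255).
stroke = [[255, 0, 0, 0, 0, 0, 255] for _ in range(5)]
thinned = max_filter(stroke, size=3)

# Show dark pixels as '#' and white pixels as '.'.
for row in thinned:
    print("".join("#" if value == 0 else "." for value in row))
```

With a 3x3 window the stroke loses one pixel on each side. By the same logic, an 11x11 window as used above would trim roughly five pixels from each side, so it only suits strokes that are clearly wider than that; for thinner text a smaller window is probably the safer starting point.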

edited Jun 25 '18 at 01:57

answered Jun 23 '18 at 14:18

Looks like `img = img.filter(ImageFilter.MaxFilter(11))` is the key :) – Benjamin Toueg Jun 25 '18 at 09:40
Can you elaborate on the difference between `img.convert('L')` and `ImageEnhance.Color(img).enhance(0)` ? And if there is any best practice in terms of ordering of instructions? – Benjamin Toueg Jun 25 '18 at 09:43
1) What `MaxFilter` does is basically morphological erosion. 2) The difference is mostly conceptual: `.convert('L')` transforms colors to gray levels, while `Color(img).enhance(0)` removes the hue. 3) The order of instructions follows the logic of processing, that is: remove the pattern from the letters, convert to a B&W image, adjust the font weight, and send it to `tesseract`. If the background weren't white, I'd play with the color channels and try other approaches, probably detecting long edges. Since it is a single image, I just threw in something that did the job and was reasonably robust. – igrinis Jun 25 '18 at 10:38
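The practical side of point 2 can be checked on a one-pixel image. This is a small sketch, assuming a reasonably recent Pillow install; the pixel colour is arbitrary and chosen only for illustration.

```python
from PIL import Image, ImageEnhance

# A single orange-ish pixel stands in for a real photo.
img = Image.new("RGB", (1, 1), (200, 100, 50))

gray = img.convert("L")                         # single 8-bit gray band
decolored = ImageEnhance.Color(img).enhance(0)  # colour removed, mode unchanged

print(gray.mode, gray.getpixel((0, 0)))
print(decolored.mode, decolored.getpixel((0, 0)))
```

Both results look gray, but `convert('L')` yields a single-band image while `enhance(0)` keeps three equal bands. A later `img.point(...)` threshold therefore runs once in the first case and once per band in the second; either form can be handed to `pytesseract.image_to_string`.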

Benjamin Toueg · Answer 2 · 2018-06-22T14:27:19.193

The key issue with your implementation lies here:

img = img.resize([100,100], Image.ANTIALIAS)
img = img.point(lambda x: 0 if x < 170 else 255)

You could try different sizes and different thresholds:

import requests, io, pytesseract
from PIL import Image
from PIL import ImageFilter

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))
filters = [
    # ('nearest', Image.NEAREST),
    ('box', Image.BOX),
    # ('bilinear', Image.BILINEAR),
    # ('hamming', Image.HAMMING),
    # ('bicubic', Image.BICUBIC),
    ('lanczos', Image.LANCZOS),
]

subtle_filters = [
    # 'BLUR',
    # 'CONTOUR',
    'DETAIL',
    'EDGE_ENHANCE',
    'EDGE_ENHANCE_MORE',
    # 'EMBOSS',
    'FIND_EDGES',
    'SHARPEN',
    'SMOOTH',
    'SMOOTH_MORE',
]

for name, filt in filters:
    for subtle_filter_name in subtle_filters:
        for s in range(220, 250, 10):
            for threshold in range(250, 253, 1):
                img_temp = img.copy()
                img_temp.thumbnail([s,s], filt)
                img_temp = img_temp.convert('L')
                img_temp = img_temp.point(lambda x: 0 if x < threshold else 255)
                img_temp = img_temp.filter(getattr(ImageFilter, subtle_filter_name))
                imagetext = pytesseract.image_to_string(img_temp)
                print(s, threshold, name, subtle_filter_name, imagetext)
                with open('thumb%s_%s_%s_%s.jpg' % (s, threshold, name, subtle_filter_name), 'wb') as g:
                    img_temp.save(g)

and see what works for you.

I would suggest resizing your image while keeping the original aspect ratio. You could also try an alternative to `img_temp.convert('L')`.
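One way to keep the ratio when calling `resize` is to derive the height from a chosen width. The helper below is a minimal sketch; `fit_width` is a hypothetical name, and the dimensions in the example are made up for illustration rather than taken from the actual image.

```python
def fit_width(size, target_width):
    """Return (width, height) scaled to target_width, preserving aspect ratio."""
    width, height = size
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("sizes must be positive")
    scale = target_width / width
    # Round to the nearest pixel but never collapse to zero height.
    return target_width, max(1, round(height * scale))


# Hypothetical source dimensions, chosen only for illustration.
print(fit_width((600, 200), 300))
```

With Pillow this could be used as `img.resize(fit_width(img.size, 300), Image.LANCZOS)`. Note that `thumbnail`, used in the loop above, already preserves the aspect ratio, so the helper mainly matters when calling `resize` with a fixed size such as `[100,100]`.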

Best so far: TWls and T0018

You can try to manipulate the image manually and see if you can find some edit that can provide a better output (for instance http://gimpchat.com/viewtopic.php?f=8&t=1193)

Knowing the font in advance, you could probably achieve a better result too.
