I'm getting the error below when running the following Pytesseract code. (Python 3.6.1, Mac OS X)

import pytesseract
import requests
from PIL import Image
from PIL import ImageFilter
from io import StringIO, BytesIO

def process_image(url):
    image = _get_image(url)
    image.filter(ImageFilter.SHARPEN)
    return pytesseract.image_to_string(image)


def _get_image(url):
    r = requests.get(url)
    s = BytesIO(r.content)
    img = Image.open(s)
    return img

process_image("https://www.prepressure.com/images/fonts_sample_ocra_medium.png")

Error:

/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/g/pyfo/reddit/ocr.py
Traceback (most recent call last):
  File "/Users/g/pyfo/reddit/ocr.py", line 20, in <module>
    process_image("https://www.prepressure.com/images/fonts_sample_ocra_medium.png")
  File "/Users/g/pyfo/reddit/ocr.py", line 10, in process_image
    image.filter(ImageFilter.SHARPEN)
  File "/usr/local/lib/python3.6/site-packages/PIL/Image.py", line 1094, in filter
    return self._new(filter.filter(self.im))
  File "/usr/local/lib/python3.6/site-packages/PIL/ImageFilter.py", line 53, in filter
    raise ValueError("cannot filter palette images")
ValueError: cannot filter palette images

Process finished with exit code 1

Seems simple enough, but is not working. Any help would be greatly appreciated.

gmonz
  • Possible duplicate of [Python3 error: initial\_value must be str or None](http://stackoverflow.com/questions/31064981/python3-error-initial-value-must-be-str-or-none) – Craig Apr 07 '17 at 01:38
  • @Craig I saw that one and the answers unfortunately did not solve my issue. I am using Python 3.6.1 btw. – gmonz Apr 07 '17 at 01:40
  • So you replaced `StringIO` with `BytesIO` and you get the same error message? If so, then break the `return Image.open(StringIO(requests.get(url).content))` into several separate lines (basic debugging) to find out exactly which call is throwing the error. – Craig Apr 07 '17 at 01:42
  • @Craig To be honest, I am not sure how to break that into several separate lines completely properly although it is for sure `Image.open(StringIO(requests.get(url).content))` – gmonz Apr 07 '17 at 01:52
  • `r = requests.get(url)` – Craig Apr 07 '17 at 01:53
  • `s = BytesIO(r.content)` <- this is from the [tutorial](http://docs.python-requests.org/en/master/user/quickstart/#binary-response-content) – Craig Apr 07 '17 at 01:53
  • `img = Image.open(s)` – Craig Apr 07 '17 at 01:54
  • `return img`. That should do it. – Craig Apr 07 '17 at 01:55
  • You need to have `from io import BytesIO` and then use `BytesIO()`, not `StringIO()`, to process the content from the request. Every error message shows that you are still using `StringIO()`. – Craig Apr 07 '17 at 01:57
  • This `ValueError: cannot filter palette images` is a different error. That means that you are now using `BytesIO()` correctly and the error is occurring in the `Image.filter()` line. Edit your question to show only the code that produces this error and someone might be able to help. – Craig Apr 07 '17 at 02:01
  • Man, thanks so much for being so kind about me not being the most knowledgeable about this stuff. I really appreciate you breaking it down. Do you happen to know what a "palette image" is? @Craig – gmonz Apr 07 '17 at 02:03
  • For the new error, I think you need to convert the image to 'RGB' as explained in this question: https://stackoverflow.com/questions/10323692/cannot-filter-palette-images-error-when-doing-a-imageenhance-sharpness – Craig Apr 07 '17 at 02:08
  • I know I am close. What should I assign the result of image.enhance(2.0) to? https://pastebin.com/uRfhsi8J @Craig Or perhaps I need both filter and enhance, like this: https://pastebin.com/GbUkqp9c ? But then what should sharpened be set to? – gmonz Apr 07 '17 at 02:23
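
For readers who hit the earlier `initial_value must be str or None` error discussed in the comments above, here is a minimal sketch of why `BytesIO` is the right wrapper. It uses only the standard library. The `payload` bytes are a made-up stand-in for `r.content`, not the real image data.

```python
from io import BytesIO, StringIO

# Hypothetical stand-in for r.content; requests returns the body as bytes.
payload = b"\x89PNG\r\n\x1a\n"

# StringIO only accepts str, so passing bytes raises TypeError on Python 3.
try:
    StringIO(payload)
    string_io_error = None
except TypeError as exc:
    string_io_error = str(exc)

# BytesIO wraps the same bytes in a file-like object that can be read back,
# which is what Image.open() expects.
buffer = BytesIO(payload)
first_bytes = buffer.read(4)
```

Once `BytesIO` is in place, this error goes away and any remaining failure comes from a later call, which matches what the comments describe.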

1 Answer

The image you have is a palette-based image (PIL mode 'P'). You need to convert it to a full RGB image in order to use the PIL filters.

import pytesseract
import requests
from PIL import Image, ImageFilter
from io import BytesIO

def process_image(url):
    image = _get_image(url)
    # Palette ('P') images cannot be filtered, so convert to RGB first.
    image = image.convert('RGB')
    # filter() returns a new image, so keep the result.
    image = image.filter(ImageFilter.SHARPEN)
    return pytesseract.image_to_string(image)


def _get_image(url):
    # Download the image and wrap the raw bytes in a file-like object.
    r = requests.get(url)
    s = BytesIO(r.content)
    img = Image.open(s)
    return img

print(process_image("https://www.prepressure.com/images/fonts_sample_ocra_medium.png"))

You should also note that the .convert() and .filter() methods return a copy of the image; they don't change the existing image object. You need to assign the return value to a variable as shown in the code above.
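
To see both points without downloading anything, here is a minimal sketch, assuming Pillow is installed. The in-memory palette image and the helper name `sharpen_any_mode` are made up for illustration and are not part of the original code.

```python
from PIL import Image, ImageFilter


def sharpen_any_mode(image):
    """Hypothetical helper: return a sharpened RGB copy of the image."""
    if image.mode != "RGB":
        image = image.convert("RGB")
    return image.filter(ImageFilter.SHARPEN)


# A small palette-mode ('P') image built in memory as a stand-in for the PNG.
palette_img = Image.new("P", (8, 8))

# Filtering a palette image directly raises the ValueError from the question.
try:
    palette_img.filter(ImageFilter.SHARPEN)
    raised = False
except ValueError:
    raised = True

# convert() returns a new image; the original object keeps its mode.
converted = palette_img.convert("RGB")
sharpened = sharpen_any_mode(palette_img)
```

After this runs, `palette_img.mode` is still `'P'` while `converted.mode` is `'RGB'`, which is why the return values have to be assigned.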

NOTE: I don't have pytesseract, so I can't check the last line of process_image().

Craig
  • hmm now I am getting `/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/g/pyfo/reddit/ocr2.py Traceback (most recent call last): File "/Users/g/pyfo/reddit/ocr2.py", line 19, in process_image("https://www.prepressure.com/images/fonts_sample_ocra_medium.png") File "/Users/g/pyfo/reddit/ocr2.py", line 10, in process_image return pytesseract.image_to_string(image) AttributeError: module 'pytesseract' has no attribute 'image_to_string'` – gmonz Apr 07 '17 at 02:28
  • I know I am close on the other one. What should I assign the result of image.enhance(2.0) to? pastebin.com/uRfhsi8J @Craig Or perhaps I need both filter and enhance, like this: pastebin.com/GbUkqp9c ? I definitely need to enhance before I sharpen and filter. – gmonz Apr 07 '17 at 02:29
  • I can't help you with pytesseract. I suggest that you keep experimenting with the code and if you are still stuck, post a new question directed at how to use pytesseract with your image. – Craig Apr 07 '17 at 02:37
  • AttributeError: 'PngImageFile' object has no attribute 'enhance' lol this is way more complicated than i thought (original with some debugging) – gmonz Apr 07 '17 at 02:37
  • well thanks a bunch for the help man, you should get it though its cool! (if it works lol) – gmonz Apr 07 '17 at 02:38
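
On the `enhance` question from the comments: in Pillow, `enhance()` is a method of the enhancer objects in `PIL.ImageEnhance`, not of the image itself, which would explain the `AttributeError` above. The following is a minimal sketch, assuming Pillow is installed. The pastebin code is not shown here, so the choice of `ImageEnhance.Sharpness`, the helper name `enhance_then_sharpen`, and the in-memory test image are all assumptions for illustration.

```python
from PIL import Image, ImageEnhance, ImageFilter


def enhance_then_sharpen(image, factor=2.0):
    """Hypothetical helper: enhance sharpness, then apply the SHARPEN filter."""
    # Convert first; palette images cannot be filtered or enhanced this way.
    rgb = image.convert("RGB")
    # enhance() lives on the enhancer and returns a new image.
    enhanced = ImageEnhance.Sharpness(rgb).enhance(factor)
    return enhanced.filter(ImageFilter.SHARPEN)


# Small in-memory palette image as a stand-in for the downloaded PNG.
sample = Image.new("P", (8, 8))
result = enhance_then_sharpen(sample)
```

The returned image can then be passed to the OCR step in place of `image`.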