Using wand to reduce image filesize for improved OCR performance?

Question

I'm trying to use wand, the simple MagickWand API binding for Python, to extract pages from a PDF, stitch them together into a single longer ("taller") image, and pass that image to Google Cloud Vision for OCR Text Detection. I keep running up against Google Cloud Vision's 10MB filesize limit.

I thought a good way to get the filesize down might be to eliminate all color channels and just feed Google a B&W image. I figured out how to get grayscale, but how can I make my color image into a B&W ("bilevel") one? I'm also open to other suggestions for getting the filesize down. Thanks in advance!

from wand.image import Image

selected_pages = [0,1]

imageFromPdf = Image(filename=pdf_filepath+str(selected_pages), resolution=600)
pages = len(imageFromPdf.sequence)

image = Image(
    width=imageFromPdf.width,
    height=imageFromPdf.height * pages
)
for i in range(pages):
    image.composite(
        imageFromPdf.sequence[i],
        top=imageFromPdf.height * i,
        left=0
    )

image.colorspace = 'gray' 
image.alpha_channel = False
image.format = 'png'

image

You can call one of the threshold methods to make it bilevel. — fmw42, Dec 31 '19 at 18:37
A resolution of `600` would be a massive raster of over 30M pixels (assuming US letter paper size). You may need to drop resolution down to 120, and then call `Image.transform_colorspace('gray')` — emcconville, Dec 31 '19 at 18:59
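
The pixel arithmetic behind the resolution comment can be checked with a short sketch. The helper names here (`raster_size`, `pixel_count`) are made up for illustration, and the page is assumed to be US letter (8.5 x 11 inches), as in the comment.

```python
# US letter is 8.5 x 11 inches; the resolution is pixels per inch.
LETTER_INCHES = (8.5, 11.0)


def raster_size(dpi, page_inches=LETTER_INCHES):
    """Return (width, height) in pixels for one page rendered at `dpi`."""
    width_in, height_in = page_inches
    return round(width_in * dpi), round(height_in * dpi)


def pixel_count(dpi, pages=1):
    """Total pixels for `pages` letter-size pages stacked vertically."""
    width, height = raster_size(dpi)
    return width * height * pages


for dpi in (600, 120):
    w, h = raster_size(dpi)
    print(f"{dpi} dpi: {w} x {h} = {pixel_count(dpi):,} pixels per page")
```

At 600 dpi a single page is 5100 x 6600 pixels, and stitching two pages doubles that. Dropping to 120 dpi cuts the pixel count by a factor of 25 before any colorspace change is applied.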

fmw42 · Answer 1 · 2020-01-02T22:31:49.243

Below are several ways to get bilevel output from Python Wand (0.5.7). The last one needs IM 7 to work. One note from my testing: in IM 7, the first two results are swapped with respect to dithering versus not dithering. I have reported this to the Python Wand developer.

Input:

from wand.image import Image
from wand.display import display

# Using Wand 0.5.7, all images are not dithered in IM 6 and all images are dithered in IM 7
with Image(filename='lena.jpg') as img:
    # 1. quantize to two gray levels, dithering off
    with img.clone() as img_copy1:
        img_copy1.quantize(number_colors=2, colorspace_type='gray', treedepth=0, dither=False, measure_error=False)
        img_copy1.auto_level()
        img_copy1.save(filename='lena_monochrome_no_dither.jpg')
        display(img_copy1)
    # 2. quantize to two gray levels, dithering on
    with img.clone() as img_copy2:
        img_copy2.quantize(number_colors=2, colorspace_type='gray', treedepth=0, dither=True, measure_error=False)
        img_copy2.auto_level()
        img_copy2.save(filename='lena_monochrome_dither.jpg')
        display(img_copy2)
    # 3. fixed threshold at 50%
    with img.clone() as img_copy3:
        img_copy3.threshold(threshold=0.5)
        img_copy3.save(filename='lena_threshold.jpg')
        display(img_copy3)
    # 4. automatic (Otsu) threshold; only works in IM 7
    with img.clone() as img_copy4:
        img_copy4.auto_threshold(method='otsu')
        img_copy4.save(filename='lena_threshold_otsu.jpg')
        display(img_copy4)

First output using IM 6:

Second output using IM 7:
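
Applied to the stitched image from the question, the fixed-threshold variant might look like the sketch below. This is only an illustration, not tested against Google Cloud Vision: the function names, the default resolution of 120 (taken from the comment on the question), the white background, and the byte value used for the 10MB limit are all assumptions.

```python
VISION_LIMIT_BYTES = 10 * 1000 * 1000  # assumed byte value for the 10MB limit in the question


def page_selector(pdf_path, pages):
    """Build an ImageMagick-style page selector such as 'doc.pdf[0,1]'."""
    return f"{pdf_path}[{','.join(str(p) for p in pages)}]"


def fits_limit(blob, limit=VISION_LIMIT_BYTES):
    """True if the encoded image is within the assumed size limit."""
    return len(blob) <= limit


def stitch_bilevel_png(pdf_path, pages, resolution=120, threshold=0.5):
    """Stack the selected pages vertically, threshold them, return PNG bytes."""
    # Imported here so the two helpers above work without ImageMagick installed.
    from wand.color import Color
    from wand.image import Image

    with Image(filename=page_selector(pdf_path, pages), resolution=resolution) as pdf:
        count = len(pdf.sequence)
        # Assumed white canvas, tall enough for all pages at the first page's size.
        with Image(width=pdf.width, height=pdf.height * count,
                   background=Color('white')) as out:
            for i in range(count):
                out.composite(pdf.sequence[i], left=0, top=pdf.height * i)
            out.transform_colorspace('gray')
            out.threshold(threshold=threshold)  # same call as img_copy3 above
            out.type = 'bilevel'
            out.format = 'png'
            return out.make_blob()
```

A call such as `stitch_bilevel_png('scan.pdf', [0, 1])` (with a hypothetical `scan.pdf`) returns the PNG bytes, and `fits_limit` can then be used to check the size before sending. If the result is still too large, lowering `resolution` is probably the first thing to try.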

This was very helpful! Thanks, fmw42! — Brian, Jan 13 '20 at 16:39
