Automatically extract a subimage bounded by a frame

Question

I am trying to extract a subimage from a scanned paper like this:

https://cloud.kopa.ch/index.php/s/gGZm5xeMYlPfU81

The extracted images should be georeferenced and added to a webmap service, but that's not the question here.

How can I find the frame, or rather its pixel coordinates, so that I can crop the image? I am also free to design the "layout" (similar to the example), which means I could add markers to make the frame easier to detect after scanning.

The workflow is: generate layout - print map - draw on the map - scan it - crop "map-frame" - georeferencing this frame - show it on a webmap

The "map-frames" are preprocessed and I know their location/extent. Does anybody have an idea how to crop the (scanned) images automatically to this "map-frame"?

I have to work with Python and have the packages PIL, Pillow and ImageMagick available for the image processing.

Thanks for your help! If you need more information, don't hesitate to ask.

Simon · Answer 1 · 2017-10-29T17:50:47.333

Here's an example I adapted from the Pillow docs; check them out for any further processing that you might need to perform:

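from PIL import Image  # Pillow is imported under the name PIL

# Open the scan and keep a reference to the image object
im = Image.open("/path/to/image.jpg")
# Crop box given as (left, upper, right, lower) in pixels
box = (100, 100, 400, 400)
region = im.crop(box)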

Also, it might prove valuable to search Stack Overflow for this kind of operation; I'm sure it has been discussed before.

As for finding the actual rectangle to crop, you'll have to do some form of image analysis. In its simplest form, conceptually, that could be something along these lines:

Apply an S-curve filter to a black-and-white representation of your image.
Iterate over all of the pixels in the image.
Keep track of horizontal and vertical lines that have sufficiently black pixel values.
Use this data to determine the bounding box of the portion of the image you're interested in.
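
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for real scans: it swaps the S-curve filter for a plain hard threshold, and it assumes the scan is not rotated and that the frame lines cover at least half of the page width and height. The names find_frame_box and crop_to_frame, as well as the dark_threshold and min_fraction parameters and their defaults, are made up for this example.

```python
def find_frame_box(pixels, dark_threshold=96, min_fraction=0.5):
    """Return a crop box (left, upper, right, lower) for the frame, or None.

    pixels: sequence of rows, each a sequence of grayscale values
    (0 = black, 255 = white).
    dark_threshold and min_fraction are guesses that need tuning per scanner.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if width == 0:
        return None

    # Count the dark pixels in every row and every column in a single pass.
    row_counts = [0] * height
    col_counts = [0] * width
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            if value < dark_threshold:
                row_counts[y] += 1
                col_counts[x] += 1

    # Treat a row (column) as a frame line if a large share of it is dark.
    line_rows = [y for y, n in enumerate(row_counts) if n >= min_fraction * width]
    line_cols = [x for x, n in enumerate(col_counts) if n >= min_fraction * height]
    if not line_rows or not line_cols:
        return None

    # Pillow crop boxes exclude the right and lower edge, hence the +1.
    return (line_cols[0], line_rows[0], line_cols[-1] + 1, line_rows[-1] + 1)


def crop_to_frame(path, **kwargs):
    """Open a scan with Pillow and crop it to the detected frame (or None)."""
    from PIL import Image  # imported here so find_frame_box runs without Pillow

    im = Image.open(path)
    gray = im.convert("L")  # 8-bit grayscale, one byte per pixel
    width, height = gray.size
    data = gray.tobytes()
    # Slice the flat byte string into rows; iterating bytes yields integers.
    pixels = [data[y * width:(y + 1) * width] for y in range(height)]
    box = find_frame_box(pixels, **kwargs)
    return im.crop(box) if box else None
```

Calling crop_to_frame("/path/to/image.jpg") would then return the cropped region, or None if no line is dark enough. The returned box includes the frame lines themselves, so you may want to shrink it by the line width. The pure-Python loop is slow on large scans, and a skewed scan would need to be straightened first.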

Depending on your needs, you might want to look into a computer vision library instead, since these are well optimized for this and similar tasks. The one that springs to mind is OpenCV, which I would guess is well optimized and documented, and there's a Python module available as well.

Hey Simon, thanks for your quick answer. The question is not about the cropping itself (that's no problem) but about finding the "100,100,400,400" for each image. They aren't fixed because of the printing/scanning procedure. — Marco, Oct 30 '15 at 13:37
Oh, I'm sorry, I misunderstood. I guess your question pertains more to computer vision than to the actual Pillow library usage, then. I'll update my answer. — Simon, Oct 30 '15 at 14:13
