Is there a way to make a script that automatically corrects scanned documents?

Question

I often scan handwritten documents to send to colleagues, and need to make corrections to the digital file once it's scanned. (For example, I change mistakes I made on the original document to white.)

I am thinking of some script which can do the following:

Take a color scan image (say a tiff) as input, and make simple corrections automatically based on colored corrections in the image.

For example take the simplest case: I write only black on white. There is an area where I made mistakes so I draw a red closed circle (with a pen on the actual sheet of paper) around that area. Then I scan the image (or usually many of them). Now I would like the script to erase each of these areas in all of the images so my mistakes disappear in the resulting image.

Any ideas on how to do this in a Linux environment, e.g. with ImageMagick?

It looks like GIMP with Script-Fu could be the way to go; it should be powerful enough. Can somebody give me a hint by showing what the above example would look like in Script-Fu?

Also helpful for me: which Linux pixel image software is highly scriptable and supports complex operations like masks from color selection, etc.? — highsciguy, Apr 20 '12 at 19:07
Since you are already manually marking what you want changed, have you thought about non-technical things like whiteout tape (http://www.amazon.com/Wite-Out-Correction-1-Line-Dispenser-BICWOTAPP11/dp/B003V8Q7HS) or using non-reflective blank labels or stickers to cover up the mistakes? — Christopher Bottoms, May 12 '12 at 12:09
I am a complicated person and this solution is too simple for me ;) No, seriously: I am aware of this solution, but I have some more ideas in mind which cannot be realized this way if I want to produce nice handwritten text. E.g. I would like to be able to highlight text in the same way by changing its color, or to draw a rectangular box around text. These things take a lot of time if I do them by hand. — highsciguy, May 14 '12 at 16:35
I don't usually recommend cross-posting, but I bet this would be a good question for http://photo.stackexchange.com? Be sure to include a link to this question there and vice-versa so that everyone knows it's cross posted. — Christopher Bottoms, May 14 '12 at 18:27
One solution I'm thinking of is: 1) Segment the object of interest based on its color; 2) Use a flood fill algorithm to fill the segmented area with the desired color. I don't know ImageMagick deeply, but I found that it has a `floodfill` operation (search "flood" in http://www.imagemagick.org/Usage/draw/). It just needs the seed, which would be given by the segmentation. Are you open to OpenCV or Matlab solutions? — Yamaneko, Jul 20 '12 at 03:14
There is a powerful computer vision requirement that you seem to have not considered. How is the script going to recognize your copy-edits in order to carry them out? — phs, Jul 28 '12 at 01:40

Yamaneko · Answer 1 · 2012-08-29T03:15:23.810

I'm thinking of a solution based on ImageMagick. We would need the following steps:

Find the color used to draw on the scanned document (from now on called the target color);
Find its x and y coordinates in the image;
Pass this position as a seed to Flood Fill algorithm.

We could use the following script based on functions of ImageMagick:

List all the unique colors in the picture. This is used to find the RGB components of the target color (command source).
```
convert <image> -unique-colors -depth 8 txt:- > output.txt
```
Output the coordinates of each color in a text file:
```
convert <image> txt:- > coord.txt
```
Find the x and y coordinates of the target color (command source). Suppose the target color obtained by step 1 was red:
```
grep red coord.txt
```
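In a real scan the pen strokes are rarely pure red, and `grep red` is likely to match only pixels that carry that exact color name in the dump. As a minimal sketch, assuming an 8-bit dump (`-depth 8`) whose lines look like `x,y: (r,g,b)  #RRGGBB  name`, the coordinates of red-ish pixels could be picked out with `awk` instead. The helper name `find_reddish` and the thresholds are hypothetical and would need tuning for a particular scanner and pen:

```shell
# Hypothetical helper: print "x,y" for every red-ish pixel in a dump made with
#   convert <image> -depth 8 txt:- > coord.txt
# Assumes lines of the form "x,y: (r,g,b)  #RRGGBB  name" with 8-bit values.
# The thresholds (r >= 180, g <= 90, b <= 90) are illustrative guesses.
find_reddish() {
    # Splitting on runs of non-digits gives: $1=x $2=y $3=r $4=g $5=b
    awk -F'[^0-9]+' '
        /^#/ { next }                                   # skip the header line
        $3 >= 180 && $4 <= 90 && $5 <= 90 { print $1 "," $2 }
    ' "$1"
}

# Small hand-written sample standing in for coord.txt
sample=$(mktemp)
cat > "$sample" <<'EOF'
# ImageMagick pixel enumeration: 3,2,255,srgb
0,0: (255,255,255)  #FFFFFF  white
1,0: (250,12,8)  #FA0C08  srgb(250,12,8)
2,0: (0,0,0)  #000000  black
0,1: (254,254,254)  #FEFEFE  srgb(254,254,254)
1,1: (255,0,0)  #FF0000  red
2,1: (120,110,100)  #786E64  srgb(120,110,100)
EOF

find_reddish "$sample"                        # every candidate pixel
seed=$(find_reddish "$sample" | head -n 1)    # take the first one as the seed
echo "seed: $seed"
rm -f "$sample"
```

In this sample, `grep red` would find only the pixel at 1,1, while the threshold test also catches the slightly off-red pixel at 1,0.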
Finally, use x and y as the seed for floodfill to replace the circle region with your desired color (command source). In this case, I've used white to erase the region:
```
convert <image> -fill white -fuzz 13% \
        -draw 'color <x>,<y> floodfill' <image_floodfill_output>
```

The -fuzz parameter makes the fill also cover pixels whose color is close to, but not exactly, the target color, so that pixels which were originally red and were altered by noise also get replaced.
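As a rough mental model only, -fuzz can be pictured as a distance threshold in RGB space. The sketch below is not ImageMagick's actual implementation, whose exact metric may differ by version and colorspace. It assumes 8-bit channels and treats the percentage as a fraction of the black-to-white distance; the helper name `within_fuzz` is hypothetical:

```shell
# Hypothetical helper: succeed if color (R,G,B) lies within PERCENT "fuzz" of
# the target color (TR,TG,TB). Simplified model, not ImageMagick's own code.
# usage: within_fuzz R G B TR TG TB PERCENT   (8-bit channel values)
within_fuzz() {
    awk -v r="$1" -v g="$2" -v b="$3" \
        -v tr="$4" -v tg="$5" -v tb="$6" -v p="$7" 'BEGIN {
        d   = sqrt((r - tr)^2 + (g - tg)^2 + (b - tb)^2)  # distance to target
        max = sqrt(3) * 255                                # black-to-white distance
        exit !(d <= max * p / 100)                         # status 0 means "close enough"
    }'
}

# A slightly noisy red is accepted at 13%, while black is not.
if within_fuzz 250 12 8 255 0 0 13; then echo "noisy red: match"; fi
if ! within_fuzz 0 0 0 255 0 0 13; then echo "black: no match"; fi
```

Under this model, a larger percentage tolerates more scanner noise but also raises the risk of pulling in unrelated colors, so the value is worth testing on a few real scans.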

This tutorial gives more information about the floodfill function, such as how to replace the edge colors.

+1 for this suggestion. Would be even better if you elaborated it a bit more... :-) — Kurt Pfeifle, Aug 14 '12 at 08:20
@KurtPfeifle thank you! :-) Which points need improvement? Maybe a usage example and further explanation of the commands used? — Yamaneko, Aug 14 '12 at 13:56
Exactly :-) And maybe even some pictures and text file extracts which demonstrate the effects of the commands you're using... — Kurt Pfeifle, Aug 14 '12 at 14:29

score 0 · Answer 2 · answered May 16 '12 at 20:51

I would suggest looking at a ScanSnap scanner (perhaps the ScanSnap 3100). The bundled software can do several things that may be helpful.

You may find that any software / script that you find will not work the way you'd like. It sounds like many of these edits are things that need to be seen with a human eye. Perhaps you could hire a personal assistant to make these corrections for you. :)
