
I'd like to be able to scan an image and reduce it to a consistent hash that I can subsequently compare with the hash of a new scan to see whether the two images are the same.

Any help in this regard would be greatly appreciated!

Louis S. Berman
  • Are you looking for comparison of two images (= fuzzy compare) or two files (= byte-by-byte compare)? – Alexei Levenkov Aug 16 '12 at 21:33
  • @AlexeiLevenkov I don't see how it could be anything *but* a fuzzy compare. I guarantee that one page scanned twice will result in very different files at the byte-level. – Daniel Mann Aug 16 '12 at 21:57
  • (more or less a) Duplicate of http://stackoverflow.com/questions/11931960/quickly-calculating-the-dirtied-areas-between-two-similar-images – PhonicUK Aug 16 '12 at 22:23
  • @DanielMann, I agree... but maybe "new scan" defined as "someone gives me a scan - need to check if it is exact duplicate (i.e. retry to send)" which could be answered in reasonable time here. Otherwise - fun research project which is likely beyond SO scope. – Alexei Levenkov Aug 16 '12 at 23:27
  • Does it need to be a hash? It seems to me that any fuzzy "hash" will have to preserve pixel values and relative location, which makes this "hash" sound a lot like an image! – Simon MᶜKenzie Aug 17 '12 at 01:58
  • To answer Alexei: I need to do a fuzzy compare. Happily, I expect both of the images to be extremely similar. Yes, I'll be scanning both, but I expect to do each scan with a similar scanner, at the same resolution. Basically, I want to verify that Scan A (excepting dirt and scanner differences) is basically the same picture as Scan B. – Louis S. Berman Aug 17 '12 at 16:57
  • In terms of "Does it need to be a hash?", the answer is yes! I need an exact match, and I can't rely upon probabilistic results like "Scan A and Scan B are 98% the same"... – Louis S. Berman Aug 17 '12 at 16:58

1 Answer


The following approaches are possibly more powerful than what you actually need.

In computer vision, recognition is an active area of research.

For instance, if I were to build a cleaning robot for my house, it should be able to recognize my dog (so as to not spray lethal chemicals on it). This is made more difficult since the robot won't necessarily look at the dog from the same perspective every time (and the dog can move); that is, it should recognize my dog from the side, the front, or the back.

To train this robot, I show it a few pictures of my dog under different lighting conditions, and it should be able to recognize the dog in the future.

Several approaches exist for extracting the salient features from an image, so that the same features can be recognized even if the picture was taken under different lighting or from a different angle.

Some commonly used feature-extraction techniques include the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and the histogram of oriented gradients (HOG).
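
As a rough illustration of the feature-matching idea (my own sketch, not part of the original answer), the snippet below uses OpenCV's ORB detector and a brute-force matcher; the file names and the notion of a "match ratio" are placeholder assumptions, not anything from the question:

    # Minimal sketch: extract ORB features from two scans and count how many match.
    # Assumes: pip install opencv-python; "scan_a.png" / "scan_b.png" are placeholder file names.
    import cv2

    img_a = cv2.imread("scan_a.png", cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread("scan_b.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=500)              # keypoint detector + binary descriptor
    kp_a, desc_a = orb.detectAndCompute(img_a, None)
    kp_b, desc_b = orb.detectAndCompute(img_b, None)

    # Brute-force Hamming matcher; cross-checking keeps only mutual best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)

    # Two scans of the same page should share a large fraction of their features.
    match_ratio = len(matches) / max(len(kp_a), len(kp_b), 1)
    print(f"{len(matches)} matches, ratio = {match_ratio:.2f}")

A high match ratio suggests the two scans show the same page; the exact threshold would have to be tuned against real scans.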

However, rather than manually extracting features, many modern systems use neural-network machine-learning methods so the robot/computer can learn to recognize objects, perhaps in a way similar to how humans learn.
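
To make the learned-features idea concrete, here is a minimal sketch (again my own illustration, not something from the original answer) that reuses a pretrained ResNet-18 from torchvision as a generic feature extractor and compares two images by the cosine similarity of their embeddings; the file names are placeholders, and torchvision 0.13 or newer is assumed for the weights API:

    # Minimal sketch: compare two images via CNN embeddings (torch + torchvision + pillow).
    # "scan_a.png" / "scan_b.png" are placeholder file names.
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = torch.nn.Identity()   # drop the classifier head, keep the 512-d embedding
    model.eval()

    preprocess = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def embed(path):
        img = Image.open(path).convert("RGB")
        with torch.no_grad():
            return model(preprocess(img).unsqueeze(0)).squeeze(0)

    sim = torch.nn.functional.cosine_similarity(embed("scan_a.png"), embed("scan_b.png"), dim=0)
    print(f"cosine similarity: {sim.item():.3f}")   # near 1.0 for near-identical images

A similarity close to 1.0 indicates the two images look alike to the network; note that this is still a fuzzy comparison rather than the exact hash the question asks for.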

I've never done image recognition myself, so I am not sure about the relative advantages and disadvantages of these techniques, but I find the subject fascinating, and I hope that computers will keep getting better at recognizing things (vision, voice, gesture, etc.).

Xantix