4

I hope someone will be able to help me.

I have pairs of black and white images that resulted from scanning texts with a large scanner (the resulting files are up to 500 MB). The scanned texts are nearly identical, and I need to check whether there are any substantial differences.

Obviously I cannot compare pixel by pixel, since the same image scanned into a BMP gives me a slightly different result every time I scan.

Does anyone know of a library, open source or commercial, that I can buy or download and build a .NET application around?

Thank you in advance for your help. Helen.

user819490
  • Could you provide a bit more detail about what you consider to be a "substantial" difference? Noise from the scanner, image rotation/scaling, scrunched up paper, etc? It's hard to do a meaningful comparison without deciding on a precise measure of similarity. – mpenkov Nov 05 '11 at 07:43

2 Answers

6

Use perceptual hashing. It reduces each image to a short fingerprint, and two images that look alike produce fingerprints that differ in only a few bits.
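As a rough illustration of the hashing idea, here is a minimal sketch of one simple perceptual hash (an "average hash") in plain Python. It assumes the image is already loaded as rows of 0-255 grayscale values; the names `average_hash` and `hamming_distance` are made up for this example and do not come from any particular library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Shrink the image to size x size cells by block averaging, then emit
    one bit per cell: 1 if the cell is brighter than the mean of all cells.
    Assumes the image is at least size pixels wide and tall."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two hashes differ."""
    return bin(a ^ b).count("1")
```

Two scans would then be treated as similar when the Hamming distance between their hashes is small. Note that a single 64-bit hash of a whole page is very coarse, so small local changes can go unnoticed; hashing tiles of the page separately is one way to make it more sensitive.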

You can also compute feature descriptors using one of the many algorithms available in OpenCV and compare the distances between the resulting vectors. Consider two images the same if the distance is below some threshold.

You can try GIST, SURF, SIFT, etc. (some are also scale- and rotation-invariant).
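The comparison step can be sketched as follows, assuming the descriptors have already been computed elsewhere (for example by OpenCV) and are available as equal-length lists of numbers. The function names and the threshold are hypothetical; a usable threshold has to be tuned on real scan pairs.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def looks_same(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Treat two images as the same when their descriptors are close enough."""
    return euclidean_distance(a, b) < threshold
```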

Muhammad Hasan Khan
  • Hasan, I do not want to reinvent the wheel. What I am trying to do is find a good library that will do the comparison for me. – user819490 Nov 07 '11 at 14:38
  • This is bordering on a research problem. It's unlikely you will find anything that's ready-made for you to use. – mpenkov Nov 07 '11 at 15:16
0

If you're working with text only, you could OCR both images and compare the extracted text.
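A minimal sketch of the comparison half of this idea, using Python's standard `difflib`. It assumes the OCR step has already produced one string per scan (the OCR engine itself is not shown), and `diff_ocr_text` is a name invented for this example.

```python
import difflib
from typing import List


def diff_ocr_text(text_a: str, text_b: str) -> List[str]:
    """Return only the lines that differ between the two OCR results,
    prefixed with '- ' (only in A) or '+ ' (only in B)."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    return [
        line
        for line in difflib.ndiff(lines_a, lines_b)
        if line.startswith(("- ", "+ "))
    ]
```

Because the comparison is done line by line, a moved line break shows up as a difference even when the words are unchanged. OCR errors will also show up as differences, so the output still needs a human check.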

mpenkov
  • This would be a good idea, but I am also looking to check the layout - such as missing or extra line breaks. Some characters at the edges of the text might get partially cut off. Also, the end result needs to be an image with all the differences highlighted so that a human operator will have a final say in what is substantial in each particular case. – user819490 Nov 07 '11 at 14:33