
Say I've got a large set of documents (around 35,000) that contain perceptual hashes. Using MongoDB, what is the fastest way to compare a given hash X against all the hashes in my database and find the ones with a distance of less than N?

I'm using Python, by the way. I'm assuming this isn't possible natively in Mongo, but maybe it can be optimized somehow?
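For reference, the client-side approach (pull every hash and compare in Python) can be sketched as follows. This sketch assumes each document stores its hash as a fixed-length hexadecimal string in a field named `phash` (a hypothetical name), and that "distance" means Hamming distance, i.e. the number of differing bits. With only ~35,000 hashes, a plain linear scan like this may well be fast enough.

```python
from typing import Any, Iterable, Iterator, Mapping


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer-encoded hashes."""
    # XOR sets a bit wherever the two hashes differ; count those set bits.
    return bin(a ^ b).count("1")


def find_similar(
    target_hex: str,
    docs: Iterable[Mapping[str, Any]],
    max_distance: int,
    field: str = "phash",  # hypothetical field name holding a hex string
) -> Iterator[Mapping[str, Any]]:
    """Yield documents whose hash is fewer than max_distance bits from target_hex."""
    target = int(target_hex, 16)
    for doc in docs:
        # Decode the stored hex hash and keep the document if it is close enough.
        if hamming_distance(target, int(doc[field], 16)) < max_distance:
            yield doc
```

With pymongo, passing `collection.find({}, {"phash": 1})` as `docs` would limit the transferred data to the hash field (plus `_id`, which is included by default).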

davegri
  • This depends entirely on how the actual content for comparison is stored. If the "hash" data can be processed with JavaScript (i.e. a string in binary representation) that could be XOR'ed, then you could use [`$where`](https://docs.mongodb.org/manual/reference/operator/query/where/) to filter documents under a certain "distance". If you want (or need) to use library functions like those in [pHash](https://github.com/polachok/py-phash), then there is no option other than returning each document to the client for comparison. – Blakes Seven Feb 07 '16 at 22:47
  • 3
    Can you show me an example of using where to find strings under a certain xor distance? – davegri Feb 07 '16 at 23:12
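A rough sketch of the `$where` idea from the comment above, assuming the same hypothetical `phash` field holding fixed-length hex strings. A Python helper (`build_where_filter`, a hypothetical name) builds the filter document, and the embedded JavaScript XORs the two hashes one hex digit at a time and counts the set bits. Comparing per hex digit sidesteps the fact that JavaScript's bitwise operators work on 32-bit integers, which would truncate a 64-bit hash. Treat this as an untested illustration rather than a production query: `$where` evaluates JavaScript for every document and cannot use an index, and newer MongoDB releases discourage server-side JavaScript, so check the documentation for your server version.

```python
import re
from string import Template

_HEX = re.compile(r"[0-9a-fA-F]+")

# JavaScript evaluated by MongoDB for each document; "this" is the document.
# Returns true only when the Hamming distance is strictly less than $max.
_WHERE_TEMPLATE = Template(
    "function() {"
    " var target = '$target';"
    " var h = this.$field;"
    " if (typeof h !== 'string' || h.length !== target.length) return false;"
    " var dist = 0;"
    " for (var i = 0; i < h.length; i++) {"
    "  var x = parseInt(h.charAt(i), 16) ^ parseInt(target.charAt(i), 16);"
    "  while (x) { dist += x & 1; x >>= 1; }"
    "  if (dist >= $max) return false;"
    " }"
    " return true;"
    "}"
)


def build_where_filter(target_hex: str, max_distance: int, field: str = "phash") -> dict:
    """Build a $where filter matching hex hashes fewer than max_distance bits away."""
    # Validate inputs before splicing them into JavaScript source.
    if not _HEX.fullmatch(target_hex):
        raise ValueError("target_hex must be a non-empty hexadecimal string")
    if not isinstance(max_distance, int) or max_distance < 0:
        raise ValueError("max_distance must be a non-negative integer")
    if not field.isidentifier():
        raise ValueError("field must be a plain (non-dotted) field name")
    js = _WHERE_TEMPLATE.substitute(target=target_hex, field=field, max=max_distance)
    return {"$where": js}
```

With pymongo this would be used as, for example, `collection.find(build_where_filter(hash_x, n))`, where `hash_x` and `n` are placeholders for X and N.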
