
I would like to know whether LSH (locality-sensitive hashing) can be used to find nearest neighbors for any kind of file. Every example I have seen uses text files only, but I want to find nearest neighbors for WIM, ISO and ZIP files.

So, will it work for WIM, ISO and ZIP files as well?

Thanks in advance

  • 1
    LSH cannot deal with compressed data. You will need to decompress it first. Additionally, ISO files will likely have many sections in common, such as metadata and format specifiers, which might make them match other ISOs that have completely different content. Having only a cursory knowledge of LSH, I would assume that you need "proper data" without overhead such as formatting codes or compression. For instance, if you change all the *text* in an HTML document but leave all the HTML tags as-is, I'm assuming LSH might still trigger off the tags for some documents. – Lasse V. Karlsen Jul 10 '20 at 08:09
  • Thank you so much @LasseV.Karlsen – Mohammad Wasim Khan Jul 10 '20 at 08:34

1 Answer

There is a paper which might be interesting in this context:

Edward Raff, Joe Aurelio. PyLZJD: An Easy to Use Tool for Machine Learning in Proceedings of the 18th Python in Science Conference, 97-102. http://hdl.handle.net/11603/14971

It defines a compression-inspired distance metric, the Lempel-Ziv Jaccard Distance (LZJD), which operates on raw bytes and could be used for LSH.
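To illustrate the general idea of hashing arbitrary binary files for similarity, here is a minimal sketch using plain MinHash over byte n-grams. This is not LZJD and not the PyLZJD API; the function names (`byte_shingles`, `minhash_signature`, `estimated_similarity`) and the parameters `k` and `num_hashes` are made up for illustration. It also assumes, as the comment above suggests, that compressed containers have been unpacked first, since compression tends to destroy byte-level similarity.

```python
import hashlib
import struct


def byte_shingles(data: bytes, k: int = 8) -> set:
    """Return the set of all k-byte substrings (shingles) of data."""
    if len(data) < k:
        return {data}
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def minhash_signature(data: bytes, num_hashes: int = 64, k: int = 8) -> list:
    """Compute a MinHash signature: one minimum hash value per salted hash."""
    shingles = byte_shingles(data, k)
    signature = []
    for seed in range(num_hashes):
        # Each seed gives an independent hash function via the BLAKE2b salt.
        salt = struct.pack("<Q", seed)
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=salt).digest(), "big"
            )
            for s in shingles
        ))
    return signature


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Example with in-memory bytes; for real files, read them with
# open(path, "rb").read() (after decompressing, if applicable).
original = b"some binary payload " * 50
modified = original + b"with a small change at the end"
print(estimated_similarity(minhash_signature(original),
                           minhash_signature(modified)))
```

Signatures like these can then be split into bands and bucketed to build an actual LSH index; that step is omitted here to keep the sketch short.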

otmar