
I am looking for an efficient implementation of LSH in Python 3 that uses Euclidean distance.

There is the "in-python" LSHForest implementation, but it uses cosine distance.
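For reference, the usual Euclidean (L2) counterpart of cosine LSH is the p-stable scheme: each hash projects a point onto a random Gaussian direction `a`, shifts it by a random offset `b` in `[0, w)`, and cuts the line into slots of width `w`, giving h(v) = floor((a · v + b) / w). The sketch below is a minimal, stdlib-only illustration of that hash family, assuming a bucket width `w = 4.0` and 3 concatenated hashes per key. `EuclideanHash` and `make_key` are hypothetical names and are not part of LSHForest or any package.

```python
import math
import random
from typing import List, Sequence


class EuclideanHash:
    """One p-stable LSH function for L2 distance (hypothetical helper):
    h(v) = floor((a . v + b) / w), with a ~ N(0, I) and b ~ U[0, w)."""

    def __init__(self, dim: int, w: float, rng: random.Random) -> None:
        self.a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random direction
        self.b = rng.uniform(0.0, w)                          # random offset
        self.w = w                                            # slot width (assumed)

    def __call__(self, v: Sequence[float]) -> int:
        dot = sum(ai * vi for ai, vi in zip(self.a, v))
        # math.floor (not int) so negative projections land in the right slot
        return math.floor((dot + self.b) / self.w)


def make_key(hashes: List[EuclideanHash], v: Sequence[float]) -> tuple:
    """Concatenate several hash values into one bucket key (AND-construction)."""
    return tuple(h(v) for h in hashes)


rng = random.Random(0)
hashes = [EuclideanHash(dim=2, w=4.0, rng=rng) for _ in range(3)]
p, q, far = (1.0, 1.0), (1.1, 0.9), (50.0, -40.0)
print(make_key(hashes, p), make_key(hashes, q), make_key(hashes, far))
```

Nearby points such as `p` and `q` are likely, but not guaranteed, to share a key; larger `w` raises the collision probability for everything, so `w` has to be tuned to the scale of the data.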

Also, even with this implementation, I didn't find a way to see the contents of each bucket (e.g., when using LSH for clustering): it only returns a certain number of approximate neighbors within a certain radius. If I want to see all neighbors, I don't see how it can be done. I do not want to use an arbitrary search radius, and I am really not sure what a very large or infinite radius means in this implementation.
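If you write the index yourself, the buckets are just dictionaries, so their contents can be listed directly without choosing any radius. The following is a minimal sketch under stated assumptions: p-stable hashes as above, 4 tables with 3 hashes each, and `w = 4.0`, all of which are illustrative defaults rather than recommended values. `EuclideanLSH`, `buckets` and `candidates` are hypothetical names, not the LSHForest API.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Key = Tuple[int, ...]


class EuclideanLSH:
    """Hypothetical E2LSH-style index whose buckets can be inspected.

    num_tables independent tables; each key concatenates num_hashes
    p-stable hashes floor((a . v + b) / w)."""

    def __init__(self, dim: int, num_tables: int = 4, num_hashes: int = 3,
                 w: float = 4.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = w
        # params[t][j] = (a, b) for hash j of table t
        self.params = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(num_hashes)]
            for _ in range(num_tables)
        ]
        self.tables: List[Dict[Key, List[int]]] = [
            defaultdict(list) for _ in range(num_tables)]

    def key(self, t: int, v: Vector) -> Key:
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.params[t])

    def fit(self, points: List[Vector]) -> "EuclideanLSH":
        for idx, v in enumerate(points):
            for t, table in enumerate(self.tables):
                table[self.key(t, v)].append(idx)
        return self

    def buckets(self, t: int = 0) -> Dict[Key, List[int]]:
        """Every bucket of table t: key -> indices of the points stored in it."""
        return dict(self.tables[t])

    def candidates(self, q: Vector) -> List[int]:
        """Union of the buckets q hashes to in all tables (no radius involved)."""
        found = set()
        for t, table in enumerate(self.tables):
            # .get avoids creating empty buckets in the defaultdict
            found.update(table.get(self.key(t, q), []))
        return sorted(found)


points = [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0), (10.1, 9.9)]
index = EuclideanLSH(dim=2).fit(points)
for key, members in index.buckets(0).items():
    print(key, members)
print(index.candidates((0.1, 0.0)))
```

For clustering, `buckets(t)` gives the raw groups of one table; `candidates(q)` is the radius-free "everything that collided with q" set, which you can then rank by exact distance if needed.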

I would appreciate any insight. Many thanks.

gsamaras
user3861925


For software recommendations, please ask here: Software Recommendations.


For how this works, first read my answer. Then, assuming you ask the package (I haven't used it) for a large k (the number of neighbors the software returns) within a large radius r, it should return many neighbors. Set k = N, where N is the number of points in your dataset, and you will get all the neighbors.
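To see what "k = N with a huge radius" amounts to, an exact brute-force reference helps: with `r = inf` and `k = N`, a k-nearest-neighbor query simply ranks every point by distance. The sketch below is that exact baseline, not the package's method; `radius_knn` is a hypothetical name and `math.dist` needs Python 3.8+. Depending on the implementation, an LSH index may only examine points that share a bucket with the query, so it could still return fewer than N points.

```python
import math
from typing import List, Sequence, Tuple


def radius_knn(points: List[Sequence[float]], q: Sequence[float],
               k: int, r: float = math.inf) -> List[Tuple[float, int]]:
    """Hypothetical exact reference: up to k (distance, index) pairs with distance <= r."""
    scored = sorted((math.dist(p, q), i) for i, p in enumerate(points))
    return [(d, i) for d, i in scored if d <= r][:k]


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0), (6.0, 8.0)]
N = len(points)
print(radius_knn(points, (0.0, 0.0), k=2, r=2.0))  # [(0.0, 0), (1.0, 2)]
print(len(radius_knn(points, (0.0, 0.0), k=N)))     # 4: every point, nearest first
```

Comparing an LSH result against this baseline on a small sample is a cheap way to check whether a given k and r really return all points.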

If you want to see all the neighbors within a certain bucket, then you have to investigate how many points a bucket can contain and set k to that number.
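One way to investigate bucket capacity empirically, assuming you can hash your own data, is to build a table and measure its largest bucket. The sketch below does this for a single p-stable table with 3 hashes and `w = 4.0` on random 2-D Gaussian data; these settings, and the names `build_buckets` and `max_bucket_size`, are hypothetical and only illustrate the idea. The resulting size depends on the data, `w`, and the number of hashes, so it must be recomputed for your setup.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_buckets(points: List[Sequence[float]], num_hashes: int = 3,
                  w: float = 4.0, seed: int = 0) -> Dict[Tuple[int, ...], List[int]]:
    """Hypothetical single E2LSH-style table: key = tuple of floor((a . v + b) / w)."""
    rng = random.Random(seed)
    dim = len(points[0])
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
              for _ in range(num_hashes)]
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, v in enumerate(points):
        key = tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
                    for a, b in params)
        buckets[key].append(idx)
    return dict(buckets)


def max_bucket_size(buckets: Dict[Tuple[int, ...], List[int]]) -> int:
    """Largest bucket: a k at least this large can return any whole bucket."""
    return max(len(members) for members in buckets.values())


rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
buckets = build_buckets(points)
k = max_bucket_size(buckets)
print(len(buckets), "buckets; largest holds", k, "points")
```

With several tables, the candidates for a query are a union of one bucket per table, so the number of tables times the largest bucket size is an upper bound on how large k would need to be.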

gsamaras
  • "then you have to investigate how many points can a bucket contain" - meaning that I need to go to the source code, as the implementation is probably different from the article it's based on? I saw no other way to do this using the package options. Maybe someone who has used the package can answer this? Thank you – user3861925 Jun 15 '16 at 13:49
  • @user3861925 yes, it is implementation-defined. If the article specifies that information, you could rely on that. Thanks for the upvote; you can also accept the answer if you like. Good question BTW! – gsamaras Jun 15 '16 at 15:44