Questions tagged [locality-sensitive-hash]

Locality-sensitive hashing (LSH) is a method of probabilistic dimension reduction.

Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).
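A minimal sketch of the bucketing idea, assuming MinHash signatures with banding as the LSH scheme for Jaccard similarity (one common choice; the description above does not prescribe any particular scheme). The function names, the prime modulus, and the band/row split are illustrative assumptions, not part of any library, and token sets are assumed to be non-empty.

```python
import random
import zlib
from collections import defaultdict

# Assumption: a Mersenne prime (2^31 - 1) used as the modulus of the hash family.
PRIME = 2_147_483_647


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return one minimum per hash function over a non-empty set of string tokens."""
    # crc32 gives a deterministic integer id for each token.
    ids = [zlib.crc32(t.encode("utf-8")) % PRIME for t in tokens]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def lsh_buckets(signatures, bands, rows):
    """Split each signature into bands; each (band index, band values) is a bucket key."""
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for band in range(bands):
            chunk = tuple(sig[band * rows:(band + 1) * rows])
            buckets[(band, chunk)].add(key)
    return buckets


def candidate_pairs(buckets):
    """Items sharing at least one bucket become candidate similar pairs."""
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": {"the", "quick", "brown", "fox"},
        "b": {"fox", "brown", "quick", "the"},   # same set as "a"
        "c": {"completely", "different", "words"},
    }
    params = make_hash_params(num_hashes=20, seed=1)
    sigs = {name: minhash_signature(toks, params) for name, toks in docs.items()}
    print(candidate_pairs(lsh_buckets(sigs, bands=5, rows=4)))
```

Identical token sets always produce identical signatures, so they always share every bucket and are reported as candidates. For merely similar sets, the chance of sharing a bucket grows with their Jaccard similarity and depends on the chosen number of bands and rows per band.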

97 questions
4 votes, 1 answer

What value to use for numHashTable in Spark LSH by Uber?

I'm trying to use .approxSimilarityJoin of Spark MLlib LSH (MinHash for Jaccard distance), e.g. val mh = new MinHashLSH().setNumHashTables(5).setInputCol("features").setOutputCol("hashes"). I understand that the higher the…
4 votes, 2 answers

Number of buckets in LSH

In LSH, you hash slices of the documents into buckets. The idea is that documents that fall into the same bucket are potentially similar, and thus possible nearest neighbors. For 40,000 documents, what is a good value (pretty much) for the…
gsamaras
4 votes, 4 answers

Indexing for similarity search

I have about 100M numeric vectors (Minhash fingerprints), each vector contains 100 integer numbers between 0 and 65536, and I'm trying to do a fast similarity search against this database of fingerprints using Jaccard similarity, i.e. given a query…
alex
4 votes, 1 answer

Locality Sensitive Hashing on audio fingerprints

I am working on an audio fingerprinting system and have gone through some papers and research recently and this page in particular: c# AudioFingerprinting and Locality Sensitive Hashing I have now got a series of fingerprints for every 32ms of…
3 votes, 1 answer

How to use Locality Sensitive Hash --LSHKIT

I really need to use LSHKIT in my program to measure the similarity of some high-dimensional vectors. There is an LSH library called LSHKIT, which can be found here: http://lshkit.sourceforge.net/ I am confused about how to use it. First of all, I could not…
Bipario
3 votes, 1 answer

Efficient implementation of Hashtable, with cache aware locality property (Locality-sensitive hashtable)

I am trying to play around with a C data structure (a hash table). I am not using any pre-built hash table library (like STL), because I want a better understanding of how it works. So here I create a hash table containing a list of elements,…
all_by_grace
3 votes, 1 answer

LSH in Spark gets stuck forever at approxSimilarityJoin() function

I am trying to use LSH in Spark to find nearest neighbours for each user on a very large dataset containing 50000 rows and ~5000 features per row. Here is the code related to this. MinHashLSH mh = new…
3 votes, 1 answer

Approximate nearest neighbor (A1NN) for high-dimensional spaces

I read this question about finding the closest neighbor for 3-dimensional points. An octree is a solution for this case. A kd-tree is a solution for small spaces (generally fewer than 50 dimensions). For higher dimensions (vectors of hundreds of dimensions…
justHelloWorld
3 votes, 1 answer

Nearest Neighbour - Locality Sensitive Hashing Disadvantage

Locality sensitive hashing seems like a great technique for KNNs without any disadvantages. However, what would be a disadvantage of locality sensitive hashing if someone is using it in industry for practical applications? Under what situations will…
jonty rhodes
3 votes, 2 answers

Can Locality Sensitive Hashing be used on dynamic data?

Can Locality Sensitive Hashing be used on dynamic data? For example, assume I first use LSH on 1,000,000 documents and store the results in an index, and then I want to add another document to that index. Can I do it using LSH?
3 votes, 0 answers

Amplifying a locality sensitive hash

I'm trying to build a cosine locality sensitive hash so I can find candidate similar pairs of items without having to compare every possible pair. I have it basically working, but most of the pairs in my data seem to have cosine similarity in the…
3 votes, 2 answers

Generating Random Hash Functions for LSH Minhash Algorithm

I'm programming a minhashing algorithm in Java that requires me to generate an arbitrary number of random hash functions (240 hash functions in my case), and run any number of integers through it (2000 at the moment). In order to do that, I've been…
user3246779
3 votes, 1 answer

Locality Preserving Hash Function For C#

I need a locality preserving hash function implementation for C# (or possibly an alternative solution). I would like to figure out a way to map strings (i.e. similar gene sequence tokens sometimes of slightly different lengths) into the same…
Jake Drew
2 votes, 0 answers

hash function for a set of 2d curves

I am looking for some spatial hashing algorithms that can hash a list of 2D curves (Bezier splines) and have the following properties: Tolerance: since control points are expressed in float3 (where z is always 0), I need some kind of threshold to…
bitinn
2 votes, 1 answer

Faster implementation of LSH (AND-OR)

I have a data set of size (160000,3200), in which all the elements are either zero or one. I want to find similar candidates. I have hashed it to (160000,200) with MinHash in a single for-loop, and it took about two minutes, which I am happy with. I…
Ramki