Questions tagged [locality-sensitive-hash]

Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).
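
As a rough illustration of the bucket idea described above, here is a minimal sketch using random-hyperplane hashing (one common LSH family, suited to cosine similarity). The function names `make_hyperplanes`, `lsh_signature`, and `build_index` are hypothetical and chosen only for this example; they do not come from any particular library.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Return one bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def build_index(items, hyperplanes):
    """Group item keys into buckets keyed by their bit signature."""
    buckets = {}
    for key, vector in items.items():
        buckets.setdefault(lsh_signature(vector, hyperplanes), []).append(key)
    return buckets


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=3, seed=1)
    items = {
        "a": [1.0, 2.0, 3.0],
        "b": [2.0, 4.0, 6.0],   # same direction as "a", so same signature
        "c": [-3.0, 0.5, -1.0],
    }
    for signature, keys in build_index(items, planes).items():
        print(signature, keys)
```

Vectors pointing in similar directions agree on most bits and therefore tend to share a bucket. In practice several independent tables are usually combined to trade recall against the number of candidates examined.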

97 questions
2
votes
3 answers

Nearest Neighbor Search

I want an algorithm for the Nearest Neighbor Search (NNS) problem. The problem belongs to the field of computational geometry. I searched a lot, but I did not find an algorithm for it. I think the locality-sensitive hashing (LSH) algorithm will be good for this…
1
vote
0 answers

Efficient string similarity search for huge corpora

I am doing a similarity search between a 256-character string and a corpus of 9000 entries, each about 1000 words long. I used LocalitySensitiveHashing, see…
1
vote
1 answer

Is the number of rows always 1 in each band in the Spark implementation of MinHashLSH

I'm trying to understand the MinHash LSH implementation in Spark, org.apache.spark.ml.feature.MinHashLSH. These two files seem the most relevant: MinHashLSH.scala and LSH.scala. To use MinHashLSH, the doc says it needs a numHashTables parameter,…
zyxue
1
vote
0 answers

Technique For Comparing Items in a Set with Varying Numbers of Attributes Possibly Using LSH

I have a data set containing millions of items collected from many disparate sources. Each item contains a list of anywhere from fifty to a thousand attributes. The specific attributes available vary greatly from item to item. I am looking for the…
1
vote
1 answer

Can Locality Sensitive Hashing be applied on dynamic-dimensional data points?

For example, assume that we have some vectors of different lengths, and we want to measure the similarity between each pair of these vectors. We also have to consider that the vectors' dimensions are time-varying. Can we do this?
1
vote
1 answer

Optimum number of permutations to use for estimating set similarity using min hash

Let's say I have to estimate the Jaccard similarity between documents A and B, and I use k random permutations of the union of these sets/documents to determine the documents' signatures. How should I set my k value? Since setting it to a…
1
vote
1 answer

BucketRandomProjectionLSH KNN parameters

I am trying to use the KNN algorithm from Spark 2.2.0. I am wondering how I should set my bucket length. The record count and number of features vary, so I think it is better to set the length based on some conditions. How should I set the bucket length for better…
Yong Hyun Kwon
1
vote
0 answers

Finding k-nn with LSH when k > size of bucket

I've been reading up on the literature around locality-sensitive hashing, and I think I have a pretty good understanding of how it works. Considering the simplest case of a single hash table where each document is in only one bucket, my question…
1
vote
0 answers

Should I mix queries in for LSH index building

When I read the Multiprobe LSH paper and the Multiprobe LSH performance tuning paper, I found that in their experiments the queries were randomly chosen from the dataset used for index building. Is this necessary? Can LSH handle queries for unseen points?
Huan FENG
1
vote
1 answer

Karlhigley LSH ANN model for finding nearest neighbors giving null results

I want to find the nearest neighbors of each of my points, and I tried it using the karlhigley ANN model. Here is the piece of code: List> svList = new ArrayList<>(); svList.add(new Tuple2(3L, …
1
vote
1 answer

Locality sensitive hashing - what happens when a bucket is empty?

Assume I've constructed an LSH database according to some set of hashes, and I'm now beginning to query the database to find approximate nearest neighbors. Are there any guidelines to what happens when you compute the hash for a query point, and the…
jayelm
1
vote
1 answer

LSH implementation in Python 3 with Euclidean distance and seeing all neighbors in LSHForest

I am looking for an efficient implementation of LSH in Python 3 that uses Euclidean distance. There is the "in-python" LSHForest implementation, but it uses cosine distance. Also, even using this implementation, I didn't find a way to see the…
1
vote
1 answer

How to solve nearest neighbor through the R-nearest neighbor?

Citing the E2LSH manual (that it concerns this specific library is not important; the quote should hold for the NN problem in general): E2LSH can also be used to solve the nearest neighbor problem, where, given the query q, the data structure is…
1
vote
0 answers

Binary descriptors: find the most similar image in OpenCV with LSH

flannIndex in OpenCV is designed for matching 2 images through binary descriptors. However, LSH is heavily used in CBIR not to "compare two images" but to "find the most similar image in the dataset", which is obviously something…
justHelloWorld
1
vote
1 answer

Matlab: Conceptual difficulty in how to create multiple hash tables in locality-sensitive hashing

The key idea of locality-sensitive hashing (LSH) is that neighboring points are more likely to be mapped to the same bucket, while points far from each other are more likely to be mapped to different buckets. When using random projection, if the database…
SKM