Questions tagged [locality-sensitive-hash]

Locality-sensitive hashing (LSH) is a method of probabilistic dimension reduction.

Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).
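As a rough illustration of the bucketing idea in the description above, here is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. The function names (`make_hyperplanes`, `lsh_signature`, `build_buckets`) are hypothetical and not taken from any library mentioned on this page; it assumes dense numeric vectors of equal length.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_buckets(items, hyperplanes):
    """Group item keys by signature; items sharing a signature share a bucket."""
    buckets = {}
    for key, vector in items.items():
        buckets.setdefault(lsh_signature(vector, hyperplanes), []).append(key)
    return buckets
```

Vectors separated by a small angle agree on most bits, so they tend to land in the same bucket, while the number of possible buckets (at most `2 ** num_bits`) stays far smaller than the space of possible inputs. At query time one would typically hash the query with the same hyperplanes and compare it only against the items in the matching bucket.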

97 questions

3 answers

Nearest Neighbor Search

I want an algorithm for the Nearest Neighbor Search (NNS) problem. The problem belongs to the field of computational geometry. I searched a lot, but I did not find an algorithm for it. I think a locality-sensitive hashing (LSH) algorithm would be good for this…

asked Dec 12 '12 at 05:23

Seyed Morteza Mousavi

0 answers

Efficient string similarity search for huge corpora

I am doing a similarity search between a 256-character string and a corpus of 9000 entries of about 1000 words each. I used LocalitySensitiveHashing, see…

python nlp cosine-similarity sentence-similarity locality-sensitive-hash

asked Mar 01 '22 at 03:34

Per Bock

1 answer

Is the number of rows always 1 in each band in the Spark implementation of MinHashLSH

I'm trying to understand the MinHash LSH implementation in Spark, org.apache.spark.ml.feature.MinHashLSH. These two files seem the most relevant: MinHashLSH.scala and LSH.scala. To use MinHashLSH, the doc says it needs a numHashTables parameter,…

apache-spark locality-sensitive-hash minhash

asked Dec 11 '20 at 22:15

zyxue

0 answers

Technique For Comparing Items in a Set with Varying Numbers of Attributes Possibly Using LSH

I have a data set containing millions of items collected from many disparate sources. Each item contains a list of anywhere from fifty to a thousand attributes. The specific attributes available vary greatly from item to item. I am looking for the…

data-science similarity cosine-similarity locality-sensitive-hash minhash

asked Feb 07 '19 at 18:36

Anthony Gatlin

1 answer

Can Locality Sensitive Hashing be applied on dynamic-dimensional data points?

For example, assume that we have some vectors of different lengths, and we want to measure the similarity between each pair of these vectors. We also have to consider that the dimensions of these vectors vary over time. Can we do this?

hash similarity nearest-neighbor locality-sensitive-hash lsh

asked Nov 28 '18 at 15:46

agtabesh

1 answer

Optimum number of permutations to use for estimating set similarity using min hash

Let's say I have to estimate the Jaccard similarity between documents A and B, and I use k random permutations of the union of these sets/documents to determine the documents' signatures. How should I set my k value? Since setting it to a…

bigdata similarity locality-sensitive-hash minhash

asked Nov 23 '17 at 01:46

theitpushover
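For the question above about choosing k, a minimal sketch may help. Each of the k signature positions matches with probability equal to the Jaccard similarity J, so the estimate is an average of k Bernoulli(J) trials and its standard error is sqrt(J(1 - J) / k), which is at most 1 / (2 sqrt(k)). The code below is an illustration under assumptions, not the asker's setup: it replaces true random permutations with random affine hash functions modulo a prime (a common approximation), it assumes non-empty token lists, and all function names are hypothetical.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # large Mersenne prime used as the hash modulus


def make_hash_params(k, seed=0):
    """Draw k random (a, b) pairs; each defines h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(tokens, params):
    """For each hash function, keep the minimum hash value over the token set."""
    # crc32 gives a process-independent integer id for each token.
    ids = [zlib.crc32(token.encode("utf-8")) for token in tokens]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def worst_case_std_error(k):
    """Upper bound on the standard error, reached at J = 0.5."""
    return 0.5 / k ** 0.5
```

Under this bound, k = 100 gives a standard error of at most 0.05 and k = 400 at most 0.025, so halving the error costs roughly four times as many hash functions.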

1 answer

BucketRandomProjectionLSH KNN parameters

I am trying to use the KNN algorithm from Spark 2.2.0. I am wondering how I should set my bucket length. The record count and number of features vary, so I think it is better to set the length according to some conditions. How should I set the bucket length for better…

knn locality-sensitive-hash

asked Sep 21 '17 at 11:10

Yong Hyun Kwon

0 answers

Finding k-nn with LSH when k > size of bucket

I've been reading up on the literature around locality-sensitive hashing, and I think I have a pretty good understanding of how it works. Considering the simplest case of a single hash table where each document is in only one bucket, my question…

database postgresql computer-science nearest-neighbor locality-sensitive-hash

asked Apr 21 '17 at 03:20

JVillella

0 answers

Should I mix queries in for LSH index building

When I read the Multiprobe LSH paper and the Multiprobe LSH performance tuning paper, I found that in their experiments the queries were randomly chosen from the dataset used for index building. Is it necessary to do so? Can LSH handle queries for unseen points?

knn locality-sensitive-hash

asked Apr 20 '17 at 04:50

Huan FENG

1 answer

Karlhigley LSH ANN model for finding nearest neighbors giving null results

I want to find the nearest neighbors of each of my points, and I tried it using the karlhigley ANN model. Here is the piece of code: List> svList = new ArrayList<>(); svList.add(new Tuple2(3L, …

java apache-spark nearest-neighbor locality-sensitive-hash

asked Jan 05 '17 at 05:33

Goutham Panneeru

1 answer

Locality sensitive hashing - what happens when a bucket is empty?

Assume I've constructed an LSH database according to some set of hashes, and I'm now beginning to query the database to find approximate nearest neighbors. Are there any guidelines for what happens when you compute the hash for a query point, and the…

locality-sensitive-hash

asked Dec 18 '16 at 21:54

jayelm

1 answer

LSH implementation in python 3 with Euclidean distance and seeing all neighbors in LSHForest

I am looking for an efficient implementation of LSH in python 3 that uses Euclidean distance. There is the "in-python" LSHForest implementation, but it uses cosine distances. Also, even using this implementation, I didn't find a way to see the…

python-3.x computational-geometry nearest-neighbor locality-sensitive-hash approximate-nn-searching

asked Jun 13 '16 at 12:24

user3861925

1 answer

How to solve nearest neighbor through the R-nearest neighbor?

Citing the E2LSH manual (it doesn't matter that it is about this specific library; the quote should hold for the NN problem in general): E2LSH can be also used to solve the nearest neighbor problem, where, given the query q, the data structure is…

c++ math computational-geometry nearest-neighbor locality-sensitive-hash

asked Jun 08 '16 at 12:57

justHelloWorld

0 answers

Binary descriptors: find the most similar image in OpenCV with LSH

flannIndex in OpenCV is designed for matching two images through binary descriptors. However, LSH is heavily used in CBIR not to "compare two images" but to "find the most similar image in the dataset", which is obviously something…

c++ opencv image-processing locality-sensitive-hash cbir

asked May 29 '16 at 15:18

justHelloWorld

1 answer

Matlab: Conceptual difficulty in how to create multiple hash tables in Locality Sensitive Hashing

The key idea of locality-sensitive hashing (LSH) is that neighboring points v are more likely to be mapped to the same bucket, whereas points far from each other are more likely to be mapped to different buckets. In using random projection, if the database…

matlab nearest-neighbor locality-sensitive-hash

asked May 25 '16 at 17:47

SKM
