Questions tagged [locality-sensitive-hash]


Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).
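As a rough illustration of the bucketing idea described above, here is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. The tag description does not prescribe this family; the function names (`make_hyperplanes`, `signature`, `build_index`), the bit count, and the sample vectors are all assumptions made for the example.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on the non-negative side, else 0."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(items, planes):
    """Group item names by signature; each distinct signature is one bucket."""
    buckets = defaultdict(list)
    for name, vec in items.items():
        buckets[signature(vec, planes)].append(name)
    return buckets

# Illustrative data: two vectors pointing the same way, one pointing the opposite way.
planes = make_hyperplanes(num_bits=8, dim=3)
items = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],
    "opposite": [-1.0, -2.0, -3.0],
}
index = build_index(items, planes)

# Candidates for a query are just the items sharing its bucket.
query_bucket = index[signature(items["a"], planes)]
```

Vectors with a small angle between them agree on most bits and therefore tend to share a bucket, while the number of possible buckets (2^8 here) stays far smaller than the space of input vectors. Practical systems usually build several such tables to raise the chance that true neighbours collide in at least one of them.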

97 questions
2
votes
0 answers

BRAND descriptor - Image descriptor as input of LSH - Binary representation

I have a question that is also mentioned in this answer and this one, but I'm using a binary descriptor and need more information: I'm using BRAND descriptors as the input to an LSH problem. The descriptor sizes range from 300*32 to 400*32, in which 32 is…
Mina
2
votes
1 answer

Locality-sensitive hashing of strings?

Is there a hash function for strings, such that strings within a small edit distance (for example, misspellings) would map to the same, or very close, hash values, while dissimilar strings would tend not to?
MWB
2
votes
1 answer

How does Locality Sensitive Hashing (LSH) work?

I've already read this question, but unfortunately it didn't help. What I don't understand is what we do once we know which bucket to assign to our high-dimensional query vector q: suppose that using our set of locality sensitive family…
2
votes
1 answer

Is LSH about transforming vectors to binary vectors for hamming distance?

I read some papers about LSH and I know that it is used for solving the approximate k-NN problem. We can divide the algorithm into two parts: Given a vector in D dimensions (where D is big) of any value, translate it with a set of N (where N<…
2
votes
1 answer

Search in locality sensitive hashing

I'm trying to understand Section 5 of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper: Given bit vectors consisting of d bits each, we choose N = O(n^(1/(1+epsilon))) random permutations of the…
2
votes
1 answer

What is the ε (epsilon) parameter in Locality Sensitive Hashing (LSH)?

I've read the original paper about Locality Sensitive Hashing. The complexity is a function of the parameter ε, but I don't understand what it is. Can you explain its meaning, please?
2
votes
1 answer

Locality Sensitive Hashing in OpenCV for image processing

This is my first image processing application, so please be kind to this filthy peasant. THE APPLICATION: I want to implement a fast application (performance is crucial, even over accuracy) where given a photo (taken by mobile phone) containing a…
2
votes
1 answer

How to hash lists?

Lists are not hashable. However, I am implementing LSH and I am looking for a hash function that will map a list of positive integers (in [1, 29.000]) to k buckets. The number of lists is D, where D > k (I think) and D = 40.000, where k is…
gsamaras
2
votes
1 answer

Why k and l for LSH used for approximate nearest neighbours?

In all the Locality Sensitive Hashing explanations (e.g. http://en.wikipedia.org/wiki/Locality-sensitive_hashing#LSH_algorithm_for_nearest_neighbor_search ), they describe that k hash functions are generated, but only l (l < k) are used in the hash…
2
votes
0 answers

PySpark: hash() ResultIterable differs before and after collect()

I'm attempting to implement locality-sensitive hashing in PySpark (based on the spark-hash project, written in Scala). The hashing step is generating some strange behavior. In a step where I take the hash of the list of minhashes generated for each…
Magsol
2
votes
2 answers

Spark implementation for Locality Sensitive Hashing

As part of a project I'm doing for my studies I'm looking for a way to use the hashing function of LSH with Spark. Is there any way to do so?
user3636583
2
votes
1 answer

locality sensitive hashing for spatial data

I would like to find a Locality Sensitive Hashing algorithm in order to split my spatial data into a number of buckets (reducer tasks). The spatial data are actually trajectories, so from my understanding of LSH a trajectory will be represented a…
Adam
2
votes
1 answer

Clarification needed about min/sim hashing + LSH

I have a reasonable understanding of a technique for detecting similar documents that consists of first computing their minhash signatures (from their shingles, or n-grams), and then using an LSH-based algorithm to cluster them efficiently (i.e. avoid the…
2
votes
1 answer

Matching Differences between two documents

I have a set of strings along with their coordinates and rectangular bounds in two similar pages. These strings differ in three possible ways: (i) a string can be moved to a new location on a page; (ii) a string is in the same location but…
2
votes
4 answers

Comparable hashes

I wasn't able to answer my question. I need a hashing method that will generate a hash that can be compared with others to find out the fidelity. Let's say I have 2 strings, "mother" and "father", and when I compare the 2 hashes, it will say that…
Mihai Alin