What does "locality-sensitive" stand for in locality-sensitive hashing? Is there a formal definition of this term?

See the accepted answer here: http://stackoverflow.com/questions/12952729/ ; it provides a very good explanation. – kebs Apr 08 '13 at 15:15
2 Answers
LSH maps high-dimensional vectors to buckets and tries to ensure that vectors that are "near" each other are mapped to the same bucket. Here "near" means close with respect to some distance function (e.g. Euclidean).
"Locality" refers to a region in space, and "sensitive" means that nearby locations are mapped to the same bucket. In other words, the output of the hash function depends on (is sensitive to) the location in space (the locality).
This is my understanding. I am sure the theoretical folks have a more formal definition. Hope this helps.
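For reference, the formal definition usually given in the literature goes roughly like this: a family H of hash functions is called (r, cr, p1, p2)-sensitive for a distance d if, for any two points p and q and a function h drawn at random from H, d(p, q) <= r implies Pr[h(p) = h(q)] >= p1, and d(p, q) >= cr implies Pr[h(p) = h(q)] <= p2, with c > 1 and p1 > p2.

A minimal sketch of that idea for Euclidean distance, assuming the common random-projection scheme h(v) = floor((a . v + b) / w). The names `make_hash` and `collision_rate`, the bucket width `w = 4.0` and the test points are all made up for illustration:

```python
import math
import random

def make_hash(dim, w, rng):
    """Return one random hash h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random Gaussian direction
    b = rng.uniform(0.0, w)                        # random offset in [0, w)
    def h(v):
        proj = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((proj + b) / w)
    return h

def collision_rate(p, q, dim, w, trials, seed=0):
    """Fraction of randomly drawn hash functions that put p and q in the same bucket."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(dim, w, rng)
        hits += h(p) == h(q)
    return hits / trials

p = [0.0, 0.0, 0.0]
near = [0.1, 0.0, 0.0]   # distance 0.1 from p
far = [10.0, 0.0, 0.0]   # distance 10 from p

near_rate = collision_rate(p, near, dim=3, w=4.0, trials=1000)
far_rate = collision_rate(p, far, dim=3, w=4.0, trials=1000)
print(near_rate, far_rate)
```

The near pair should collide far more often than the far pair, which is exactly the p1 > p2 gap in the definition.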

Usually, hash functions are designed to separate nearby values, to reduce the risk of collisions. Think of cryptographic hashes: there you want every single-character change to completely change the hash code.
This does not hold for the hash functions used in LSH. Well, technically it holds for the hash functions themselves, but not for the step just before hashing: the data is put into buckets, a lossy operation that will usually put nearby points into the same bucket. After that, only the bucket numbers are actually hashed (IIRC), so you don't get millions of buckets, but only as many as desired.
If you use several independent functions for mapping and binning, their buckets will likely overlap, so that you can find every true neighbor in at least one of the buckets the query point falls into.
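To make the "several independent functions" part concrete, here is a rough sketch, not a reference implementation. It assumes a random-projection bucketing step (one choice among many) and uses a plain Python dict per table, so only the bucket number gets hashed. `build_tables`, `candidates` and the sample points are hypothetical names and data:

```python
import math
import random
from collections import defaultdict

def make_hash(dim, w, rng):
    """Lossy bucketing step: project onto a random direction, cut into width-w bins."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)

def build_tables(points, num_tables, w, seed=0):
    """Build num_tables independent tables mapping bucket number -> point indices."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(num_tables):
        h = make_hash(dim, w, rng)
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[h(p)].append(idx)  # the dict hashes only the bucket number
        tables.append((h, buckets))
    return tables

def candidates(tables, query):
    """Union of the buckets the query falls into, one bucket per table."""
    found = set()
    for h, buckets in tables:
        found.update(buckets.get(h(query), []))
    return found

points = [(0.0, 0.0), (0.2, 0.1), (50.0, 50.0), (-40.0, 30.0)]
tables = build_tables(points, num_tables=8, w=4.0)
query = (0.1, 0.0)

cands = candidates(tables, query)
# Re-rank the (hopefully small) candidate set with the exact distance.
nearest = min(cands, key=lambda i: math.dist(points[i], query))
print(sorted(cands), nearest)
```

A single table can miss a true neighbor that happens to sit just across a bin boundary, but with several independent tables that becomes unlikely. Far-away points may occasionally show up as candidates too, which is why the exact distance check at the end is still needed.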
