Spark implementation for Locality Sensitive Hashing

Question

As part of a project for my studies, I'm looking for a way to use the hash functions of Locality Sensitive Hashing (LSH) with Spark. Is there an existing implementation I can use?

score 3 · Accepted Answer · answered Jan 04 '15 at 01:51

Try this implementation:

https://github.com/mrsqueeze/spark-hash

Quoting from the README, "this implementation was largely based on the algorithm described in chapter 3 of Mining of Massive Datasets", a chapter that has a great description of LSH and minhashing.
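To give a feel for what that chapter describes, here is a minimal plain-Python sketch of minhash signatures followed by LSH banding. It is not the spark-hash code and does not use Spark at all. All function names, the choice of CRC32 for shingle ids, and the parameter values are my own assumptions for illustration.

```python
import random
import zlib

# Assumption: a Mersenne prime used as the modulus for the hash family.
PRIME = 2_147_483_647


def shingles(text, k=3):
    """Return the set of k-character shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def stable_id(shingle):
    """Map a shingle to a deterministic integer (built-in hash() of str varies per process)."""
    return zlib.crc32(shingle.encode("utf-8")) % PRIME


def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) pairs for hash functions of the form h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Signature = the minimum hash value over the set, one entry per hash function.

    Assumes `items` is non-empty; min() raises ValueError on an empty set.
    """
    ids = [stable_id(s) for s in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature, bands):
    """Split a signature into `bands` equal bands; each (band index, rows) tuple is a bucket key."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


def candidate_pairs(signatures, bands):
    """Documents that share a bucket in at least one band become candidate pairs."""
    buckets = {}
    for doc_id, sig in signatures.items():
        for key in band_keys(sig, bands):
            buckets.setdefault(key, set()).add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


if __name__ == "__main__":
    params = make_hash_params(100)
    docs = {
        "a": "locality sensitive hashing with spark",
        "b": "locality sensitive hashing using spark",
        "c": "a completely unrelated sentence",
    }
    sigs = {name: minhash_signature(shingles(text), params) for name, text in docs.items()}
    print(estimate_jaccard(sigs["a"], sigs["b"]))
    print(candidate_pairs(sigs, bands=20))
```

More bands with fewer rows each make it easier for moderately similar documents to collide, while fewer bands with more rows make the filter stricter. In a Spark job, the bucket keys from `band_keys` would roughly play the role of the grouping key.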

Nilesh

@user3636583 please let us know how it fares for your use cases, especially versus FLANN, ANNOY, nearpy, SparseLSH, LSHForest (scikit-learn) etc. Personally I found the above Spark implementation to be very memory hungry. – Nilesh Jan 05 '15 at 22:17

score 1 · Answer 2 · answered Dec 31 '16 at 03:01

Spark 2.1.0, released recently, provides built-in support for LSH (MinHashLSH and BucketedRandomProjectionLSH in org.apache.spark.ml.feature), but apparently only in the Scala API (not in PySpark yet).

xenocyon
