Use Locality Sensitive Hashing on dynamic data set

Question

I am using LSH for database records and by that I am creating a index (not a database index, a simple hashmap) where similar records blocked in to the same bucket. The database may contain several millions of records. My question regards with the design I post below.

First I will create the index using the database available by executing LSH. But when a new record inserted in to the database I must index that record also in to the index. How can I do this using LSH? Can LSH allocate that record to the bucket that have similar records?? Does LSH support updates in to the dataset?

Does the new data insertion process can be customized/altered, or you only have access to the database and not the input process itself? — Averman, Nov 12 '15 at 01:47

score 2 · Accepted Answer · answered Sep 07 '15 at 05:16

I would use E2LSH (which is developed by Andoni, which is a great guy), which is written in C++. In the site of the project, it is mentioned:

Newest (not quite) LSH algorithms (2014): These algorithms achieve performance better than the classic LSH algorithms by using data-dependent hashing. They improve over classic LSH algorithms for both Hamming and Euclidean space. These algorithms are not dynamic however, in contrast to the classic LSH algorithms, which use data-independent hashing and hence allow updates to the pointset.

If you do not want to use a library, but for some reason you want to develop your own, I would suggest you to study the manual first.

Use Locality Sensitive Hashing on dynamic data set

1 Answers1