1

I am using LSH for database records and by that I am creating a index (not a database index, a simple hashmap) where similar records blocked in to the same bucket. The database may contain several millions of records. My question regards with the design I post below.

enter image description here

First I will create the index using the database available by executing LSH. But when a new record inserted in to the database I must index that record also in to the index. How can I do this using LSH? Can LSH allocate that record to the bucket that have similar records?? Does LSH support updates in to the dataset?

Community
  • 1
  • 1
Janitha Tennakoon
  • 856
  • 11
  • 40
  • Does the new data insertion process can be customized/altered, or you only have access to the database and not the input process itself? – Averman Nov 12 '15 at 01:47

1 Answers1

2

I would use E2LSH (which is developed by Andoni, which is a great guy), which is written in C++. In the site of the project, it is mentioned:

Newest (not quite) LSH algorithms (2014): These algorithms achieve performance better than the classic LSH algorithms by using data-dependent hashing. They improve over classic LSH algorithms for both Hamming and Euclidean space. These algorithms are not dynamic however, in contrast to the classic LSH algorithms, which use data-independent hashing and hence allow updates to the pointset.

If you do not want to use a library, but for some reason you want to develop your own, I would suggest you to study the manual first.

gsamaras
  • 71,951
  • 46
  • 188
  • 305