
I want to avoid adding duplicate documents to an Elasticsearch type. Say a duplicate is defined by the title and userID fields. Each new insert gets a different document ID, but I want to ensure that no document matching an existing userID and title pair is inserted through the bulk insert process.

I realize that existing documents could be updated instead, but as I understand it, an update does a delete followed by an insert and doesn't free up the space the old document used.

In SQL Server, I used a table-valued parameter (TVP) that took in a DataTable and did the checking and inserting.

How can this be done using NEST and Elasticsearch?

ElHaix
  • If every pair of `userId` and `title` is unique, you could create your own document IDs based on a hash of both fields (e.g. `md5(userId:title)`) and use that hash as the document ID in your `_bulk` requests. That way you'd never create any duplicates. – Val Aug 08 '15 at 13:32
  • For a given userID, there could be duplicate titles, which is what I want to avoid. Good idea about the hash, but what happens when an insert with a duplicate hash is attempted? – ElHaix Aug 08 '15 at 15:18
  • If you bulk insert a document with an ID that already exists, it will overwrite the existing document. – Evaldas Buinauskas Sep 27 '15 at 08:10
  • @EvaldasBuinauskas - IDs will be auto-generated. It is possible that a document with a duplicate title already exists; if so, I do not want the new one inserted. – ElHaix Sep 29 '15 at 18:36
  • I guess you'll have to run two queries then. One to check whether a document with that title exists (term query), then, based on the result (hit count), either update or skip the insert. That's all I can think of. – Evaldas Buinauskas Sep 29 '15 at 18:39
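
To make the hash suggestion in the comments concrete, here is a minimal sketch in Python (standard library only, not NEST). It assumes a duplicate means an exact match on `userID` and `title`. The helper names `doc_id` and `build_bulk_body` and the index name `documents` are hypothetical. Instead of the default `index` action, which overwrites, it uses the bulk API's `create` action, which rejects an item whose ID already exists (reported as a conflict for that item) and leaves the stored document untouched. Older Elasticsearch versions that still use mapping types would also need `_type` in the action line.

```python
import hashlib
import json


def doc_id(user_id, title):
    """Derive a deterministic document ID from the fields that define a duplicate."""
    # Serialise the pair as a JSON array so the field boundary is unambiguous.
    key = json.dumps([user_id, title], ensure_ascii=False)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def build_bulk_body(docs, index="documents"):
    """Build an NDJSON body for the _bulk endpoint using `create` actions.

    Duplicates inside the batch are dropped here; duplicates of documents
    already stored are rejected by Elasticsearch because the ID is taken.
    """
    seen = set()
    lines = []
    for doc in docs:
        _id = doc_id(doc["userID"], doc["title"])
        if _id in seen:
            continue  # same userID/title already queued in this batch
        seen.add(_id)
        lines.append(json.dumps({"create": {"_index": index, "_id": _id}}))
        lines.append(json.dumps(doc))
    # Every line of a bulk body, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    batch = [
        {"userID": 1, "title": "First post"},
        {"userID": 1, "title": "First post"},  # in-batch duplicate, skipped
        {"userID": 2, "title": "First post"},  # different user, kept
    ]
    print(build_bulk_body(batch), end="")
```

The resulting string would be sent as the body of a `_bulk` request. With this approach no separate existence query is needed, at the cost of giving up auto-generated IDs. The same idea should carry over to NEST by setting the ID explicitly on a bulk create operation, though the exact call depends on the client version.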

0 Answers