How to perform bulk inserts into Elasticsearch with NEST 'where not exists' a matching record?

Question

I want to avoid adding duplicate documents to an ES type. Let's say duplicates are defined by the `title` and `userID` fields. New inserts get a different document ID, however, so the ID alone cannot catch a duplicate. I want to ensure that the bulk insert process never inserts a record whose `userID` and `title` match an existing document.

I realize that existing documents could be updated, but as I understand it, an update does a delete/insert and doesn't free up the once-used space.

In SQL Server, I used a TVP (table-valued parameter) that took in a DataTable and did the checking and inserting.

How can this be done using NEST and ElasticSearch?

If every `userId`/`title` pair is unique, you could build your own document IDs from a hash of both fields (e.g. `md5(userId:title)`). That way you'd never create any duplicates: use that hash as the document ID in your `_bulk` requests. — Val, Aug 08 '15 at 13:32
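A minimal sketch of the hashing idea, written in Python rather than C#/NEST to keep it short. It assumes the ID is the hex MD5 digest of the string `userId:title`; the helper name `doc_id` is hypothetical. Because the ID depends only on the two fields, the same pair always maps to the same ID.

```python
import hashlib

def doc_id(user_id: str, title: str) -> str:
    """Hypothetical helper: derive a deterministic document ID from
    the (userId, title) pair by hashing "userId:title" with MD5."""
    key = f"{user_id}:{title}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()

print(doc_id("42", "Hello world") == doc_id("42", "Hello world"))  # True
print(doc_id("42", "Hello world") == doc_id("43", "Hello world"))  # False
```

One caveat with a plain `:` separator: if either field can itself contain `:`, two different pairs (e.g. `"a:b"`/`"c"` and `"a"`/`"b:c"`) produce the same key string and therefore the same ID, so escape the fields or pick a separator that cannot occur in them.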
For a given userID, there could be duplicate titles, which is what I want to avoid. Good idea about the hash; however, what happens when an insert with a duplicate hash is attempted? — ElHaix, Aug 08 '15 at 15:18
If you bulk insert a document with an ID that already exists, it will overwrite the existing document. — Evaldas Buinauskas, Sep 27 '15 at 08:10
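The overwrite happens with the bulk `index` action. The Elasticsearch bulk API also has a `create` action, which is rejected for any item whose ID already exists: that item gets a 409 conflict in the response, while the other items in the batch are still processed. The Python sketch below builds the newline-delimited `_bulk` body using `create`; the index name `posts`, the document fields, and the helper `bulk_create_body` are assumptions for illustration. Older clusters (such as the 1.x releases current when this was asked) also expect a `_type` in the action metadata, hence the optional parameter.

```python
import json
from typing import Iterable, Optional, Tuple

def bulk_create_body(index: str,
                     docs: Iterable[Tuple[str, dict]],
                     doc_type: Optional[str] = None) -> str:
    """Hypothetical helper: build an NDJSON body for POST /_bulk that uses
    the `create` action, so an item whose _id already exists is rejected
    instead of overwriting the stored document."""
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index, "_id": doc_id}
        if doc_type is not None:  # only for clusters that still use mapping types
            meta["_type"] = doc_type
        lines.append(json.dumps({"create": meta}))  # action/metadata line
        lines.append(json.dumps(source))            # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_create_body("posts", [
    ("id-1", {"userId": "42", "title": "Hello world"}),
    ("id-2", {"userId": "42", "title": "Another post"}),
])
print(body)
```

After sending, inspect each item in the bulk response: an item with status 409 was a duplicate and was skipped. NEST's bulk descriptor offers a corresponding create operation; check the exact method signature for the NEST version you use.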
@EvaldasBuinauskas - IDs will be auto-generated. A duplicate title may come in; if so, I do not want it inserted. — ElHaix, Sep 29 '15 at 18:36
I guess you'll have to run two queries then: one to check whether a document with that title exists (a term query), and then, based on the result (the hit count), either update or skip the request. That's all I can think of. — Evaldas Buinauskas, Sep 29 '15 at 18:39
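A minimal Python sketch of the check-then-insert idea. It assumes `userId` and `title` are indexed as exact values (keyword / not_analyzed), since a `term` query on an analyzed text field would not match the raw title, and it uses the `bool`/`filter` syntax of Elasticsearch 2.x and later (1.x used a `filtered` query instead). `existence_query`, `filter_new`, and the `exists` callable are hypothetical; in practice `exists` would wrap a search or count request and test whether the hit count is greater than zero. The sketch also drops duplicates inside the batch itself, which a per-document query would not catch.

```python
from typing import Callable, Dict, Iterable, List

def existence_query(user_id: str, title: str) -> Dict:
    """Query body matching documents with exactly this userId and title."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"userId": user_id}},
                    {"term": {"title": title}},
                ]
            }
        }
    }

def filter_new(docs: Iterable[Dict], exists: Callable[[Dict], bool]) -> List[Dict]:
    """Keep docs whose (userId, title) is neither already stored
    (according to `exists`) nor repeated earlier in this batch."""
    seen = set()
    keep = []
    for doc in docs:
        key = (doc["userId"], doc["title"])
        if key in seen:
            continue  # duplicate within the batch itself
        seen.add(key)
        if not exists(existence_query(*key)):
            keep.append(doc)
    return keep

# Stand-in for a real request: pretend ("42", "Hello world") is already stored.
stored = {("42", "Hello world")}

def fake_exists(body: Dict) -> bool:
    f = body["query"]["bool"]["filter"]
    return (f[0]["term"]["userId"], f[1]["term"]["title"]) in stored

batch = [
    {"userId": "42", "title": "Hello world"},
    {"userId": "42", "title": "New post"},
    {"userId": "42", "title": "New post"},
]
print(filter_new(batch, fake_exists))  # [{'userId': '42', 'title': 'New post'}]
```

This costs one query per document, and it is not atomic: two concurrent writers can both see zero hits, and a document indexed moments earlier may not be visible to search until the next refresh. The deterministic-ID approach combined with the `create` action avoids both problems.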


0 Answers