34

I have a very simple question:

I want to update multiple documents in Elasticsearch. Sometimes a document already exists and sometimes it does not. I don't want to send a get request first to check whether the document exists (this hurts performance). I want the update request itself to index the document if it doesn't exist yet.

I know that we can use upsert to create a non existing field when updating a document, but this is not what I want. I want to index the document if it doesn't exist. I don't know if upsert can do this.

Can you provide some explanation?

Thanks in advance!

razafinr

5 Answers

63

This is doable using the update api. It does require that you define the id of each document, since the update api requires the id of the document to determine its presence.

Given an index created with the following documents:

PUT /cars/car/1 
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }

We can get the upsert functionality you want using the update api with the following api call.

POST /cars/car/3/_update
{
    "doc": {
        "color" : "brown",
        "brand" : "ford"
    },
    "doc_as_upsert" : true
}

This api call will add the document to the index since it does not exist.

Running the call a second time with a different color updates the existing document instead of creating a new one.

POST /cars/car/3/_update
{
    "doc": {
        "color" : "black",
        "brand" : "ford"
    },
    "doc_as_upsert" : true
}
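To make the semantics concrete, here is a minimal Python sketch that mimics what the two calls above do, using a plain dict as a stand-in for the index. It does not talk to Elasticsearch. The helper name `update_with_doc_as_upsert` and the returned strings are hypothetical, chosen only for illustration, and the merge is shallow, whereas a real partial update also merges nested objects.

```python
from typing import Any, Dict

Index = Dict[str, Dict[str, Any]]


def update_with_doc_as_upsert(index: Index, doc_id: str, doc: Dict[str, Any]) -> str:
    """Mimic an update request sent with "doc_as_upsert": true.

    Hypothetical helper: `index` is an in-memory stand-in, not a real cluster.
    """
    if doc_id not in index:
        # No document with this id yet: the partial doc becomes the full document.
        index[doc_id] = dict(doc)
        return "created"
    # Document exists: merge the partial doc into it (shallow merge in this sketch).
    index[doc_id].update(doc)
    return "updated"


cars: Index = {
    "1": {"color": "blue", "brand": "mercedes"},
    "2": {"color": "blue", "brand": "toyota"},
}

# First call creates document 3, second call only changes its color.
first = update_with_doc_as_upsert(cars, "3", {"color": "brown", "brand": "ford"})
second = update_with_doc_as_upsert(cars, "3", {"color": "black"})
print(first, second, cars["3"])
```

The point of the sketch is that the caller never checks for existence: the same request shape covers both the insert and the update case.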
lisak
rclement
  • I want to update the record based on the name field instead of the id. Is that possible with upsert? – Sagar Vaghela Aug 01 '17 at 10:06
  • How to upsert based on a condition like say when the color is not black do the upsert? – user1870400 Jun 28 '18 at 08:14
  • I don't know how this was selected as the answer. The original author clearly said "I know that we can use upsert to create a non existing field when updating a document, but this is not what I want. I want to index the document if it doesn't exist.", and yet you are saying upsert will create or update... – xgmexgme May 04 '22 at 22:29
  • If you want to do this in Python API, check out [this question](https://stackoverflow.com/q/33226831/7122272). – Jaroslav Bezděk Aug 05 '22 at 08:44
8

AFAIK when you index a document (with a PUT call), the existing version gets replaced with the newer version. If the document did not exist, it gets created. There is no need to make a distinction between INSERT and UPDATE in Elasticsearch.

UPDATE: According to the documentation, if you use op_type=create, or a special _create version of the indexing call, then any call for a document which already exists will fail.

Quote from the documentation:

Here is an example of using the op_type parameter:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following uri:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'
Ashalynd
2

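For the bulk API, push an update action followed by a body with doc_as_upsert:

// `id` and `your_doc` are supplied by the caller.
const bulks = [];
// Action line: target the document by index, type and id.
bulks.push({
    update: {
        _index: 'index',
        _type: 'type',
        _id: id
    }
});
// Body line: merge `doc` into the document, or index it if it does not exist.
bulks.push({ doc_as_upsert: true, doc: your_doc });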
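For anyone building the request body by hand, the same two-line pattern can be sketched in Python as newline-delimited JSON. The function name `build_bulk_upsert_body` and the sample documents are hypothetical. The sketch also leaves out `_type`, on the assumption that you target a version where mapping types are no longer used; add it back to the action line if your cluster expects it.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def build_bulk_upsert_body(
    index_name: str, docs: Iterable[Tuple[str, Dict[str, Any]]]
) -> str:
    """Build a newline-delimited JSON body for the _bulk endpoint.

    Each document contributes two lines: an `update` action line and a
    body line carrying the partial doc with doc_as_upsert enabled.
    """
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"update": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    # Every line, including the last one, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_upsert_body(
    "cars",
    [
        ("3", {"color": "brown", "brand": "ford"}),
        ("4", {"color": "red", "brand": "fiat"}),
    ],
)
print(body)
```

The resulting string is what you would send as the request body to `_bulk`; how you send it (curl, an HTTP library, an official client) is up to you.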
Amio.io
0

As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback.

after_commit on: :update do
    begin
        __elasticsearch__.update_document
    rescue Elasticsearch::Transport::Transport::Errors::NotFound
        __elasticsearch__.index_document
    end
end
spyle
0

I think you want the "create" action.

Here's the bulk API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.

Difference between actions:

create

(Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.

index

(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.

update

(Optional, string) Performs a partial document update. The following line must contain the partial document and update options.

doc

(Optional, object) The partial document to index. Required for update operations.
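
A rough way to compare the three actions described above is a small in-memory model in Python. This is only a sketch: `apply_action` and the outcome strings are hypothetical names, the dict stands in for an index, and the `update` branch models a plain update without any upsert option (shallow merge only).

```python
from typing import Any, Dict

Index = Dict[str, Dict[str, Any]]


def apply_action(index: Index, action: str, doc_id: str, source: Dict[str, Any]) -> str:
    """Apply one bulk-style action to an in-memory index and report the outcome."""
    exists = doc_id in index
    if action == "create":
        if exists:
            return "conflict"  # create refuses to touch an existing document
        index[doc_id] = dict(source)
        return "created"
    if action == "index":
        index[doc_id] = dict(source)  # full replacement, not a merge
        return "updated" if exists else "created"
    if action == "update":
        if not exists:
            return "not_found"  # plain update needs an existing document
        index[doc_id].update(source)  # partial merge (shallow in this sketch)
        return "updated"
    raise ValueError(f"unknown action: {action}")


store: Index = {"1": {"color": "blue", "brand": "mercedes"}}
results = [
    apply_action(store, "create", "1", {"color": "red"}),    # id 1 already exists
    apply_action(store, "index", "1", {"color": "red"}),     # replaces the whole doc
    apply_action(store, "update", "1", {"brand": "ford"}),   # merges into the doc
    apply_action(store, "update", "9", {"color": "green"}),  # id 9 does not exist
]
print(results, store)
```

In this model, `create` never modifies an existing document and `index` always overwrites it, so which one fits depends on whether existing documents should be left alone or replaced.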

xgmexgme