I am using elasticsearch-py for Elasticsearch operations.

I am trying to use `elasticsearch.helpers.bulk` to create or update multiple records in a single call.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print(helpers.bulk(es, data))

Is there any way to perform this operation?

As far as I can tell, `_op_type` has to be either `create` or `update`. If I use `update` and a record does not exist, it raises an error:

Traceback (most recent call last):
  File "/tmp/test.py", line 37, in <module>
    print helpers.bulk(es, data)
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}])
Nilesh
  • Have you tried using `index` as `op_type` instead of `create` and `update`? – Val Aug 21 '15 at 06:16
  • @Val, as per the `helpers.bulk` documentation, we have to give `index`. I also tried your solution, and it gives a `ValidationError`: `elasticsearch.exceptions.TransportError: TransportError(500, u'ActionRequestValidationException[Validation Failed: 1: no requests added;]')` – Nilesh Aug 21 '15 at 06:22
  • That's weird... You're sure you have `"_op_type": "index"`? – Val Aug 21 '15 at 06:29
  • You can check docs for this method http://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.bulk – Nilesh Aug 21 '15 at 06:30
  • Also, have you tried without specifying `_op_type` at all? I think it will default to `index` by itself. – Val Aug 21 '15 at 06:32
  • @Val, thanks, it works :). Without `_op_type` it behaves as `create` or `update`, whichever applies. Thanks for the help. – Nilesh Aug 21 '15 at 06:37

2 Answers

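According to the `_bulk` endpoint documentation, you can and should use the `index` action for this, provided your documents always have the same identifiers. `index` creates the document when the ID is new and replaces the whole document when the ID already exists.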

`create` is useful when creating documents for the first time (it fails if the document already exists), and `update` is meant for partial and/or scripted updates.
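
If a partial update is genuinely what is needed but some documents may not exist yet, one option is the update API's `doc_as_upsert` flag, which inserts the partial document when the ID is missing. A minimal sketch, assuming the bulk helper passes the non-metadata keys through as the update body; `make_upsert_actions` and `run_bulk` are hypothetical helper names, and the index, type, and IDs simply mirror the question:

```python
def make_upsert_actions(index, doc_type, docs_by_id):
    """Build bulk 'update' actions that insert the doc when the id is missing."""
    actions = []
    for doc_id, fields in sorted(docs_by_id.items()):
        actions.append({
            "_op_type": "update",
            "_index": index,
            "_type": doc_type,      # mirrors the question; newer Elasticsearch releases drop types
            "_id": doc_id,
            "doc": fields,          # partial document to merge into an existing one
            "doc_as_upsert": True,  # use `doc` as the new document if none exists
        })
    return actions


def run_bulk(es, actions):
    """Send the actions; imported lazily so building actions needs no client."""
    from elasticsearch import helpers
    return helpers.bulk(es, actions)


actions = make_upsert_actions(
    "customer", "external", {3: {"name": "test"}, 4: {"name": "test"}}
)
print(actions[0])
```

With a running cluster, `run_bulk(Elasticsearch(), actions)` would submit them. For plain "create or replace" semantics this is more machinery than needed.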

You can also omit `_op_type` entirely; `index` is then used by default.
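
As an illustration, here is a small sketch that builds actions with no `_op_type`, so each one falls back to `index`. The helper names `make_index_actions` and `run_bulk` are hypothetical. The sketch puts the fields under `_source` on the assumption that, for the index action, the helper sends every non-metadata key as the document body, in which case a `doc` wrapper would be stored as a field literally named `doc`:

```python
def make_index_actions(index, doc_type, docs_by_id):
    """Yield bulk actions without `_op_type`, so the default action applies."""
    for doc_id, fields in sorted(docs_by_id.items()):
        yield {
            "_index": index,
            "_type": doc_type,   # mirrors the question; newer Elasticsearch releases drop types
            "_id": doc_id,
            "_source": fields,   # the document body to store
        }


def run_bulk(es, actions):
    """Send the actions; imported lazily so building actions needs no client."""
    from elasticsearch import helpers
    return helpers.bulk(es, actions)


docs = {doc_id: {"name": "test"} for doc_id in range(3, 7)}
actions = list(make_index_actions("customer", "external", docs))
print(len(actions), actions[0])
```

Running the same batch twice should then succeed both times: the first run creates IDs 3 to 6 and the second overwrites them.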

Val

I tried the solution suggested by @Val and it works like a charm.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print(helpers.bulk(es, data))
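
The value printed above is a tuple of the success count and a list of per-item errors. To my knowledge, passing `raise_on_error=False` to `helpers.bulk` returns that error list instead of raising `BulkIndexError`. A rough sketch of summarizing such a list follows; `summarize_bulk_errors` is a hypothetical helper and the sample items are copied in shape from the traceback in the question:

```python
from collections import Counter


def summarize_bulk_errors(errors):
    """Count failed bulk items by (action name, HTTP status)."""
    counts = Counter()
    for item in errors:
        # Each error item is a one-key dict: {action_name: details}
        for action, details in item.items():
            counts[(action, details.get("status"))] += 1
    return dict(counts)


# Sample error items shaped like those in the BulkIndexError from the question.
sample_errors = [
    {"update": {"status": 404, "_index": "customer", "_type": "external", "_id": "3",
                "error": "DocumentMissingException[[customer][-1] [external][3]: document missing]"}},
    {"update": {"status": 404, "_index": "customer", "_type": "external", "_id": "4",
                "error": "DocumentMissingException[[customer][-1] [external][4]: document missing]"}},
]

summary = summarize_bulk_errors(sample_errors)
print(summary)  # {('update', 404): 2}
```
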
Mayank