
Is there any way to update all documents in Elasticsearch?

In the example below, the update is done for the 'external' document with id '1'.

curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
 "doc": { "name": "Jane Doe", "age": 20 }
}'

Similarly, I need to update all the documents in 'external'. Is there a way or a query to update all of them at once?

venkat
  • This answer should help: http://stackoverflow.com/questions/38636348/find-and-replace-in-elasticsearch-all-documents/38636633#38636633 If you're running ES 1.3.2, you need to install the [update-by-query](https://github.com/yakaz/elasticsearch-action-updatebyquery) plugin beforehand and restart your cluster – Val Aug 19 '16 at 11:25
  • @Val In the above link they replace a name. Here I need to add an extra field ("status":"done") to all my 'external' documents; what do I need to do for that? – venkat Aug 19 '16 at 11:59
  • You can do many changes in the script, like adding a new field, changing an existing one and removing an existing one. – Val Aug 19 '16 at 12:03
  • I tried the following query curl -XPOST 'localhost:9200/tellofy/brand/_update_by_query' -d ' { "script" : "ctx._source.brandprivacy = "false"" }' and I get the following error {"error":"UnavailableShardsException[[tellofy][2] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: index {[tellofy][brand][_update_by_query], source[\n{\n \"script\" : \"ctx._source.brandprivacy = \"false\"\"\n}]}]","status":503}> Do you have any suggestions for the above error? – venkat Aug 19 '16 at 12:42
  • you have too many double quotes, try this: `curl -XPOST 'localhost:9200/customer/external/_update_by_query' -d ' { "script" : "ctx._source.brandprivacy = false" }' ` – Val Aug 19 '16 at 12:42
  • After using your command I still get the same error. Kindly see my error above. – venkat Aug 19 '16 at 12:53
  • It doesn't seem your ES node is running correctly `Primary shard is not active or isn't assigned to a known node`. What do you see in the head plugin (if you have it)? – Val Aug 19 '16 at 12:54
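Putting the comment thread together, a full-index update that adds the extra field the asker wants can be sketched as below. This is a sketch only: it assumes `_update_by_query` is available (built in since ES 2.3, or via the update-by-query plugin on 1.x, as Val notes) and that the cluster's primary shards are assigned; the index and type names are the ones from the question.

```shell
# Add a "status": "done" field to every document of the "external" type.
# Assumes _update_by_query is available (ES >= 2.3, or the plugin on 1.x).
# Note the escaped inner quotes: the script must evaluate
# ctx._source.status = "done", so the JSON carries \"done\".
curl -XPOST 'localhost:9200/customer/external/_update_by_query?pretty' -d '
{
  "script": "ctx._source.status = \"done\""
}'
```

The 503 `UnavailableShardsException` seen in the thread is unrelated to the script quoting: it means the primary shards are not assigned, so checking `curl 'localhost:9200/_cluster/health?pretty'` first is a reasonable step.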

1 Answer


Updating all documents in an index means that, internally, every document is marked as deleted and a new version is indexed. That leaves you with lots of "marked-as-deleted" documents.

When you run a query, ES automatically filters out those "marked-as-deleted" documents, which has an impact on the query's response time. How much impact depends on the data, the use case, and the query.

Also, if you update all documents, then unless you run a _force_merge there will be segments (especially the larger ones) that still contain "marked-as-deleted" documents, and such segments are rarely merged automatically by Lucene/Elasticsearch.
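A force merge that reclaims the deleted versions can be sketched as follows. Note the endpoint name depends on the version: it is `_forcemerge` from ES 2.1 onward, while older versions use `_optimize` with the same parameter.

```shell
# Expunge "marked-as-deleted" documents from the index's segments.
# On ES 2.1+ the endpoint is _forcemerge; older versions use _optimize.
curl -XPOST 'localhost:9200/customer/_forcemerge?only_expunge_deletes=true&pretty'
```

A force merge is I/O-intensive, so it is usually run during off-peak hours.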

My suggestion, if your indexing process is not too complex (for example, getting the data from a relational database and processing it before indexing into ES), is to drop the index completely and index fresh data. It might be more effective than updating all the documents.
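A minimal sketch of that drop-and-reindex approach, using the index and document from the question (the bulk payload is illustrative, not the asker's real data):

```shell
# 1. Drop the whole index (this deletes all its types and documents).
curl -XDELETE 'localhost:9200/customer?pretty'

# 2. Re-index fresh data, e.g. via the _bulk API (newline-delimited JSON:
#    one action line followed by one source line per document).
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{ "index": { "_id": "1" } }
{ "name": "Jane Doe", "age": 20, "status": "done" }
'
```

The resulting index has no "marked-as-deleted" documents at all, which is the point of the answer's suggestion.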

Andrei Stefan
  • But I have a large amount of data and I need to update it. Is there any way to merge or update without dropping the index? – venkat Aug 19 '16 at 13:31
  • You can update all of them, but at the end you'll not have an index like the one after a fresh re-index. You'll have an index with some "marked-as-deleted" documents. And you'd need to run a [forced merge](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html). You can still update all documents and not force merge, but there **might** be some performance impact at query time because of still existent marked-as-deleted documents. – Andrei Stefan Aug 19 '16 at 13:51