I have over 30 million documents in Elasticsearch (version 6.3.3), and I am trying to add a new field to all existing documents, with its value set to 0.

For example: I want to add a start field, which does not currently exist in the Twitter document, and set its initial value to 0 in all 30 million documents.

In my case, I was only able to update about 4 million documents. When I check the submitted task with the Task API (http://localhost:9200/_tasks/{taskId}), the result looks like this:

{
  "completed": false,
  "task": {
    "node": "Jsecb8kBSdKLC47Q28O6Pg",
    "id": 5968304,
    "type": "transport",
    "action": "indices:data/write/update/byquery",
    "status": {
      "total": 34002005,
      "updated": 3618000,
      "created": 0,
      "deleted": 0,
      "batches": 3619,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    },
    "description": "update-by-query [Twitter][tweet] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.Twitter.start = 0;', options={}, params={}}",
    "start_time_in_millis": 1574677050104,
    "running_time_in_nanos": 466805438290,
    "cancellable": true,
    "headers": {}
  }
}
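To tell whether the task is still making progress, the `updated` and `total` counters from the status above can be turned into a percentage and compared between two polls. The following is a minimal sketch, not part of the original question: `task_status` and `progress_pct` are hypothetical helper names, and the default URL is the one used in the question.

```shell
# Assumption: Elasticsearch is reachable at this URL, as in the question.
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch the status of one task. The task id has the form <node>:<id>,
# e.g. Jsecb8kBSdKLC47Q28O6Pg:5968304 for the output shown above.
task_status() {
  curl -s "$ES_URL/_tasks/$1?pretty"
}

# Print the percentage of documents processed so far, given the
# "updated" and "total" counters copied from the task status.
progress_pct() {
  awk -v u="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "0.0"; exit }
    printf "%.1f\n", u * 100 / t
  }'
}
```

For the status above, `progress_pct 3618000 34002005` gives roughly 10.6 percent. If the number stays the same across several polls, the task is stalled rather than just slow.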

The query I am executing against ES is:

curl -XPOST "http://localhost:9200/_update_by_query?wait_for_completion=false&conflicts=proceed" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "ctx._source.Twitter.start = 0;"
  },
  "query": {
    "exists": {
      "field": "Twitter"
    }
  }
}'
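Because the request above matches every document that has a Twitter field, re-running it after a partial run would touch the already updated documents again. One possible variant, sketched here under the assumption that the index is named `twitter` (the name that appears in the error output quoted in the comments), skips documents that already have `Twitter.start`. The helper names `resume_body` and `resume_update` are hypothetical.

```shell
# Assumption: Elasticsearch is reachable at this URL, as in the question.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a request body that only matches documents which have a Twitter
# object but do not yet have Twitter.start.
resume_body() {
  cat <<'EOF'
{
  "script": {
    "source": "ctx._source.Twitter.start = 0;",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "filter":   { "exists": { "field": "Twitter" } },
      "must_not": { "exists": { "field": "Twitter.start" } }
    }
  }
}
EOF
}

# Submit the update as a background task against the given index.
resume_update() {
  resume_body | curl -s -XPOST \
    "$ES_URL/$1/_update_by_query?wait_for_completion=false&conflicts=proceed" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Calling `resume_update twitter` would submit the task. This only helps once writes to the index are allowed again; it does not address whatever stopped the first run.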

Any suggestions would be great, thanks

aniketk
  • Can you share the full output you get from `/_tasks/{taskId}` (i.e. with all the details)? Also, are you sure that the query has finished running? With `wait_for_completion=false`, if you still get output from `GET _tasks`, it means the update has not finished running – Val Nov 25 '19 at 12:39
  • Please check the output from the Task API. I just edited my question with the detailed output. Since `completed` is false, it has not finished. – aniketk Nov 25 '19 at 12:45
  • It looks like it's still running, i.e. `"completed": false`! – Val Nov 25 '19 at 12:45
  • When I use `wait_for_completion=true`, the console output is a long array of the same JSON error document repeated with different ids, for example: { "index": "twitter", "type": "tweet", "id": "tE_6um0B6TCN8_oX49hp", "cause": { "type": "cluster_block_exception", "reason": "blocked by: [FORBIDDEN/8/index write (api)];" }, "status": 403 } – aniketk Nov 25 '19 at 12:52
  • That's ok, it takes a long time to execute. I'm just saying that your task is not finished yet. Does the `updated` count increase over time? – Val Nov 25 '19 at 13:00
  • I checked again after some time to see whether the updated count increases, but it does not – aniketk Nov 25 '19 at 13:05
  • Can you find anything when running `GET .tasks/task/` (i.e. `.tasks` instead of `_tasks`) ? – Val Nov 25 '19 at 13:06
  • With `.tasks/task/`, I get the following result: {"_index":".tasks","_type":"tasks","_id":"JKBpOUr7R0-P4DITO3OPrg:14067968","found":false} – aniketk Nov 25 '19 at 13:13
  • Obviously since it's still running. However, the reason might be that the update is blocked because of that `cluster_block_exception` error we saw earlier. In this case, it seems like some permission to write the twitter index has changed while the update was running. Did you change something security-related after starting your update? – Val Nov 25 '19 at 13:14
  • As far as I understand, I have not changed any permissions by running other commands manually. Perhaps something internal to ES changes the permissions when an update goes over some standard limit, but that is just a guess – aniketk Nov 25 '19 at 13:18
  • Are you using the AWS Elasticsearch managed service? – Val Nov 25 '19 at 13:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/203033/discussion-between-aniketk-and-val). – aniketk Nov 25 '19 at 13:37
  • Then your issue is described [here](https://stackoverflow.com/a/47745128/4604579). You're either low on disk or low on heap memory. You need to fix that and your updates will work again. – Val Nov 25 '19 at 13:57
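The last comment attributes the `cluster_block_exception` to low disk space or heap. A rough way to check this is sketched below, assuming the index is named `twitter` as in the error output and the cluster is reachable at the URL from the question. The function names are hypothetical; the endpoints are the standard index settings and `_cat` APIs.

```shell
# Assumption: Elasticsearch is reachable at this URL, as in the question.
ES_URL="${ES_URL:-http://localhost:9200}"

# Show the index settings; a write block appears as "blocks": {"write": "true"}.
index_settings() {
  curl -s "$ES_URL/$1/_settings?pretty"
}

# Disk used and available per data node.
disk_usage() {
  curl -s "$ES_URL/_cat/allocation?v"
}

# Heap and RAM usage per node.
heap_usage() {
  curl -s "$ES_URL/_cat/nodes?v&h=name,heap.percent,ram.percent"
}

# Reset the write block on an index. Assumption: on a managed service the
# block may be applied again until the disk or heap pressure is resolved.
clear_write_block() {
  curl -s -XPUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.blocks.write": null}'
}
```

Running `index_settings twitter`, `disk_usage` and `heap_usage` first shows whether the block is present and which resource is under pressure. `clear_write_block twitter` is only worth trying after that resource has been freed.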

0 Answers