0

I'm trying to delete a field from every object in an array in Elasticsearch. The index mapping was generated dynamically.

This is the mapping:

{
  "mapping": {
    "_doc": {
      "properties": {
        "age": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "result": {
          "properties": {
            "resultid": {
              "type": "long"
            },
            "resultname": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "timestamp": {
          "type": "date"
        }
      }
    }
  }
}

This is a sample document:

{
    "result": [
        {
            "resultid": 69,
            "resultname": "SFO"
        },
        {
            "resultid": 151,
            "resultname": "NYC"
        }
    ],
    "age": 54,
    "name": "Jorge",
    "timestamp": "2020-04-02T16:07:47.292000"
}

My goal is to remove the resultid field from every object in result, in all documents of the index. After the update, the document should look like this:

{
    "result": [
        {
            "resultname": "SFO"
        },
        {
            "resultname": "NYC"
        }
    ],
    "age": 54,
    "name": "Jorge",
    "timestamp": "2020-04-02T16:07:47.292000"
}

I tried following these Stack Overflow questions, but with no luck:

  • Remove elements/objects From Array in ElasticSearch Followed by Matching Query
  • remove objects from array that satisfying the condition in elastic search with javascript api
  • Delete nested array in elasticsearch
  • Removing objects from nested fields in ElasticSearch

Hopefully someone can help me find a solution.

Luc E
Mark

3 Answers

2

You should reindex into a new index with the _reindex API and use a script to remove the field:

POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-reindex"
  },
  "script": {
    "source": """
      // Skip documents that have no "result" array
      if (ctx._source.result != null) {
        for (int i = 0; i < ctx._source.result.length; i++) {
          ctx._source.result[i].remove("resultid");
        }
      }
    """
  }
}
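
Before running the script against a whole index, it can help to check the intended transformation locally. The following is a minimal Python sketch that mirrors the Painless loop on the sample document from the question. The function name `remove_key_from_array` is made up for this illustration and is not part of any Elasticsearch client.

```python
import copy


def remove_key_from_array(source, array_field, key):
    """Return a copy of `source` with `key` dropped from every object in source[array_field]."""
    updated = copy.deepcopy(source)  # leave the caller's document untouched
    items = updated.get(array_field)
    if isinstance(items, list):  # documents without the array are returned unchanged
        for item in items:
            if isinstance(item, dict):
                item.pop(key, None)  # no error if the key is already absent
    return updated


# Sample document taken from the question
doc = {
    "result": [
        {"resultid": 69, "resultname": "SFO"},
        {"resultid": 151, "resultname": "NYC"},
    ],
    "age": 54,
    "name": "Jorge",
    "timestamp": "2020-04-02T16:07:47.292000",
}

cleaned = remove_key_from_array(doc, "result", "resultid")
print(cleaned["result"])
```

This only models the effect of the script on a single `_source`. It does not talk to a cluster.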

Afterwards, you can delete the original index:

DELETE my-index

And reindex the cleaned data back under the original name:

POST _reindex
{
  "source": {
    "index": "my-index-reindex"
  },
  "dest": {
    "index": "my-index"
  }
}
Luc E
  • Thank you so much! I used your script with an "update by query" so that I don't have to reindex. – Mark Apr 03 '20 at 15:50
  • Perfect! It's just safer to reindex into a new index. Can you mark my answer as accepted? – Luc E Apr 03 '20 at 15:54
  • I agree that it is safer to reindex to a new index, so that if something goes wrong you'll always have your old index. – Mark Apr 03 '20 at 16:09
1

I combined Luc E's answer with some of my own knowledge to reach a solution that does not require reindexing.

POST INDEXNAME/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed
{
  "script": {
    "source": "for (int i=0;i<ctx._source.result.length;i++) { ctx._source.result[i].remove(\"resultid\")}"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "result.resultid"
          }
        }
      ]
    }
  }
}
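
If the same clean-up is needed for other fields, the request body can be generated rather than hand-edited. The sketch below builds an equivalent body in Python and passes the field names to the script through `params`. The helper name `build_update_by_query_body` and the parameter names `field` and `key` are assumptions made for this example, and the Painless text has not been verified against a running cluster, so treat it as a starting point.

```python
import json


def build_update_by_query_body(array_field, key):
    """Build an _update_by_query body that drops `key` from each object in `array_field`."""
    # Painless source: guard against documents where the field is missing
    # or is not an array of objects.
    script = (
        "if (ctx._source[params.field] instanceof List) {"
        " for (def item : ctx._source[params.field]) {"
        " if (item instanceof Map) { item.remove(params.key); }"
        " } }"
    )
    return {
        "script": {
            "lang": "painless",
            "source": script,
            "params": {"field": array_field, "key": key},
        },
        # Only touch documents that still contain the key somewhere in the array
        "query": {"exists": {"field": f"{array_field}.{key}"}},
    }


body = build_update_by_query_body("result", "resultid")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the `_update_by_query` request shown above. Restricting the query to documents where the field exists keeps the operation from rewriting documents that are already clean.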

Thanks again Luc!

Mark
1

If your array contains more than one copy of the element you want to remove, use this: ctx._source.some_array.removeIf(tag -> tag == params['c'])
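
The point of `removeIf` is that it drops every matching element, whereas removing by value (as with Java's `List.remove(Object)`) stops at the first match. A small Python analogy of the two behaviours, where `remove_all` is a hypothetical helper written for this illustration:

```python
def remove_all(values, target):
    """Return a new list without any element equal to `target` (like removeIf)."""
    return [v for v in values if v != target]


tags = ["a", "b", "a", "c"]

first_only = list(tags)
first_only.remove("a")  # removes only the first "a"

all_gone = remove_all(tags, "a")  # removes every "a"

print(first_only)  # ['b', 'a', 'c']
print(all_gone)    # ['b', 'c']
```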