
Trying to work out how to access an item in an ArrayList.

I have the values in _source:

  "session_id" : [
    "19a7ec8d",
    "19a7ec8d"
  ],

As they are all duplicates (due to a faulty Grok script), I want to get rid of the duplicates.

I cannot work out how to access the value:

String old = ctx._source.session_id[0];
ctx._source.remove(\"session_id\");
ctx._source.session_id = old;

I have also tried:

String old = ctx._source.session_id.get(0);

String old = ctx._source.session_id.get(0).value();

String old = ctx._source.session_id[0].value();

String old = ctx._source.session_id.get(0).toString();

Thanks

Rowan Smith

2 Answers


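You can use _update_by_query. The script below replaces session_id with its first element, but only when the field is a non-empty List.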

Data:

"hits" : [
      {
        "_index" : "index7",
        "_type" : "_doc",
        "_id" : "zQPYkXEB9JyZpSui0FLw",
        "_score" : 1.0,
        "_source" : {
          "session_id" : [
            "19a7ec8d",
            "19a7ec8d"
          ]
        }
      }
    ]

Query:

POST index7/_update_by_query
{
  "script":{
    "source":"if(ctx._source.session_id instanceof List && ctx._source.session_id.size()>0) { def firstValue=ctx._source.session_id[0];ctx._source.session_id=firstValue;}"
  },
  "query":{
    "match_all":{} 
  }
}

Result:

"hits" : [
      {
        "_index" : "index7",
        "_type" : "_doc",
        "_id" : "zQPYkXEB9JyZpSui0FLw",
        "_score" : 1.0,
        "_source" : {
          "session_id" : "19a7ec8d"
        }
      }
    ]
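Painless is Java-like, so the script's logic can be sketched in plain Java, with a HashMap standing in for ctx._source (an assumption for illustration; this does not run inside Elasticsearch). It also shows why the List check matters: a document whose session_id is already a plain String is left untouched instead of failing on size().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollapseSessionId {
    // Same idea as the update script: if session_id is a non-empty List,
    // replace it with its first element; any other value is left alone.
    // The Map is a hypothetical stand-in for ctx._source.
    public static void collapseToFirst(Map<String, Object> source) {
        Object value = source.get("session_id");
        if (value instanceof List && !((List<?>) value).isEmpty()) {
            source.put("session_id", ((List<?>) value).get(0));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> listDoc = new HashMap<>();
        listDoc.put("session_id", new ArrayList<>(Arrays.asList("19a7ec8d", "19a7ec8d")));
        Map<String, Object> stringDoc = new HashMap<>();
        stringDoc.put("session_id", "19a7ec8d");

        collapseToFirst(listDoc);
        collapseToFirst(stringDoc);
        System.out.println(listDoc.get("session_id"));   // 19a7ec8d
        System.out.println(stringDoc.get("session_id")); // 19a7ec8d (unchanged)
    }
}
```

Because already-collapsed documents are skipped, running the update again over the same index is harmless.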
jaspreet chahal
  • Thanks. I get the following: "reason" : "dynamic method [java.lang.String, size/0] not found", with an arrow pointing to the . in session_id.size() – Rowan Smith Apr 19 '20 at 10:05
  • Can you add a sample document and your mapping? – jaspreet chahal Apr 19 '20 at 10:07
  • Using your approach this seems to be working... "if(!(ctx._source.session_id instanceof List)) {return;} def firstValue=ctx._source.session_id[0];ctx._source.session_id=firstValue" – Rowan Smith Apr 19 '20 at 10:08
  • Previously I was trying to filter only documents that matched ctx._source.session_id instanceof List; I think it was returning documents that did not match that as well. – Rowan Smith Apr 19 '20 at 10:09
  • Thanks. How would I filter for only the documents with a List instead of running it on match_all? I am getting {"statusCode":504,"error":"Gateway Time-out","message":"Client request timeout"} – Rowan Smith Apr 19 '20 at 10:13
  • There is no way to filter on size in a query. As per https://discuss.elastic.co/t/duplicate-value-in-a-fileds/198936/8: "This is expected with a large dataset. No worries, the Update by Query will continue to run in the background until all data has been processed". Can you check after some time whether the field has been updated? – jaspreet chahal Apr 19 '20 at 10:20

A generic way to make array items unique:

POST index7/_update_by_query
{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "session_id"
        }
      }
    }
  },
  "script": {
    "source": """ctx._source.session_id = ctx._source
                                            .session_id
                                            .stream()
                                            .distinct()
                                            .collect(Collectors.toList());
                                            """
  }
}
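The stream pipeline in that script uses the same java.util.stream API as plain Java, so its effect can be checked outside Elasticsearch. In this sketch (the class and method names are hypothetical), stream().distinct() is what removes duplicates; sorted() on its own only reorders the values and leaves duplicates in place.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DedupSessionId {
    // distinct() keeps the first occurrence of each value, in encounter order.
    public static <T> List<T> dedupe(List<T> values) {
        return values.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("19a7ec8d", "19a7ec8d");
        System.out.println(dedupe(ids)); // [19a7ec8d]
        // sorted() alone only reorders; both copies survive.
        System.out.println(ids.stream().sorted().collect(Collectors.toList())); // [19a7ec8d, 19a7ec8d]
    }
}
```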
Joe - GMapsBook.com
  • Thanks this is helpful. If this results in only one item, does this still leave it as a list with one item, or will it get converted to a value? – Rowan Smith Apr 19 '20 at 10:27
  • `toList()` will keep it a list. You can then do `[0]` on it if you want to keep it single-valued. To be fair, this deduplication method is really meant for situations like `[1,2,2,3,3]`... – Joe - GMapsBook.com Apr 19 '20 at 11:51
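Following that suggestion, one way to get a single value back is to deduplicate first and unwrap only when exactly one unique item remains. This is a plain-Java sketch standing in for Painless; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DedupAndUnwrap {
    // Deduplicate, then return the lone element if only one value remains
    // (the "[0]" step); otherwise return the deduplicated list.
    public static Object dedupeAndUnwrap(List<?> values) {
        List<?> unique = values.stream().distinct().collect(Collectors.toList());
        return unique.size() == 1 ? unique.get(0) : unique;
    }

    public static void main(String[] args) {
        System.out.println(dedupeAndUnwrap(Arrays.asList("19a7ec8d", "19a7ec8d"))); // 19a7ec8d
        System.out.println(dedupeAndUnwrap(Arrays.asList(1, 2, 2, 3, 3)));           // [1, 2, 3]
    }
}
```

The trade-off is that documents with several distinct values keep a list while the others get a scalar, so consumers of session_id must handle both shapes.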