I have around 17 million documents (and the number is gradually increasing) in an Elasticsearch index. The mapping of one of the properties, labels, which is used for aggregation, is:

{
   "mappings":{
      "labels":{
         "properties":{
            "label":{
               "type":"text",
               "fields":{
                  "raw":{
                     "type":"keyword"
                  }
               }
            },
            "count":{
               "type":"float"
            }
         }
      }
   }
}

Each document has more than 500 items in the labels attribute.

When I aggregate the documents with this query:

{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "type": "XYZ"
          }
        }
      ]
    }
  },
  "aggs": {
    "date": {
      "range": {
        "field": "date",
        "ranges": [
          {
            "from": 1577816100000,
            "to": 1609438500000
          },
          {
            "from": 1546280100000,
            "to": 1577816100000
          }
        ]
      },
      "aggs": {
        "field1": {
          "terms": {
            "field": "field1",
            "size": 100
          },
          "aggs": {
            "agg_label": {
              "terms": {
                "field": "labels.label.raw",
                "size": 250,
                "min_doc_count": 5
              },
              "aggs": {
                "sum1": {
                  "sum": {
                    "script": "_score"
                  }
                },
                "sum2": {
                  "sum": {
                    "field": "labels.count"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

It takes around 20 seconds, and the more values there are in the labels field, the longer the execution time.

I know a script query is expensive. Is there any way I can significantly reduce the execution time?

vicky shrestha
  • What are you trying to achieve exactly? Why are you summing the scores? – Val Mar 17 '21 at 10:33
  • I was going to ask the same @Val! @vicky please provide more detail as to what you're trying to do, what the sample docs look like, how that bias algorithm works etc. Optimizing without knowing the background is futile. – Joe - GMapsBook.com Mar 17 '21 at 22:14
  • @joe What I need is the "sum of relevance scores (the Elasticsearch query search score) of all the documents related to each aggregated label (terms aggregation)". For example, if we have countries as labels (USA, UK, Germany), then for a specific query I want to calculate the sum of the search scores (specific to that query) of the documents related to each country, like USA: 755.33, UK: 553.66, etc. – vicky shrestha Mar 18 '21 at 04:29
  • Can you share some docs too? Is `labels` a 1-member array? If not, aren't you experiencing [array flattening](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html#nested-arrays-flattening-objects) and incorrect counts? – Joe - GMapsBook.com Mar 18 '21 at 08:36
  • @JoeSorocin Nope, labels is a multi-member array; some of the documents have more than 500 members, each with the attributes label and count. For example: [{ "count" : 2, "label" : "USA" }, { "count" : 1, "label" : "UK" }] – vicky shrestha Mar 22 '21 at 07:05
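
To make the array-flattening concern from the comments concrete, here is a rough sketch in plain Python. It is not Elasticsearch code, only a toy model of how a non-nested array of objects is indexed as independent multi-valued fields, and of what the `agg_label` / `sum1` / `sum2` combination would then compute. The two documents and their `_score` values are made up for illustration; the label/count pairs follow the example given in the last comment.

```python
from collections import defaultdict

# Hypothetical documents; "_score" stands in for the query relevance score.
docs = [
    {"_score": 1.5, "labels": [{"label": "USA", "count": 2}, {"label": "UK", "count": 1}]},
    {"_score": 0.5, "labels": [{"label": "USA", "count": 4}]},
]

def flatten(doc):
    """Model a non-nested object array: one multi-valued field per property,
    so the pairing between each label and its count is lost."""
    return {
        "_score": doc["_score"],
        "labels.label": [item["label"] for item in doc["labels"]],
        "labels.count": [item["count"] for item in doc["labels"]],
    }

def aggregate(docs):
    """Bucket documents by label, then sum the document score (sum1) and
    every labels.count value of each document in the bucket (sum2)."""
    buckets = defaultdict(lambda: {"doc_count": 0, "sum1": 0.0, "sum2": 0.0})
    for doc in map(flatten, docs):
        for label in set(doc["labels.label"]):
            bucket = buckets[label]
            bucket["doc_count"] += 1
            bucket["sum1"] += doc["_score"]
            bucket["sum2"] += sum(doc["labels.count"])
    return dict(buckets)

result = aggregate(docs)
for label, bucket in sorted(result.items()):
    print(label, bucket)
```

In this toy model the per-label score sum (`sum1`) is unaffected, because the score belongs to the whole document. The count sum (`sum2`) is inflated: the UK bucket picks up the USA count from the same document, since after flattening nothing ties a count to its label. If per-label counts matter, that would be the argument for the `nested` mapping linked in the comment above.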

0 Answers